[MDEV-22217] Make OS character sets "utf8" and "utf-8" map to MariaDB character set "utf8mb4"
Created: 2020-04-10  Updated: 2020-04-12

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: None

Type: Task
Priority: Major
Reporter: Geoff Montee (Inactive)
Assignee: Ralf Gebhardt
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-8334 Rename utf8 to utf8mb3 Closed

 Description   

The OS character sets "utf8" and "utf-8" currently map to the MariaDB character set "utf8". In MariaDB, the "utf8" character set (which has "utf8mb3" as an alias) is an incomplete implementation of the UTF-8 standard: it stores at most 3 bytes per character, so it cannot represent characters outside the Basic Multilingual Plane.

It may be more appropriate if the OS character sets "utf8" and "utf-8" instead mapped to the MariaDB character set "utf8mb4". That way, UTF-8 clients would get access to the full UTF-8 standard by default in MariaDB.
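
To illustrate why the 3-byte limit matters, here is a minimal sketch in C that computes how many bytes UTF-8 needs for a given Unicode code point. The function names utf8_encoded_len and fits_in_utf8mb3 are hypothetical helpers written for this ticket; they are not part of MariaDB.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes UTF-8 uses to encode one Unicode
 * code point. Returns 0 for values that are not valid scalar values
 * (surrogates, or anything above U+10FFFF). */
size_t utf8_encoded_len(unsigned long cp)
{
    if (cp <= 0x7FUL)
        return 1;
    if (cp <= 0x7FFUL)
        return 2;
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;               /* surrogate range is not encodable */
    if (cp <= 0xFFFFUL)
        return 3;               /* end of the Basic Multilingual Plane */
    if (cp <= 0x10FFFFUL)
        return 4;               /* supplementary planes, e.g. most emoji */
    return 0;
}

/* Hypothetical helper: nonzero if the code point fits in a character set
 * limited to 3 bytes per character, as "utf8" (utf8mb3) is. */
int fits_in_utf8mb3(unsigned long cp)
{
    size_t n = utf8_encoded_len(cp);
    return n >= 1 && n <= 3;
}
```

Under these assumptions, U+20AC (the euro sign) needs 3 bytes and fits, while U+1F600 (an emoji) needs 4 bytes and does not, which is the case that "utf8mb4" covers.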

MySQL 8.0 has already made this change:

The OS character set is mapped to the closest MySQL character set if there is no exact match. If the client does not support the matching character set, it uses the compiled-in default. For example, utf8 and utf-8 map to utf8mb4, and ucs2 is not supported as a connection character set, so it maps to the compiled-in default.

https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html

For example, see here for MariaDB's current behavior:

geoff@geoff-Razer-Blade-Stealth-13:~$ printenv LANG
en_US.UTF-8
geoff@geoff-Razer-Blade-Stealth-13:~$ mariadb --host mydb.skysql.net --user myuser --password --ssl-ca ./skysql_chain.pem
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3663
Server version: 10.4.12-6-MariaDB-enterprise-log MariaDB Enterprise Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> SHOW SESSION VARIABLES 
  WHERE Variable_name 
  IN('character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection');
+--------------------------+-----------------+
| Variable_name            | Value           |
+--------------------------+-----------------+
| character_set_client     | utf8            |
| character_set_connection | utf8            |
| character_set_results    | utf8            |
| collation_connection     | utf8_general_ci |
+--------------------------+-----------------+
4 rows in set (0.048 sec)

We can see the relevant mapping in the code here:

https://github.com/MariaDB/server/blob/mariadb-10.5.2/mysys/charset.c#L1384
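
As a rough illustration of the requested behavior, the following C sketch shows a lookup table in which both OS names resolve to "utf8mb4". It is not the code from mysys/charset.c: the struct os_charset_map, the table proposed_map, and the functions names_equal_ci and map_os_charset are all hypothetical names chosen for this example, and the fallback argument stands in for the compiled-in default.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical mapping entry: OS codeset name -> server character set. */
struct os_charset_map {
    const char *os_name;
    const char *db_name;
};

/* Proposed mapping: both spellings of the OS name resolve to utf8mb4. */
static const struct os_charset_map proposed_map[] = {
    { "utf8",  "utf8mb4" },
    { "utf-8", "utf8mb4" },
};

/* Case-insensitive comparison of two ASCII names; returns 1 if equal. */
static int names_equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Return the mapped character set name, or `fallback` (standing in for
 * the compiled-in default) when the OS name is not in the table. */
const char *map_os_charset(const char *os_name, const char *fallback)
{
    size_t i;
    for (i = 0; i < sizeof(proposed_map) / sizeof(proposed_map[0]); i++) {
        if (names_equal_ci(os_name, proposed_map[i].os_name))
            return proposed_map[i].db_name;
    }
    return fallback;
}
```

With this sketch, a codeset name such as the "UTF-8" part of en_US.UTF-8 would resolve to "utf8mb4", while an unlisted name would fall through to whatever default the caller passes in.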

