Details
- New Feature
- Status: In Testing
- Critical
- Resolution: Unresolved
- None
Description
The OS character sets "utf8" and "utf-8" currently map to the MariaDB character set "utf8". In MariaDB, the "utf8" character set refers to the incomplete 3-byte version of the UTF-8 standard (which has "utf8mb3" as an alias).
It may be more appropriate if the OS character sets "utf8" and "utf-8" instead mapped to the MariaDB character set "utf8mb4". That way, UTF-8 clients would get access to the full UTF-8 standard by default in MariaDB.
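To illustrate why the 3-byte limit matters, here is a minimal Python sketch (not MariaDB code) that reports how many UTF-8 bytes each character needs. The helper names `max_utf8_bytes` and `fits_in_utf8mb3` are hypothetical and exist only for this example. Any character that needs 4 bytes, such as most emoji, lies outside what a 3-byte character set can represent.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_in_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 UTF-8 bytes."""
    return max_utf8_bytes(text) <= 3


if __name__ == "__main__":
    # ASCII, Latin-1 supplement, a 3-byte BMP character, and a 4-byte emoji.
    for sample in ["abc", "é", "€", "😀"]:
        print(repr(sample), max_utf8_bytes(sample), fits_in_utf8mb3(sample))
```

Only the last sample needs 4 bytes per character, so it is the only one that would require "utf8mb4".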
MySQL 8.0 has already made this change:
The OS character set is mapped to the closest MySQL character set if there is no exact match. If the client does not support the matching character set, it uses the compiled-in default. For example, utf8 and utf-8 map to utf8mb4, and ucs2 is not supported as a connection character set, so it maps to the compiled-in default.
https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html
The following session shows MariaDB's current behavior:
geoff@geoff-Razer-Blade-Stealth-13:~$ printenv LANG
en_US.UTF-8
geoff@geoff-Razer-Blade-Stealth-13:~$ mariadb --host mydb.skysql.net --user myuser --password --ssl-ca ./skysql_chain.pem
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 3663
Server version: 10.4.12-6-MariaDB-enterprise-log MariaDB Enterprise Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SHOW SESSION VARIABLES
WHERE Variable_name
IN('character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection');
+--------------------------+-----------------+
| Variable_name            | Value           |
+--------------------------+-----------------+
| character_set_client     | utf8            |
| character_set_connection | utf8            |
| character_set_results    | utf8            |
| collation_connection     | utf8_general_ci |
+--------------------------+-----------------+
4 rows in set (0.048 sec)
We can see the relevant mapping in the code here:
https://github.com/MariaDB/server/blob/mariadb-10.5.2/mysys/charset.c#L1384
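The requested change can be sketched as a lookup from the OS character set name to a MariaDB character set name. The Python below is an illustration only, not the C code in `mysys/charset.c`: the table names, the `codeset_from_lang` helper, and the `latin1` fallback are all assumptions made for this example, and parsing `LANG` is a simplified stand-in for however the client actually determines the OS character set.

```python
# Hypothetical tables, for illustration only (not MariaDB's data structures).
CURRENT_MAP = {"utf8": "utf8", "utf-8": "utf8"}
PROPOSED_MAP = {"utf8": "utf8mb4", "utf-8": "utf8mb4"}

# Assumption: stand-in for the client's compiled-in default character set.
COMPILED_IN_DEFAULT = "latin1"


def codeset_from_lang(lang: str) -> str:
    """Extract the lowercase codeset from a locale string like 'en_US.UTF-8'."""
    _, sep, rest = lang.partition(".")
    if not sep:
        return ""  # no codeset part, e.g. "C" or "en_US"
    # Drop an optional "@modifier" suffix such as "@euro".
    return rest.split("@", 1)[0].lower()


def map_os_charset(os_name: str, table: dict, default: str = COMPILED_IN_DEFAULT) -> str:
    """Map an OS charset name to a server charset, falling back to the default."""
    return table.get(os_name.lower(), default)


if __name__ == "__main__":
    os_name = codeset_from_lang("en_US.UTF-8")
    print("current: ", map_os_charset(os_name, CURRENT_MAP))
    print("proposed:", map_os_charset(os_name, PROPOSED_MAP))
```

With the `LANG` value from the session above, the current table yields "utf8" and the proposed table yields "utf8mb4"; names missing from the table fall back to the assumed default.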
Issue Links
- is duplicated by
  - MDEV-29414 Map utf8 OS locales to utf8mb4 (Closed)
- relates to
  - MDEV-8334 Rename utf8 to utf8mb3 (Closed)