Details
- Bug
- Status: Confirmed
- Critical
- Resolution: Unresolved
- 11.5(EOL), 11.6(EOL), 11.7(EOL), 11.8
- Can result in unexpected behaviour
Description
The documenting task is now handled by:
https://mariadbcorp.atlassian.net/browse/DOCS-6015
In MDEV-25829 and MDEV-19123, the default character set and collation were changed from the long-time default latin1 and latin1_swedish_ci to utf8mb4 and utf8mb4_uca1400_ai_ci.
As far as I understand, this has the following implications:
- Because the collation utf8mb4_uca1400_ai_ci was not implemented before MDEV-27009, replication from MariaDB Server 11.8 or later versions to MariaDB Server 10.6 will be impacted.
- The storage overhead related to CHAR and VARCHAR columns may be significantly increased. This includes persistent storage when the columns contain latin1 code points outside ASCII.
- Some comparison operations are significantly slower with utf8mb4_uca1400_ai_ci than with the old default collation latin1_swedish_ci, which worked on fixed-width character encoding and was accent-sensitive. MDEV-34427 is just one example.
- Some applications that may have relied on the old default collation could be broken; see MDEV-36286 for an example.
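The storage point in the list above can be illustrated outside the server. The following is a minimal Python sketch, not a measurement of MariaDB itself. It assumes that Python's `latin-1` and `utf-8` codecs give the same per-character byte lengths as the server's latin1 and utf8mb4 encodings for these characters. The sample strings and the helper name `encoded_sizes` are made up for illustration.

```python
# Compare the encoded size of the same text in a single-byte charset
# and in UTF-8 (the byte encoding used by utf8mb4).
# Non-ASCII letters are written as escapes so the code points are unambiguous.
samples = ["hello", "Malm\u00f6", "d\u00e9j\u00e0 vu"]


def encoded_sizes(text):
    """Return (latin1_bytes, utf8_bytes) for the given string."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


for text in samples:
    latin1_len, utf8_len = encoded_sizes(text)
    print(f"{text!r}: latin1={latin1_len} bytes, utf8={utf8_len} bytes")
```

ASCII-only text has the same size in both encodings. Each latin1 character outside ASCII takes two bytes in UTF-8 instead of one. Actual on-disk sizes depend on the storage engine and row format, which this sketch does not model.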
It would be useful if https://mariadb.com/docs/release-notes/community-server/11.8/what-is-mariadb-118 documented how to configure MariaDB Server 11.8 or later to use the same default character set and collation as 11.4 or older releases. It would also be useful to include a warning that such configuration is advisable when attempting replication to MariaDB Server 10.6.
Why we changed the default character set from latin1 to utf8mb4
- Help people all around the world use MariaDB Server out of the box without additional configuration of character set and collation. The old defaults worked well only for Western European languages.
- People all around the world use supplementary characters such as emoji. The old default character set latin1 cannot store emoji.
- For better MySQL 8.0 compatibility.
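The emoji point can be checked with a short Python sketch. It only shows properties of the encodings, not server behaviour. The helper name `fits_in_latin1` and the choice of U+1F600 as the sample character are illustrative assumptions.

```python
# U+1F600 GRINNING FACE is a supplementary character (code point >= U+10000).
EMOJI = "\U0001F600"


def fits_in_latin1(text):
    """Return True if every character of text is representable in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print("code point:", hex(ord(EMOJI)))
print("UTF-8 length in bytes:", len(EMOJI.encode("utf-8")))
print("representable in latin1:", fits_in_latin1(EMOJI))
```

A supplementary character needs four bytes in UTF-8. That is the case the utf8mb4 character set is meant to cover, and a single-byte character set has no representation for it.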
Why we changed the default collations for Unicode character set from xxx_general_ci to xxx_uca1400_ai_ci
- The old default collation xxx_general_ci (e.g. utf8mb4_general_ci) considered all supplementary characters (with Unicode code point >=U+10000) as equal to each other. The new default collation xxx_uca1400_ai_ci (e.g. utf8mb4_uca1400_ai_ci) works with supplementary characters correctly.
- The old default collation xxx_general_ci is a simplified collation. It does not support things like character expansions and character contractions. The new default collation xxx_uca1400_ai_ci provides a better comparison and sorting order because it supports expansions and contractions from DUCET (Default Unicode Collation Element Table). For example, the German character ß (U+00DF LATIN SMALL LETTER SHARP S) is correctly compared as equal to the combination of two letters "ss".
Restoring to the old defaults
This change of the defaults will make the data files incompatible with 10.6 (because 10.6 is missing MDEV-27009) and potentially slightly increase the storage and CPU consumption.
To return to the old defaults, edit your my.cnf file as follows:
[mysqld]
character-set-server=latin1
collation-server=latin1_swedish_ci
character-set-collations=''
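To confirm that a configuration fragment actually contains the three settings listed above, a small check can be scripted. This is a hedged sketch using Python's standard `configparser`. It assumes the fragment is plain INI text with a `[mysqld]` section and dash-separated option names. Real my.cnf files may use `!include` directives, other option groups, or underscore spellings, which this sketch does not handle. The names `MY_CNF`, `OLD_DEFAULTS` and `missing_old_defaults` are hypothetical.

```python
import configparser

# The fragment from the text above; in practice this would be read from my.cnf.
MY_CNF = """
[mysqld]
character-set-server=latin1
collation-server=latin1_swedish_ci
character-set-collations=''
"""

# Option values that restore the old defaults, as listed in the text.
OLD_DEFAULTS = {
    "character-set-server": "latin1",
    "collation-server": "latin1_swedish_ci",
    "character-set-collations": "''",
}


def missing_old_defaults(cnf_text):
    """Return option names that are absent or differ from the old defaults."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    section = parser["mysqld"] if parser.has_section("mysqld") else {}
    return [name for name, value in OLD_DEFAULTS.items()
            if section.get(name) != value]


print(missing_old_defaults(MY_CNF))
```

An empty list means all three options are present with the expected values. The check only inspects the file text. It does not tell you what a running server is using.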
Issue Links
- relates to
  - MDEV-19123 Change default charset from latin1 to utf8mb4 (Closed)
  - MDEV-25829 Change default Unicode collation to uca1400_ai_ci (Closed)
  - MDEV-27009 Add UCA-14.0.0 collations (Closed)
  - MDEV-34427 BNL-H has not optimal implementation for varchar type (Open)
  - MDEV-36286 Inconsistent query results for LONGTEXT and TINYTEXT (Confirmed)