[MDEV-19123] Change default charset from latin1 to utf8mb4 Created: 2019-04-01 Updated: 2023-12-22 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Fix Version/s: | 11.5 |
| Type: | New Feature | Priority: | Critical |
| Reporter: | Diego Dupin | Assignee: | Alexander Barkov |
| Resolution: | Unresolved | Votes: | 9 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
The goal of this task is to change the default values of the global character set variables to the 4-byte UTF-8 charset, utf8mb4.
MySQL changed this default in 8.0.1. There are some questions which should be discussed before/while working on this task:
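Until the built-in default changes, users opt in by setting the server-level defaults explicitly in an option file. The following is a minimal Python sketch under a few assumptions. It uses the `[mysqld]` option names `character-set-server` and `collation-server`. The helper name `utf8mb4_server_config` and the choice of `utf8mb4_unicode_ci` are illustrative only. It covers just the server defaults, not the client/connection variables.

```python
import configparser
import io


def utf8mb4_server_config(collation: str = "utf8mb4_unicode_ci") -> str:
    """Build a hypothetical option-file fragment that opts the server in to utf8mb4."""
    # Guard against pairing utf8mb4 with a collation from another charset.
    if not collation.startswith("utf8mb4_"):
        raise ValueError(f"{collation!r} is not a utf8mb4 collation")
    cfg = configparser.ConfigParser()
    cfg["mysqld"] = {
        "character-set-server": "utf8mb4",
        "collation-server": collation,
    }
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines under a [mysqld] header
    return buf.getvalue()


print(utf8mb4_server_config())
```

Changing the compiled-in default would make a fragment like this unnecessary for new installations.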
|
| Comments |
| Comment by Dario Seidl [ 2019-07-02 ] |
|
Please consider making this change. utf8mb4 is really the most sensible default nowadays. As pointed out, MySQL 8 also made the switch. |
| Comment by Otto Kekäläinen [ 2020-06-25 ] |
|
In 10.5 we switched to utf8mb4 by default for new databases in https://github.com/MariaDB/server/commit/7c2079f600bacbd4d24762159550b3d40ad856c1 but then reverted it in https://github.com/MariaDB/server/commit/039cb6f6bfaaeafeb87e6d10c88be2cac87654e7 |
| Comment by Sergei Golubchik [ 2020-06-25 ] |
|
No, it wasn't reverted. Only the client charset change was reverted; it did not affect how the data is stored. |
| Comment by Otto Kekäläinen [ 2021-10-31 ] |
|
Is this still relevant? In MariaDB 10.6 the default charset was already changed to utf8mb3. Doesn't that solve most of the issues that made many people switch to utf8mb4 earlier? https://mariadb.com/kb/en/unicode/ |
| Comment by Sergei Golubchik [ 2021-11-01 ] |
|
No, the default hasn't been changed, IIRC. The meaning of "utf8" was: in 10.6 it became an alias for utf8mb3. This task is about making the change upstream. |
| Comment by Otto Kekäläinen [ 2021-11-02 ] |
|
Roger that. Full UTF-8 (= utf8mb4) is indeed needed to support emojis (e.g. U+1F4A9 PILE OF POO 💩) and other supplementary characters above U+FFFF, which the 3-byte utf8mb3 cannot store. For followers of this Jira, the post https://mathiasbynens.be/notes/mysql-utf8mb4 is a good explanation of the topic (though it does not mention utf8mb3). |
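The 3-byte limit follows directly from the encoding. A character outside the Basic Multilingual Plane (above U+FFFF) takes four bytes in UTF-8, and utf8mb3 cannot store it. The following Python sketch checks this. The function names are hypothetical.

```python
# utf8mb3 stores at most 3 bytes per character, i.e. only code points up to U+FFFF.
MAX_UTF8MB3_CODE_POINT = 0xFFFF


def utf8_byte_length(ch: str) -> int:
    """Number of bytes the character takes in standard (4-byte capable) UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character could be stored in a 3-byte utf8mb3 column."""
    return all(ord(ch) <= MAX_UTF8MB3_CODE_POINT for ch in text)


pile_of_poo = "\U0001F4A9"  # U+1F4A9 PILE OF POO
print(utf8_byte_length("é"), utf8_byte_length("€"), utf8_byte_length(pile_of_poo))
# 2 3 4
print(fits_utf8mb3("café €"), fits_utf8mb3("nice " + pile_of_poo))
# True False
```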
| Comment by Dario Seidl [ 2021-11-03 ] |
|
I agree, the default collation should really not be `utf8mb4_general_ci`. That one sacrifices correctness for performance, and that shouldn't be the default. There's `utf8mb4_unicode_ci`, which implements the Unicode Collation Algorithm (UCA 4.0.0), and the newer `utf8mb4_unicode_520_ci`, based on the updated UCA 5.2.0 weight keys. MySQL also has the even newer `utf8mb4_0900_ai_ci` (UCA 9.0.0), which doesn't exist in MariaDB yet, I think. |
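A commonly cited example of the correctness gap is the German ß. `utf8mb4_general_ci` compares it equal to `s`, while `utf8mb4_unicode_ci` expands it to `ss`. The following Python sketch contrasts the two strategies. It is a toy model only: it uses hand-written rules for a single letter, not the real collation tables. The names are hypothetical, and `str.lower()` stands in for case-insensitivity.

```python
# Toy model, NOT the real MariaDB weight tables.
# general_ci style: each character maps to exactly one weight.
GENERAL_CI_MAP = {"ß": "s"}
# unicode_ci (UCA) style: one character may expand to several weights.
UNICODE_CI_EXPANSIONS = {"ß": "ss"}


def general_ci_key(s: str) -> str:
    # lower() (unlike casefold()) leaves "ß" unchanged, so the map decides.
    return "".join(GENERAL_CI_MAP.get(c, c) for c in s.lower())


def unicode_ci_key(s: str) -> str:
    return "".join(UNICODE_CI_EXPANSIONS.get(c, c) for c in s.lower())


def general_ci_equal(a: str, b: str) -> bool:
    return general_ci_key(a) == general_ci_key(b)


def unicode_ci_equal(a: str, b: str) -> bool:
    return unicode_ci_key(a) == unicode_ci_key(b)


print(general_ci_equal("ß", "s"), unicode_ci_equal("ß", "s"))
# True False
print(general_ci_equal("Straße", "STRASSE"), unicode_ci_equal("Straße", "STRASSE"))
# False True
```

The one-to-one mapping is what makes `general_ci` cheap. It is also why it cannot match "Straße" with "STRASSE".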