[MDEV-8334] Rename utf8 to utf8mb3 Created: 2015-06-18 Updated: 2023-11-22 Resolved: 2021-05-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Fix Version/s: | 10.6.1 |
| Type: | Task | Priority: | Blocker |
| Reporter: | Alexander Barkov | Assignee: | Oleksandr Byelkin |
| Resolution: | Fixed | Votes: | 8 |
| Labels: | None | ||
| Issue Links: |
|
| Sub-Tasks: |
|
| Description |
|
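Currently MariaDB has two utf8 character sets: utf8 (3-byte, covering only the Basic Multilingual Plane) and utf8mb4 (full-featured UTF-8, up to 4 bytes per character).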
In the long term we want the name utf8 to mean the full-featured UTF-8.
1. Change the main name of the 3-byte character set from "utf8" to "utf8mb3" and make "utf8" an alias for "utf8mb3". This will change all SHOW and INFORMATION_SCHEMA output to display utf8mb3 instead of utf8, and will make mysqldump dump utf8mb3 instead of just utf8.
2. Add a new server option, say --utf8-is-utf8mb3, which will be true by default; the DBA will be able to change it to false and thus make "utf8" mean "utf8mb4".
3. A few releases later, change --utf8-is-utf8mb3 to be "false" by default.
Or:
2. Do not add any new server options, and implement charset aliases via the SQL standard statement:
Alternative solution
Originally, there were two reasons to have two utf8 implementations:
So we could have just one "utf8", with the following aliases:
After the upgrade, SHOW output for old tables with the 3-byte utf8 could look roughly like this:
where is_bmp_only() is a new built-in function that tests whether a string contains only Basic Multilingual Plane characters and returns:
The exact API for the constraint function may be different, e.g. it could test for an arbitrary Unicode character range (not only BMP vs non-BMP). This could be useful for other purposes as well.
Open questions:
|
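The two ideas in the description (an option-controlled "utf8" alias and a BMP-only check) can be illustrated with a small Python sketch. This is not MariaDB code. The function names `resolve_charset_name` and `in_code_point_range` are hypothetical, the `utf8_is_utf8mb3` parameter only models the proposed --utf8-is-utf8mb3 option, and the boolean return type of `is_bmp_only` is an assumption, since the return values of the proposed SQL function are not given here.

```python
# Illustrative sketch only; not MariaDB source code.

BMP_MAX = 0xFFFF  # last code point of the Basic Multilingual Plane


def resolve_charset_name(name: str, utf8_is_utf8mb3: bool = True) -> str:
    """Map a user-supplied character set name to a canonical name.

    `utf8_is_utf8mb3` models the server option proposed in the description;
    the function itself is hypothetical.
    """
    canonical = name.lower()
    if canonical == "utf8":
        # "utf8" is only an alias; the flag decides what it points to.
        return "utf8mb3" if utf8_is_utf8mb3 else "utf8mb4"
    return canonical


def in_code_point_range(text: str, low: int = 0, high: int = BMP_MAX) -> bool:
    """Return True if every character of `text` lies in [low, high]."""
    return all(low <= ord(ch) <= high for ch in text)


def is_bmp_only(text: str) -> bool:
    """Return True if `text` contains only BMP characters.

    Assumption: a boolean result; the proposed SQL function may differ.
    """
    return in_code_point_range(text, 0, BMP_MAX)


if __name__ == "__main__":
    print(resolve_charset_name("utf8"))                         # utf8mb3
    print(resolve_charset_name("utf8", utf8_is_utf8mb3=False))  # utf8mb4
    print(is_bmp_only("na\u00efve"))                            # True
    print(is_bmp_only("\U0001F600"))                            # False (U+1F600 is outside the BMP)
```

The range-based helper mirrors the remark that the constraint could test an arbitrary Unicode range rather than only BMP vs non-BMP. Under that design `is_bmp_only` is a thin wrapper over the general check.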
| Comments |
| Comment by Alexander Barkov [ 2018-11-26 ] |
|
ralf.gebhardt@mariadb.com, this is a good idea. We can keep this MDEV as a "super task" and have three individual tasks, one for each step. |
| Comment by Todd Michael [ 2019-05-17 ] |
|
This might both conflict with and agree with the long-term usage envisaged for MySQL ... : https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-sets.html |
| Comment by Nuno [ 2019-07-02 ] |
|
Hello, is this going to fix or improve the fact that, many times, inserting 150 characters into a VARCHAR(200) column returns a truncation error? This happens when using utf8 or utf8mb4. Thank you. |
| Comment by Alexander Barkov [ 2019-10-30 ] |
|
julien.fritsch, yes, I want to finish it before beta. |
| Comment by Rick James [ 2020-01-31 ] |
|
My fear is that upgrades will fail, and that downgrades will be problematic. Think about these issues when changing the meaning of utf8, even if it is to the equivalent utf8mb3. Also be aware that doing something different from Oracle will lead to a lot of grief when people try to move from (or to) MySQL. |
| Comment by Martin Häcker [ 2020-08-25 ] |
|
I'm getting lots of change mail from this bug, but I don't see any changes. Is there a script running amok here perhaps? |
| Comment by Julien Fritsch [ 2020-08-25 ] |
|
dwt, you are getting all those emails from this task because bar is working on it and updating the description. If you don't want to get them, you can stop watching it. |
| Comment by Oleksandr Byelkin [ 2020-11-17 ] |
|
The plan is:
|
| Comment by Nuno [ 2020-11-17 ] |
|
Guys, I just want to ask, |
| Comment by Sergei Golubchik [ 2020-11-17 ] |
|
Correct. |
| Comment by Rucha Deodhar [ 2021-04-17 ] |
|
PR for mariadb-connector-c: https://github.com/mariadb-corporation/mariadb-connector-c/pull/169 |
| Comment by Marko Mäkelä [ 2021-04-21 ] |
|
The Connector/C part has apparently been applied. I merged it to 10.6 and adjusted tests/mysql_client_test.c accordingly. |
| Comment by Oleksandr Byelkin [ 2021-04-22 ] |
|
OK to push |
| Comment by Todd Michael [ 2021-04-29 ] |
|
See new documentation on OLD_MODE for more info: |
| Comment by Sergei Golubchik [ 2021-05-11 ] |
|
commit 3072ba1b7ca is ok to push, thanks! |
| Comment by Martin Häcker [ 2021-05-19 ] |
|
As the guy who triggered all of this with a bug report many years ago - after all this time - I just want to say thank you for the work you guys put in to make this happen. Stopping the confusion of utf8 (utf8mb3) with utf8mb4 in MariaDB is a huge thing and still something I have to fight all the time because people just miss it. This will help a lot! Thanks! |
| Comment by Roel Van de Paar [ 2021-06-15 ] |
|
See |