Details
- New Feature
- Status: Open
- Critical
- Resolution: Unresolved
- None
Description
The goal of this task is to change the default global variables to the 4-byte UTF-8 character set (utf8mb4), meaning:
- character_set_client: from utf8 to utf8mb4
- character_set_database: from latin1 to utf8mb4
- character_set_server: from latin1 to utf8mb4
- character_set_results: from utf8 to utf8mb4
- character_set_connection: from utf8 to utf8mb4
- collation_database: from latin1_swedish_ci to utf8mb4_general_ci
- collation_server: from latin1_swedish_ci to utf8mb4_general_ci
MySQL changed its default to utf8mb4 in 8.0.1.
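To illustrate why the 3-byte vs 4-byte distinction matters, here is a minimal Python sketch (not server code). It assumes the usual reading that utf8mb3 can store only BMP characters (code points up to U+FFFF, at most 3 bytes in UTF-8), while utf8mb4 also covers supplementary characters (4 bytes). The helper names `utf8_len` and `fits_utf8mb3` are hypothetical and exist only for this example.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character is in the BMP (<= U+FFFF).

    Assumption for illustration: this mirrors the utf8mb3 limit of
    3 bytes per character; it is not how the server checks input.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        # Show the code point, its UTF-8 length, and whether utf8mb3 could hold it.
        print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), "
              f"fits utf8mb3: {fits_utf8mb3(ch)}")
```

A string containing an emoji such as U+1F600 fails `fits_utf8mb3`, which is the kind of data that needs utf8mb4 defaults to be stored intact.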
There are some questions which should be discussed before/while working on this task:
- Should we change the default collation for utf8mb4 from utf8mb4_general_ci to uca1400_ai_ci? The problem is that utf8mb4_general_ci handles non-BMP characters very badly: it treats all non-BMP characters as equal to each other. See MDEV-25829
- Should we reassign the UTF8 Linux locale from utf8mb3 to utf8mb4 in the client, or to whatever the server side uses as the alias for "utf8"? See MDEV-19123
- Should we change system_charset_info from utf8mb3 to utf8mb4 and allow non-BMP characters in identifiers?
  - If so, table name to file name encoding should be extended to support non-BMP characters. See MDEV-27490
  - The system charset cannot be utf8mb4 until we fix the collation as above
- Should we change numerous INFORMATION_SCHEMA columns from utf8mb3 to utf8mb4?
  - They should be in the system_charset_info, as they store identifiers
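The collation concern raised in the first question can be sketched with a toy model in Python. This is not the server's collation code: it only assumes, as stated above, that utf8mb4_general_ci treats every non-BMP character as equal to every other. The shared weight value `0xFFFD` and the names `toy_general_ci_key` and `toy_equal` are illustrative assumptions, and the case folding is deliberately simplistic.

```python
SHARED_NON_BMP_WEIGHT = 0xFFFD  # assumption: one weight for all non-BMP characters


def toy_general_ci_key(text: str) -> tuple:
    """Toy comparison key: one integer weight per character."""
    weights = []
    for ch in text:
        if ord(ch) > 0xFFFF:
            # Every supplementary character collapses to the same weight.
            weights.append(SHARED_NON_BMP_WEIGHT)
        else:
            upper = ch.upper()
            # Simplistic case-insensitive weight; keep the original character
            # when upper-casing expands to more than one character.
            weights.append(ord(upper) if len(upper) == 1 else ord(ch))
    return tuple(weights)


def toy_equal(a: str, b: str) -> bool:
    """True if the toy collation considers the two strings equal."""
    return toy_general_ci_key(a) == toy_general_ci_key(b)


if __name__ == "__main__":
    print(toy_equal("a", "A"))                    # case-insensitive match
    print(toy_equal("\U0001F600", "\U0001F42C"))  # two different emoji compare equal
```

Under this model two different emoji compare as equal, so for example a unique index on such a column would reject the second value. That is the behaviour a UCA-based collation is meant to avoid.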
Issue Links
- blocks
  - MDEV-30041 don't set utf8_is_utf8mb3 by default in the old-mode (Open)
- is blocked by
  - MDEV-22981 Bad "default-character-set" option in [client] option group 50-client.cnf on Debian/Ubuntu (Closed)
  - MDEV-25829 Change default collation to utf8mb4_uca1400_ai_ci (In Review)
  - MDEV-27009 Add UCA-14.0.0 collations (Closed)
  - MDEV-29446 Change SHOW CREATE TABLE to display default collations (Closed)
  - MDEV-30556 UPPER() returns an empty string for U+0251 in Unicode-5.2.0+ collations for utf8 (Closed)
  - MDEV-30577 Case folding for uca1400 collations is not up to date (Closed)
  - MDEV-30661 UPPER() returns an empty string for U+0251 in uca1400 collations for utf8 (Closed)
- is duplicated by
  - MDEV-17662 Default to UTF8 (Closed)
- relates to
  - MDEV-8872 Performance regressions with utf8mb4 vs utf8 in WordPress (Closed)
  - MDEV-7128 Configuring charsets or collations as utf8 yields surprising result and leads to data loss (Closed)
  - MDEV-8334 Rename utf8 to utf8mb3 (Closed)
  - MDEV-27490 Allow full utf8mb4 for identifiers (Stalled)
  - MDEV-29414 Map utf8 OS locales to utf8mb4 (Open)