The goal of this task is to change the following default global variables to the 4-byte UTF-8 character set (utf8mb4):
- character_set_client: from utf8 to utf8mb4
- character_set_database: from latin1 to utf8mb4
- character_set_server: from latin1 to utf8mb4
- character_set_results: from utf8 to utf8mb4
- character_set_connection: from utf8 to utf8mb4
- collation_database: from latin1_swedish_ci to utf8mb4_general_ci
- collation_server: from latin1_swedish_ci to utf8mb4_general_ci
The default was changed in MySQL 8.0.1.
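For reference, a minimal sketch of what distinguishes the two character sets: utf8 (utf8mb3) stores at most 3 bytes per character, so it covers only the BMP, while utf8mb4 also covers characters that need 4 bytes. The helper names below are hypothetical and only illustrate the byte-length rule; they are not server code.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest UTF-8 byte length of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes in at most 3 bytes (BMP only)."""
    return max_utf8_bytes(text) <= 3


if __name__ == "__main__":
    print(fits_utf8mb3("Grüße, 日本語"))  # BMP characters only
    print(fits_utf8mb3("emoji \U0001F600"))  # contains a 4-byte character
```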
Some questions should be discussed before or while working on this task:
- Should we change the default collation for utf8mb4 from utf8mb4_general_ci to uca1400_ai_ci? The problem is that utf8mb4_general_ci handles non-BMP characters poorly: it treats all non-BMP characters as equal to each other. See MDEV-25829
- Should we reassign the UTF8 Linux locale from utf8mb3 to utf8mb4 in the client, or to whatever the server side uses as the alias for "utf8"? See MDEV-19123
- Should we change system_charset_info from utf8mb3 to utf8mb4 and allow non-BMP characters in identifiers?
  - If so, the table-name-to-file-name encoding should be extended to support non-BMP characters. See MDEV-27490
  - The system character set cannot be utf8mb4 until we fix the collation as described above
- Should we change numerous INFORMATION_SCHEMA columns from utf8mb3 to utf8mb4?
  - They should use system_charset_info, because they store identifiers
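To illustrate the collation question above, here is a simplified model of how utf8mb4_general_ci compares non-BMP characters. It assumes that every character above U+FFFF receives the single weight 0xFFFD, as the MySQL documentation describes for general collations. The BMP branch is only a crude upper-casing stand-in for the real weight table, and the function names are hypothetical.

```python
REPLACEMENT_WEIGHT = 0xFFFD  # assumed shared weight for all non-BMP characters


def general_ci_weight(ch: str) -> int:
    """Simplified sort weight; only the non-BMP branch models the real collation."""
    cp = ord(ch)
    if cp > 0xFFFF:
        return REPLACEMENT_WEIGHT
    upper = ch.upper()
    # Stand-in for the real BMP weight table: fold case where it maps 1:1.
    return ord(upper) if len(upper) == 1 else cp


def general_ci_equal(a: str, b: str) -> bool:
    """Compare two strings by their per-character weights."""
    return [general_ci_weight(c) for c in a] == [general_ci_weight(c) for c in b]


if __name__ == "__main__":
    # Two different emoji compare as equal under this model.
    print(general_ci_equal("\U0001F600", "\U0001F42C"))
    print(general_ci_equal("a", "b"))
```

Under this model, a unique key on a utf8mb4_general_ci column would reject two rows whose values differ only in which non-BMP character they contain.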