Details
-
Task
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
None
Description
We're planning to change default collation for Unicode character sets to uca1400_ai_ci soon.
A hard coded change in the collation library is not desirable, and is hardly even possible.
It should be implemented on the server (rather than collation library) level, to set default collation overrides per-session:
- For old-to-new replication
- Some users may still want to use the legacy xxx_general_ci instead of uca1400_ai_ci.
- We'll likely add more UCA versions in the future, so users will also want to choose the desired default UCA version.
- Some users will want to have accent sensitive comparison by default, i.e. uca140_as_ci instead of uca1400_ai_ci.
A new system variable @@character_set_collations
The easiest way to solve all these problems is introducing a new system variable, in this format:
'charset1=collation1,charset2=collation2,charset3=collation3'.
That is, a comma separated list of charset/collation pairs.
For example:
SET @@character_set_collations='utf8mb4=utf8mb4_uca1400_ai_ci, utf8mb3=utf8mb4_uca1400_ai_ci'; |
The new variable will understand contextually typed collations:
# No needs for the charset prefix in the collation name: |
SET @@character_set_collations='utf8mb4=uca1400_ai_ci, utf8mb3=uca1400_ai_ci'; |
The new variable will be global and session. The global value should be settable in the server command line and in my.cnf.
Convenience syntax (postponed)
Originally there was a proposal to add a convenience syntax, however we won't add it under terms of this task:
Let's additionally introduce a nicer syntax to set this variable in a more readable way:
SET CHARACTER SET COLLATION uca1400_ai_ci FOR utf8mb3, utf8mb4, ucs2, utf16, utf32;
Usage example:
SET @@character_set_collations='utf8mb4=uca1400_ai_ci'; |
CREATE OR REPLACE TABLE t1 (a TEXT CHARACTER SET utf8mb4); |
SHOW CREATE TABLE t1; |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| Table | Create Table |
|
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
| t1 | CREATE TABLE `t1` (
|
`a` text CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci DEFAULT NULL
|
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci |
|
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|
Notice, the column was created with the collation utf8mb4_uca1400_ai_ci, which is the current session default collation for utf8mb4 as stated in @@character_set_collations.
The new variable will take effect in all cases where a character set is explicitly or implicitly specified without an explicit COLLATE clause, including but not limited to:
- Column collation
- Table collation
- Database collation
- CHAR(expr USING csname)
- CONVERT(expr USING csname)
- CAST(expr AS CHAR CHARACTER SET csname)
- '' - character string literal
- _utf8mb3'text' - a character string literal with an introducer
- _utf8mb3 X'61' - a character string literal with an introducer with hex notation
- _utf8mb3 0x61 - a character string literal with an introducer with hex hybrid notation
- @@collation_connection after a SET NAMES without COLLATE
Attachments
Issue Links
- blocks
-
MDEV-25829 Change default Unicode collation to uca1400_ai_ci
- Closed
- is caused by
-
MDEV-33182 Server assertion fails when trying to test the connection with DBeaver
- Closed
- relates to
-
MDEV-31748 Reuse Simple_tokenizer
- Open
-
MDEV-31739 Document @@character_set_collations (MDEV-30164)
- Closed
-
MDEV-32192 @@character_set_collations to resolve IMPLICIT collation on JOINs
- Closed