Details
- Type: Bug
- Status: Closed
- Priority: Minor
- Resolution: Not a Bug
Description
Using:
- Linux / MariaDB 10.11.8
- Windows MariaDB 10.11.6
- Windows MariaDB 11.4.2
The Unicode 14.0 collations have the character set values set to NULL.
SELECT * FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME='utf8mb4'
OR CHARACTER_SET_NAME IS NULL
ORDER BY COLLATION_NAME ASC;
Recently I was finally able to upgrade a server to a version of MariaDB that supports Unicode 14.0. I prefer low-level programming whenever possible for control and performance reasons, so I query the information_schema table directly instead of using SHOW, for example. It doesn't seem right that the CHARACTER_SET_NAME values for the Unicode 14.0 collations are NULL. I don't imagine that the newer collations suddenly don't require character set support.
This introduces a minor inconvenience for a collation tool I created, though I've dealt with much worse from other software. I wouldn't mind some insight into the correlation and why these values are NULL.
Issue Links
- is caused by: MDEV-27009 Add UCA-14.0.0 collations (Closed)
The logic here is that a collation as such does not have an id. A collation defines how to compare characters: whether 'a' is less than, greater than, or equal to 'ä'. A character set defines how to store characters: whether 'ä' is x'E4' or x'C3A4'.
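The storage side of that distinction can be checked directly. This is a minimal sketch, assuming the latin1 and utf8mb4 character sets are available on the server; the column aliases are made up for illustration. It starts from the UTF-8 bytes of 'ä' so the result does not depend on the client connection's character set.

```sql
-- Same character, two storage encodings.
-- _utf8mb4 x'C3A4' is 'ä' given as explicit UTF-8 bytes.
SELECT
  HEX(_utf8mb4 x'C3A4')                       AS utf8mb4_bytes, -- two-byte form
  HEX(CONVERT(_utf8mb4 x'C3A4' USING latin1)) AS latin1_bytes;  -- one-byte form
```

The two columns should correspond to the x'C3A4' and x'E4' forms mentioned above.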
Every valid character set + collation combination has a unique id. But since 10.10 MariaDB has collations that apply to many character sets. For example, collation uca1400_latvian_ai_ci applies to utf8mb4, to utf16, utf32, etc. In all these different character sets you'll have exactly the same character comparison rules, because it's exactly the same collation.
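To see one collation attached to several character sets, a query along these lines may help. It is a sketch that assumes the 10.10+ layout of COLLATION_CHARACTER_SET_APPLICABILITY, including the FULL_COLLATION_NAME and ID columns; running DESCRIBE on the table will confirm what a given server version exposes.

```sql
-- One row per character set that uca1400_latvian_ai_ci applies to.
SELECT CHARACTER_SET_NAME, FULL_COLLATION_NAME, ID
FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE COLLATION_NAME = 'uca1400_latvian_ai_ci'
ORDER BY CHARACTER_SET_NAME ASC;
```

Each row represents a distinct character set + collation combination, even though the comparison rules are the same in all of them.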
So since 10.10 collations no longer have unique ids, because collation names no longer have to include a character set.
You need to use the COLLATION_CHARACTER_SET_APPLICABILITY table. It lists every character set + collation combination, and every such combination has a unique id. Old collations, which apply to only one character set, have their unique id shown directly in the COLLATIONS table.
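For the original use case (listing everything usable with utf8mb4), the query from the description could be rewritten roughly as follows. This is a sketch, not a verified drop-in replacement: the FULL_COLLATION_NAME, ID, and IS_DEFAULT columns are assumed to be present as in 10.10 and later.

```sql
-- All collations applicable to utf8mb4, including the UCA 14.0.0 ones
-- that show a NULL CHARACTER_SET_NAME in information_schema.COLLATIONS.
SELECT COLLATION_NAME, CHARACTER_SET_NAME, FULL_COLLATION_NAME, ID, IS_DEFAULT
FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE CHARACTER_SET_NAME = 'utf8mb4'
ORDER BY COLLATION_NAME ASC;
```

Filtering on CHARACTER_SET_NAME here removes the need for the IS NULL branch, because every row in this table names a concrete character set.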