[MDEV-34631] Internal Unicode 14.0 collation character sets are NULL - Jira

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Not a Bug
Affects Version/s: None
Fix Version/s: N/A
Component/s: Character Sets, Information Schema
Labels: None

Description

Using:

Linux / MariaDB 10.11.8
Windows MariaDB 10.11.6
Windows MariaDB 11.4.2

The Unicode 14.0 collations have the character set values set to NULL.

SELECT * FROM information_schema.COLLATIONS
WHERE CHARACTER_SET_NAME='utf8mb4'
OR CHARACTER_SET_NAME IS NULL
ORDER BY COLLATION_NAME ASC;

Recently I was finally able to upgrade a server to a version of MariaDB that supports Unicode 14.0. I prefer low-level programming whenever possible for control and performance reasons, so I query the information_schema table directly instead of using SHOW, for example. It doesn't seem right that the CHARACTER_SET_NAME values for the Unicode 14.0 collations are NULL. I don't imagine that the newer collations suddenly don't require character set support.

This introduces a minor inconvenience for a collation tool I created, though I've dealt with much worse from other software. I wouldn't mind some insight into the correlation and why these values are NULL.

Issue Links

is caused by: MDEV-27009 Add UCA-14.0.0 collations (Closed)

Activity

Sergei Golubchik added a comment - 2024-07-21 21:20

The logic here is that a collation as such does not have an id. A collation is how to compare characters: whether 'a' is less than, greater than, or equal to 'ä'. A character set is how to store characters: whether 'ä' is x'E4' or x'C3A4'.
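To make the storage-versus-comparison distinction concrete, here is a minimal sketch (an editorial illustration, not part of the original comment). It assumes the client connection character set transmits 'ä' correctly; the column aliases are made up for readability.

```sql
-- Character set: how 'ä' is stored. The same character is encoded as
-- different bytes in latin1 and utf8mb4 (the comment above gives
-- x'E4' and x'C3A4' respectively).
SELECT HEX(CONVERT('ä' USING latin1))  AS latin1_bytes,
       HEX(CONVERT('ä' USING utf8mb4)) AS utf8mb4_bytes;

-- Collation: how 'a' and 'ä' compare. The answer depends on the
-- collation applied, not on the stored bytes alone.
SELECT CONVERT('a' USING utf8mb4) = CONVERT('ä' USING utf8mb4) COLLATE utf8mb4_bin
         AS equal_under_bin,
       CONVERT('a' USING utf8mb4) = CONVERT('ä' USING utf8mb4) COLLATE utf8mb4_general_ci
         AS equal_under_general_ci;
```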

Every valid character set + collation combination has a unique id. But since 10.10, MariaDB has collations that apply to many character sets. For example, the collation uca1400_latvian_ai_ci applies to utf8mb4, utf16, utf32, etc. In all these different character sets you'll have exactly the same character comparison rules, because it's exactly the same collation.
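As a rough illustration of one collation applying to several character sets, the query below lists every character set paired with uca1400_latvian_ai_ci. It is a sketch that assumes MariaDB 10.10 or later, and that the FULL_COLLATION_NAME and ID columns are present in this table on your version; DESCRIBE the table first if in doubt.

```sql
-- One row per character set that the collation can be combined with.
-- Each row is a distinct character set + collation combination with its own ID.
SELECT CHARACTER_SET_NAME, FULL_COLLATION_NAME, ID
FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE COLLATION_NAME = 'uca1400_latvian_ai_ci'
ORDER BY CHARACTER_SET_NAME;
```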

So since 10.10, collations no longer have unique ids, because since 10.10 collation names no longer have to include a character set.

You need to use the COLLATION_CHARACTER_SET_APPLICABILITY table. It lists character set + collation combinations, and every such combination has a unique id. Old collations, which apply to only one character set, already have their unique id shown in the COLLATIONS table.
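Applied to the query from the description, that advice could look like the following sketch. It assumes MariaDB 10.10 or later and that the table exposes the FULL_COLLATION_NAME, ID, and IS_DEFAULT columns; check your server's table definition before relying on them.

```sql
-- All collations usable with utf8mb4, including the Unicode 14.0 (uca1400_*)
-- ones whose CHARACTER_SET_NAME is NULL in information_schema.COLLATIONS.
SELECT COLLATION_NAME, CHARACTER_SET_NAME, FULL_COLLATION_NAME, ID, IS_DEFAULT
FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE CHARACTER_SET_NAME = 'utf8mb4'
ORDER BY COLLATION_NAME ASC;
```

Because each row here is a character set + collation pair, no `IS NULL` branch is needed to pick up the multi-character-set collations.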

People

Assignee: Sergei Golubchik

Reporter: John Bilicki

Votes: 0

Watchers: 1

Dates

Created: 2024-07-20 01:30

Updated: 2024-07-21 21:21

Resolved: 2024-07-21 21:21

MariaDB Server