  MariaDB Server / MDEV-34631

Internal Unicode 14.0 collation character sets are NULL

Details

    Description

      Using:

      • Linux: MariaDB 10.11.8
      • Windows: MariaDB 10.11.6
      • Windows: MariaDB 11.4.2

      In information_schema.COLLATIONS, the Unicode 14.0 (uca1400_*) collations have CHARACTER_SET_NAME set to NULL:

      SELECT * FROM information_schema.COLLATIONS 
      WHERE CHARACTER_SET_NAME='utf8mb4' 
      OR CHARACTER_SET_NAME IS NULL 
      ORDER BY COLLATION_NAME ASC;
      

      Recently I was finally able to upgrade a server to a version of MariaDB that supports Unicode 14.0. I prefer low-level programming whenever possible for control and performance reasons, so I query the information_schema tables directly instead of using SHOW statements, for example. It doesn't seem right that the CHARACTER_SET_NAME values for the Unicode 14.0 collations are NULL. I can't imagine that the newer collations suddenly don't require character set support.

      This is a minor inconvenience for a collation tool I created, though I've dealt with much worse from other software. I would appreciate some insight into how collations and character sets correlate here, and why these values are NULL.

          Activity

            Sergei Golubchik added a comment:

            The logic here is that a collation as such does not have an id. A collation is how to compare characters: whether 'a' is less than, greater than, or equal to 'ä'. A character set is how to store characters: whether 'ä' is x'E4' or x'C3A4'.

            Every valid character set + collation combination has a unique id. But since 10.10, MariaDB has collations that apply to many character sets. For example, the collation uca1400_latvian_ai_ci applies to utf8mb4, utf16, utf32, etc. In all these character sets you get exactly the same character comparison rules, because it is exactly the same collation.

            So since 10.10, collations no longer have unique ids, because since 10.10 a collation name no longer has to include a character set.

            You need to use the COLLATION_CHARACTER_SET_APPLICABILITY table: it lists every character set + collation combination, and each such combination has a unique id. Old collations, which apply to only one character set, already have their unique id shown in the COLLATIONS table.
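            A minimal sketch of the lookups this suggests, assuming MariaDB 10.10 or later, where COLLATION_CHARACTER_SET_APPLICABILITY is assumed to expose FULL_COLLATION_NAME and ID columns in addition to COLLATION_NAME and CHARACTER_SET_NAME (run DESCRIBE information_schema.COLLATION_CHARACTER_SET_APPLICABILITY to confirm on your version). The first query replaces the original COLLATIONS query for a tool that needs one row per utf8mb4 combination; the second shows the single collation from the comment applying to several character sets.

            ```sql
            -- One row per utf8mb4 + collation combination, each with its own id.
            -- Assumes the FULL_COLLATION_NAME and ID columns exist (MariaDB 10.10+).
            SELECT CHARACTER_SET_NAME, COLLATION_NAME, FULL_COLLATION_NAME, ID
            FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
            WHERE CHARACTER_SET_NAME = 'utf8mb4'
            ORDER BY FULL_COLLATION_NAME ASC;

            -- One character-set-independent collation, many character sets:
            -- each row is a distinct combination with a distinct id.
            SELECT CHARACTER_SET_NAME, FULL_COLLATION_NAME, ID
            FROM information_schema.COLLATION_CHARACTER_SET_APPLICABILITY
            WHERE COLLATION_NAME = 'uca1400_latvian_ai_ci'
            ORDER BY CHARACTER_SET_NAME ASC;
            ```

            Filtering on CHARACTER_SET_NAME in this table, rather than in COLLATIONS, keeps both the old single-character-set collations and the newer uca1400 ones in the result.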

            People

              serg Sergei Golubchik
              JAB Creations John Bilicki