MariaDB Server / MDEV-30577

Case folding for uca1400 collations is not up to date

Details

    Description

      UCA1400 collations (added by MDEV-27009) currently use Unicode-5.2.0 case folding tables.

      They should use Unicode-14.0.0 tables instead.

      The difference (see attached diff-520-1400.diff) between these two files:
      - https://www.unicode.org/Public/5.2.0/ucd/CaseFolding.txt
      - https://www.unicode.org/Public/14.0.0/ucd/CaseFolding.txt

      shows that a few hundred new case folding mapping pairs were added in these scripts:

      Cyrillic, Georgian, Cherokee, Glagolitic, Coptic, Latin, Osage, Vithkuqi, Old Hungarian, Warang Citi, Medefaidrin, Adlam.
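
The comparison between two CaseFolding.txt versions can be reproduced offline. The following is a minimal sketch, not the method used to produce the attached diff: it assumes both files have already been downloaded and read into strings, and the helper names `parse_case_folding` and `new_mappings` are hypothetical. It relies only on the line format `code; status; mapping; # name` used by CaseFolding.txt, and the two sample strings are hand-written stand-ins for the real files.

```python
def parse_case_folding(text):
    """Parse CaseFolding.txt content into {(code, status): mapping}.

    Each data line looks like: "0041; C; 0061; # LATIN CAPITAL LETTER A".
    Blank lines and comment-only lines are skipped.
    """
    mappings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        code, status, mapping = fields[0], fields[1], fields[2]
        mappings[(code, status)] = mapping
    return mappings


def new_mappings(old_text, new_text):
    """Return the entries present in new_text but absent from old_text."""
    old = parse_case_folding(old_text)
    new = parse_case_folding(new_text)
    return {key: value for key, value in new.items() if key not in old}


# Hand-written samples in the CaseFolding.txt line format (not the real files).
OLD_SAMPLE = """\
# CaseFolding sample (old)
0041; C; 0061; # LATIN CAPITAL LETTER A
"""
NEW_SAMPLE = """\
# CaseFolding sample (new)
0041; C; 0061; # LATIN CAPITAL LETTER A
A792; C; A793; # LATIN CAPITAL LETTER C WITH BAR
"""

added = new_mappings(OLD_SAMPLE, NEW_SAMPLE)
print(added)  # {('A792', 'C'): 'A793'}
```

Keying on both the code point and the status keeps simple (`S`), full (`F`), common (`C`), and Turkic (`T`) entries for the same code point distinct.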

      This SQL script demonstrates the outdated case folding:

      CREATE OR REPLACE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8 COLLATE uca1400_ai_ci);
      # Insert letters that first appeared in Unicode-6.1 (released in January 2012)
      INSERT INTO t1 VALUES (_ucs2 0xA792) /* U+A792 LATIN CAPITAL LETTER C WITH BAR */;
      INSERT INTO t1 VALUES (_ucs2 0xA793) /* U+A793 LATIN SMALL LETTER C WITH BAR */;
      SELECT HEX(a), HEX(LOWER(a)), HEX(UPPER(a)), a, LOWER(a), UPPER(a) FROM t1;
      

      +--------+---------------+---------------+------+----------+----------+
      | HEX(a) | HEX(LOWER(a)) | HEX(UPPER(a)) | a    | LOWER(a) | UPPER(a) |
      +--------+---------------+---------------+------+----------+----------+
      | EA9E92 | EA9E92        | EA9E92        | Ꞓ    | Ꞓ        | Ꞓ        |
      | EA9E93 | EA9E93        | EA9E93        | ꞓ    | ꞓ        | ꞓ        |
      +--------+---------------+---------------+------+----------+----------+
      

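      The two characters above (which first appeared in Unicode-6.1) are expected to map to each other: LOWER should convert U+A792 to U+A793, and UPPER should convert U+A793 to U+A792. Instead, both functions return their input unchanged.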
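
As a cross-check outside the server, the expected mapping can be confirmed against any Unicode database that already includes Unicode 6.1. The sketch below uses Python's built-in `str.lower()` and `str.upper()`, assuming a Python 3 build whose bundled Unicode data is version 6.1 or newer. It is not part of the MariaDB fix; it only illustrates the values the SQL script above would be expected to return once the tables are updated. The helper name `utf8_hex` is hypothetical.

```python
import unicodedata

CAPITAL = "\uA792"  # U+A792 LATIN CAPITAL LETTER C WITH BAR
SMALL = "\uA793"    # U+A793 LATIN SMALL LETTER C WITH BAR


def utf8_hex(s):
    """Return the UTF-8 encoding of s as uppercase hex, similar to SQL HEX()."""
    return s.encode("utf-8").hex().upper()


# Unicode version bundled with this Python build (assumed to be >= 6.1).
print(unicodedata.unidata_version)

# Same column order as the SQL result: HEX(a), HEX(LOWER(a)), HEX(UPPER(a)).
print(utf8_hex(CAPITAL), utf8_hex(CAPITAL.lower()), utf8_hex(CAPITAL.upper()))
print(utf8_hex(SMALL), utf8_hex(SMALL.lower()), utf8_hex(SMALL.upper()))
```

With up-to-date case mapping data, the first row reads `EA9E92 EA9E93 EA9E92` and the second `EA9E93 EA9E93 EA9E92`, in contrast to the unchanged values in the table above.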

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Resolution Date 2023-04-18 08:20:05.0 2023-04-18 08:20:05.322
            bar Alexander Barkov made changes -
            Fix Version/s 10.10.4 [ 28522 ]
            Fix Version/s 10.11.3 [ 28524 ]
            Fix Version/s 11.1.1 [ 28704 ]
            Fix Version/s 11.0.2 [ 28706 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            bar Alexander Barkov made changes -
            Attachment diff-500-1400.diff [ 72118 ]

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes: 0
              Watchers: 2

              Dates

                Resolved: 2023-04-18 08:20
