MariaDB Server / MDEV-30556

UPPER() returns an empty string for U+0251 in Unicode-5.2.0+ collations for utf8

Details

    Description

      This script erroneously returns an empty string in the UPPER(c) column of the second row (the expected value is Ɑ, U+2C6D):

      CREATE OR REPLACE TABLE t1
      (
        c VARCHAR(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci
      );
      INSERT INTO t1 VALUES
      (_ucs2 0x2C6D /* U+2C6D LATIN CAPITAL LETTER ALPHA */),
      (_ucs2 0x0251 /* U+0251 LATIN SMALL LETTER ALPHA */);
      SELECT c, hex(c), UPPER(c), LOWER(c) FROM t1;
      

      +------+--------+----------+----------+
      | c    | hex(c) | UPPER(c) | LOWER(c) |
      +------+--------+----------+----------+
      | Ɑ    | E2B1AD | Ɑ        | ɑ        |
      | ɑ    | C991   |          | ɑ        |
      +------+--------+----------+----------+
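
      As an independent cross-check of the table above, here is a minimal Python sketch. It assumes only that Python's built-in Unicode database is a fair reference for the expected case mapping; it does not touch MariaDB at all. The helper name describe() is made up for this illustration. It prints each character's name, its UTF-8 bytes (to compare with the hex(c) column), and the upper/lower mappings that UPPER(c) and LOWER(c) would be expected to produce.

```python
import unicodedata

SMALL_ALPHA = "\u0251"    # LATIN SMALL LETTER ALPHA
CAPITAL_ALPHA = "\u2C6D"  # LATIN CAPITAL LETTER ALPHA


def describe(ch: str) -> dict:
    """Hypothetical helper: name, UTF-8 hex and Python's case mappings for one character."""
    return {
        "name": unicodedata.name(ch),
        "utf8_hex": ch.encode("utf-8").hex().upper(),
        "upper": ch.upper(),
        "lower": ch.lower(),
    }


# Same row order as the INSERT in the report: capital first, then small.
for ch in (CAPITAL_ALPHA, SMALL_ALPHA):
    info = describe(ch)
    print(
        f"U+{ord(ch):04X} {info['name']}: hex={info['utf8_hex']} "
        f"upper=U+{ord(info['upper']):04X} lower=U+{ord(info['lower']):04X}"
    )
```

      According to this check, the uppercase of U+0251 (bytes C991) is U+2C6D (bytes E2B1AD), so the second row would be expected to show Ɑ in UPPER(c) rather than an empty string.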
      

      The same happens with utf8mb3:

      CREATE OR REPLACE TABLE t1
      (
        c VARCHAR(32) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_520_ci
      );
      INSERT INTO t1 VALUES
      (_ucs2 0x2C6D /* U+2C6D LATIN CAPITAL LETTER ALPHA */),
      (_ucs2 0x0251 /* U+0251 LATIN SMALL LETTER ALPHA */);
      SELECT c, hex(c), UPPER(c), LOWER(c) FROM t1;
      

      +------+--------+----------+----------+
      | c    | hex(c) | UPPER(c) | LOWER(c) |
      +------+--------+----------+----------+
      | Ɑ    | E2B1AD | Ɑ        | ɑ        |
      | ɑ    | C991   |          | ɑ        |
      +------+--------+----------+----------+
      

      The same happens with utf8mb4 and the uca1400_ai_ci collation:

      CREATE OR REPLACE TABLE t1
      (
        c VARCHAR(32) CHARACTER SET utf8mb4 COLLATE uca1400_ai_ci
      );
      INSERT INTO t1 VALUES
      (_ucs2 0x2C6D /* U+2C6D LATIN CAPITAL LETTER ALPHA */),
      (_ucs2 0x0251 /* U+0251 LATIN SMALL LETTER ALPHA */);
      SELECT c, hex(c), UPPER(c), LOWER(c) FROM t1;
      

      +------+--------+----------+----------+
      | c    | hex(c) | UPPER(c) | LOWER(c) |
      +------+--------+----------+----------+
      | Ɑ    | E2B1AD | Ɑ        | ɑ        |
      | ɑ    | C991   |          | ɑ        |
      +------+--------+----------+----------+
      

      The same happens with utf8mb3 and the uca1400_ai_ci collation:

      CREATE OR REPLACE TABLE t1
      (
        c VARCHAR(32) CHARACTER SET utf8mb3 COLLATE uca1400_ai_ci
      );
      INSERT INTO t1 VALUES
      (_ucs2 0x2C6D /* U+2C6D LATIN CAPITAL LETTER ALPHA */),
      (_ucs2 0x0251 /* U+0251 LATIN SMALL LETTER ALPHA */);
      SELECT c, hex(c), UPPER(c), LOWER(c) FROM t1;
      

      +------+--------+----------+----------+
      | c    | hex(c) | UPPER(c) | LOWER(c) |
      +------+--------+----------+----------+
      | Ɑ    | E2B1AD | Ɑ        | ɑ        |
      | ɑ    | C991   |          | ɑ        |
      +------+--------+----------+----------+
      

      With utf16 collations, UPPER() returns the expected result. For example:

      CREATE OR REPLACE TABLE t1
      (
        c VARCHAR(32) CHARACTER SET utf16 COLLATE utf16_unicode_520_ci
      );
      INSERT INTO t1 VALUES
      (_ucs2 0x2C6D /* U+2C6D LATIN CAPITAL LETTER ALPHA */),
      (_ucs2 0x0251 /* U+0251 LATIN SMALL LETTER ALPHA */);
      SELECT c, hex(c), UPPER(c), LOWER(c) FROM t1;
      

      +------+--------+----------+----------+
      | c    | hex(c) | UPPER(c) | LOWER(c) |
      +------+--------+----------+----------+
      | Ɑ    | 2C6D   | Ɑ        | ɑ        |
      | ɑ    | 0251   | Ɑ        | ɑ        |
      +------+--------+----------+----------+
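
      One observable difference between the failing utf8 cases and the working utf16 case is the encoded length of the two characters. The sketch below (plain Python, no MariaDB involved; encoded_lengths() is a hypothetical helper name) compares byte lengths in UTF-8 and big-endian UTF-16. The report does not say that this difference is the cause of the bug, so treat it as an observation only.

```python
SMALL_ALPHA = "\u0251"    # LATIN SMALL LETTER ALPHA
CAPITAL_ALPHA = "\u2C6D"  # LATIN CAPITAL LETTER ALPHA


def encoded_lengths(ch: str) -> dict:
    """Hypothetical helper: byte length of one character in UTF-8 and UTF-16 (big-endian)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be")}


for ch in (SMALL_ALPHA, CAPITAL_ALPHA):
    lengths = encoded_lengths(ch)
    utf16_hex = ch.encode("utf-16-be").hex().upper()
    print(f"U+{ord(ch):04X}: utf-8={lengths['utf-8']} bytes, "
          f"utf-16={lengths['utf-16-be']} bytes (hex {utf16_hex})")
```

      In UTF-8 the small letter takes 2 bytes and the capital letter 3, so upper-casing U+0251 produces a longer byte string; in UTF-16 both take 2 bytes, matching the 0251 and 2C6D values in the hex(c) column above.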
      

            People

              bar Alexander Barkov