Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26743

InnoDB: CHAR+nopad does not work well

    XMLWordPrintable

    Details

      Description

      InnoDB does not perform well primary key uniqueness on CHAR columns with NOPAD collations.

      Basic latin letter vs equal accented letter

      SET NAMES utf8mb3;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
      INSERT INTO t1 VALUES ('a'),('ä');
      

      Query OK, 2 rows affected (0.001 sec)
      Records: 2  Duplicates: 0  Warnings: 0
      

      Looks wrong. The expected result is to throw a duplicate key error.
      Both values have only one character, so both are padded with one extra space when stored to the column.
      The values 'a ' and 'ä ' are equal in this collation:

      SELECT _utf8mb3 'a ' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ä ';
      

      +------+
      | c1   |
      +------+
      | a    | <- notice only one row
      +------+
      

      Two letters vs equal (but space padded) expansion

      SET NAMES utf8mb3;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      ERROR 1062 (23000): Duplicate entry 'ß' for key 'PRIMARY'
      

      Looks wrong. The expected result is to accept both values.
      The 'ss' has two characters. It is stored to the column as is. The 'ß' has only one character, so it is padded with one space to 'ß '.
      The values 'ss' and 'ß ' are different in this collation:

      SELECT _utf8mb3 'ss' COLLATE utf8_unicode_nopad_ci AS c1 UNION SELECT _utf8mb3 'ß ';
      

      +------+
      | c1   |
      +------+
      | ss   |
      | ß    |
      +------+
      

      Basic latin letter (but followed by an ignorable character) vs equal accented letter

      SET NAMES utf8mb3;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(3), PRIMARY KEY(a)) CHARACTER SET utf8mb3 COLLATE utf8mb3_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
      INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');
      

      ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'
      

      Looks wrong. The expected result is to accept both values.
      The first value CONCAT('a',_utf8mb3 0x01) returns two characters, so it gets padded with a space when stored to the column. So the value which is actually stored is CONCAT('a',_utf8mb3 0x01,' ').
      The 'ä' is just one character, so it gets padded with two spaces to 'ä ' when stored. These two values are different in this collation:

      SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01, ' ') COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä  ') td1;
      

      +----------+
      | HEX(c1)  |
      +----------+
      | 610120   |
      | C3A42020 |
      +----------+
      

      SET NAMES utf8mb3;
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(2), PRIMARY KEY(a)) COLLATE utf8_unicode_nopad_ci ENGINE=InnoDB ROW_FORMAT=COMPACT;
      INSERT INTO t1 VALUES (CONCAT('a',_utf8mb3 0x01)),('ä');
      

      ERROR 1062 (23000): Duplicate entry 'ä' for key 'PRIMARY'
      

      Looks wrong. The expected result is to accept both values.
      The first value CONCAT('a',_utf8mb3 0x01) consists of two characters. It is stored as is.
      The 'ä' is just one character, so it gets padded with one spaces to 'ä ' when stored. These two values are different in this collation:

      SELECT HEX(c1) FROM (SELECT CONCAT('a',_utf8mb3 0x01) COLLATE utf8mb3_unicode_nopad_ci AS c1 UNION SELECT 'ä ') td1;
      

      +---------+
      | HEX(c1) |
      +---------+
      | 6101    |
      | C3A420  |
      +---------+
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              bar Alexander Barkov
              Reporter:
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:

                  Git Integration