Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30048

Prefix keys for CHAR work differently for MyISAM vs InnoDB

Details

    Description

      # MyISAM is wrong
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=MyISAM;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'
      

      # Aria is wrong
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=Aria;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'
      

      # InnoDB is correct
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=InnoDB;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      Query OK, 2 rows affected (0.002 sec)
      Records: 2  Duplicates: 0  Warnings: 0
      

      # MEMORY is correct
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=MEMORY;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      Query OK, 2 rows affected (0.001 sec)
      Records: 2  Duplicates: 0  Warnings: 0
      

      MyISAM and Aria work incorrectly.
      InnoDB and HEAP work correctly.

      Rationale

      We compare two fixed length strings with 10 characters:

      • Two characters 'ss' followed by 8 spaces
      • One 'ß' character followed by 9 spaces

      The comparison of 'ss' and 'ß' should work as follows for CHAR(10):

      • 'ss' is equal to 'ß' in this collation
      • 8 spaces are smaller than 9 spaces (because this is a NOPAD collation)
      • the result is "smaller"

      Note, left join works correctly:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=InnoDB;
      INSERT INTO t1 VALUES ('ss'),('ß');
      select * from t1, t1 as t2 where t1.a=t2.a;
      

      +------+------+
      | a    | a    |
      +------+------+
      | ss   | ss   |
      | ß    | ß    |
      +------+------+
      

      The same problem is repeatable with prefix keys:

      # MyISAM is wrong
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=MyISAM;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'
      

      # Aria is wrong
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=ARIA;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      ERROR 1062 (23000): Duplicate entry 'ß' for key 'a'
      

      # InnoDB is correct
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=InnoDB;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      Query OK, 2 rows affected (0.002 sec)
      Records: 2  Duplicates: 0  Warnings: 0
      

      # MEMORY is correct
      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a(2))) ENGINE=MEMORY;
      INSERT INTO t1 VALUES ('ss'),('ß');
      

      Query OK, 2 rows affected (0.000 sec)
      Records: 2  Duplicates: 0  Warnings: 0
      

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            Heap works like InnoDB (this fact is now also noted in above description) :

            DROP TABLE IF EXISTS t1;
            CREATE TABLE t1 (a CHAR(10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY(a)) ENGINE=HEAP;
            INSERT INTO t1 VALUES ('ss'),('ß');
            

            Query OK, 2 rows affected (0.000 sec)
            Records: 2  Duplicates: 0  Warnings: 0
            

            bar Alexander Barkov added a comment - - edited Heap works like InnoDB (this fact is now also noted in above description) : DROP TABLE IF EXISTS t1; CREATE TABLE t1 (a CHAR (10) COLLATE utf8mb3_unicode_nopad_ci, UNIQUE KEY (a)) ENGINE=HEAP; INSERT INTO t1 VALUES ( 'ss' ),( 'ß' ); Query OK, 2 rows affected (0.000 sec) Records: 2 Duplicates: 0 Warnings: 0
            bar Alexander Barkov added a comment - Hello serg , Can you please review a patch: https://github.com/MariaDB/server/commit/56fa1da9c676ebc480b5cb43627752150010cf4c Thanks.
            rjasdfiii Rick James added a comment -

            I hope this is not a repeat of the fiasco many years ago utf8_general_mysql500_ci .

            rjasdfiii Rick James added a comment - I hope this is not a repeat of the fiasco many years ago utf8_general_mysql500_ci .

            rjasdfiii, which fiasco do you mean?

            bar Alexander Barkov added a comment - rjasdfiii , which fiasco do you mean?
            rjasdfiii Rick James added a comment -

            @AlexanderBarkov - Many users experienced "duplicate key" in their PRIMARY KEY due to this incompatible change.

            ----- 2012-04-10 5.6.5 Milestone 8 & 2012-03-21 5.1.62 & 2012-02-17 5.5.21 – Functionality Added or Changed – -----

            New utf8_general_mysql500_ci and ucs2_general_mysql500_ci collations have been added that preserve the behavior of utf8_general_ci and ucs2_general_ci from versions of MySQL previous to 5.1.24. [CHECK TABLE](https://dev.mysql.com/doc/refman/5.7/en/check-table.html) produces this error:

            ----- 2008-11-14 5.1.30

            The CHECK TABLE ... FOR UPGRADE statement did not check for incompatible collation changes made in MySQL 5.1.24 (Bug #27877). This also affects mysqlcheck and mysql_upgrade, which cause that statement to be executed. See Checking Whether Tables or Indexes Must Be Rebuilt.
            Prior to this fix, a binary upgrade (performed without dumping tables with mysqldump before the upgrade and reloading the dump file after the upgrade) would corrupt tables that have indexes that use the utf8_general_ci or ucs2_general_ci collation for columns that contain 'ß' LATIN SMALL LETTER SHARP S (German). After the fix, CHECK TABLE ... FOR UPGRADE properly detects the problem and warns about tables that need repair.

            However, the fix is not backward compatible and can result in a downgrading problem under these circumstances:

            Perform a binary upgrade to a version of MySQL that includes the fix.

            Run CHECK TABLE ... FOR UPGRADE (or mysqlcheck or mysql_upgrade) to upgrade tables.

            Perform a binary downgrade to a version of MySQL that does not include the fix.

            The solution is to dump tables with mysqldump before the downgrade and reload the dump file after the downgrade. Alternatively, drop and recreate affected indexes. (Bug #40053)

            rjasdfiii Rick James added a comment - @AlexanderBarkov - Many users experienced "duplicate key" in their PRIMARY KEY due to this incompatible change. ----- 2012-04-10 5.6.5 Milestone 8 & 2012-03-21 5.1.62 & 2012-02-17 5.5.21 – Functionality Added or Changed – ----- New utf8_general_mysql500_ci and ucs2_general_mysql500_ci collations have been added that preserve the behavior of utf8_general_ci and ucs2_general_ci from versions of MySQL previous to 5.1.24. [CHECK TABLE] ( https://dev.mysql.com/doc/refman/5.7/en/check-table.html ) produces this error: ----- 2008-11-14 5.1.30 The CHECK TABLE ... FOR UPGRADE statement did not check for incompatible collation changes made in MySQL 5.1.24 (Bug #27877). This also affects mysqlcheck and mysql_upgrade, which cause that statement to be executed. See Checking Whether Tables or Indexes Must Be Rebuilt. Prior to this fix, a binary upgrade (performed without dumping tables with mysqldump before the upgrade and reloading the dump file after the upgrade) would corrupt tables that have indexes that use the utf8_general_ci or ucs2_general_ci collation for columns that contain 'ß' LATIN SMALL LETTER SHARP S (German). After the fix, CHECK TABLE ... FOR UPGRADE properly detects the problem and warns about tables that need repair. However, the fix is not backward compatible and can result in a downgrading problem under these circumstances: Perform a binary upgrade to a version of MySQL that includes the fix. Run CHECK TABLE ... FOR UPGRADE (or mysqlcheck or mysql_upgrade) to upgrade tables. Perform a binary downgrade to a version of MySQL that does not include the fix. The solution is to dump tables with mysqldump before the downgrade and reload the dump file after the downgrade. Alternatively, drop and recreate affected indexes. (Bug #40053)

            99ffd7e3934 is ok to push.

            I'd still suggest to split the commit in two — a cleanup commit that introduces all functions in my_compare.h and replaces them as needed, but does not change the behavior (doesn't use ha_compare_char_fixed for HA_KEYTYPE_TEXT). And the second commit that fixes the bug.

            serg Sergei Golubchik added a comment - 99ffd7e3934 is ok to push. I'd still suggest to split the commit in two — a cleanup commit that introduces all functions in my_compare.h and replaces them as needed, but does not change the behavior (doesn't use ha_compare_char_fixed for HA_KEYTYPE_TEXT). And the second commit that fixes the bug.

            People

              bar Alexander Barkov
              bar Alexander Barkov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.