Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27653

long uniques don't work with unicode collations

Details

    Description

      set names utf8;
      create table t1 (a text collate utf8_general_ci unique);
      show create table t1;
      insert t1 values ('a');
      insert t1 values ('ä');
      select * from t1;
      select distinct * from t1;
      select 'a' = 'ä';
      drop table t1;
      

      shows that one can insert two characters into a long unique index that compare equally.

      It happens because calc_hash_for_unique() hashes string length in bytes together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.

      The effect of this bug is that if one has a table with
      CREATE TABLE
      ...
      xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
      ...
      UNIQUE INDEX `xxx` (column,...) USING HASH...

      Then the following problems may occur:

      • The table will accept duplicates for UTF8 strings that are different but should compare equal.
      • If master and slave are of different versions where the bug is fixed on one of them or the hash value is different (see MDEV-32093) then the following can also happen:
      • Insert works on master (which does not notice the duplicate key) but fails on slave (which notices the duplicate key)

      Fix is to drop and add back the unique index or run ALTER TABLE xxx ENGINE=InnoDB (for InnoDB tables) to reconstruct the hash index.

      Attachments

        Issue Links

          Activity

            serg Sergei Golubchik created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Affects Version/s 10.4 [ 22408 ]
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            serg Sergei Golubchik made changes -
            Assignee Nikita Malyavin [ nikitamalyavin ]
            serg Sergei Golubchik made changes -
            Description {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert to characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.
            {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.
            danblack Daniel Black made changes -
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            serg Sergei Golubchik made changes -
            Assignee Nikita Malyavin [ nikitamalyavin ] Alexander Barkov [ bar ]
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Status In Progress [ 3 ] In Review [ 10002 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            serg Sergei Golubchik made changes -
            bar Alexander Barkov made changes -
            bar Alexander Barkov made changes -
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Component/s Data types [ 13906 ]
            Fix Version/s 10.4.28 [ 28509 ]
            Fix Version/s 10.5.19 [ 28511 ]
            Fix Version/s 10.6.12 [ 28513 ]
            Fix Version/s 10.7.8 [ 28515 ]
            Fix Version/s 10.8.7 [ 28517 ]
            Fix Version/s 10.9.5 [ 28519 ]
            Fix Version/s 10.10.3 [ 28521 ]
            Fix Version/s 10.11.2 [ 28523 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            monty Michael Widenius made changes -
            Description {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.
            {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.

            The effect of this bug is that if one has a table with
            CREATE TABLE
             ...
            xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
            ...
            UNIQUE INDEX `xxx` (column,...) USING HASH...

            Then the following problems may occur:
            - The table will accept duplicates even if the value should be compared as
            monty Michael Widenius made changes -
            Description {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.

            The effect of this bug is that if one has a table with
            CREATE TABLE
             ...
            xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
            ...
            UNIQUE INDEX `xxx` (column,...) USING HASH...

            Then the following problems may occur:
            - The table will accept duplicates even if the value should be compared as
            {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.

            The effect of this bug is that if one has a table with
            CREATE TABLE
             ...
            xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
            ...
            UNIQUE INDEX `xxx` (column,...) USING HASH...

            Then the following problems may occur:
            - The table will accept duplicates for UTF8 strings that are different but should compare equal.

            monty Michael Widenius made changes -
            Description {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.

            The effect of this bug is that if one has a table with
            CREATE TABLE
             ...
            xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
            ...
            UNIQUE INDEX `xxx` (column,...) USING HASH...

            Then the following problems may occur:
            - The table will accept duplicates for UTF8 strings that are different but should compare equal.

            {code:sql}
            set names utf8;
            create table t1 (a text collate utf8_general_ci unique);
            show create table t1;
            insert t1 values ('a');
            insert t1 values ('ä');
            select * from t1;
            select distinct * from t1;
            select 'a' = 'ä';
            drop table t1;
            {code}
            shows that one can insert two characters into a long unique index that compare equally.

            It happens because {{calc_hash_for_unique()}} hashes string *length in bytes* together with the string content. So even if two utf8 strings hash to the same value, they can have different length in bytes and long unique hash will be different.

            The effect of this bug is that if one has a table with
            CREATE TABLE
             ...
            xxx VARCHAR(255) COLLATE 'utf8mb3_general_ci',
            ...
            UNIQUE INDEX `xxx` (column,...) USING HASH...

            Then the following problems may occur:
            - The table will accept duplicates for UTF8 strings that are different but should compare equal.
            - If master and slave are of different versions where the bug is fixed on one of them or the hash value is different (see MDEV-32093) then the following can also happen:
              - Insert works on master (which does not notice the duplicate key) but fails on slave (which notices the duplicate key)

            Fix is to drop and add back the unique index or run ALTER TABLE xxx ENGINE=InnoDB (for InnoDB tables) to reconstruct the hash index.

            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 144370 136244
            Johnny See Zhongyu Shi made changes -

            People

              bar Alexander Barkov
              serg Sergei Golubchik
              Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.