Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26294

Duplicate entries in unique index not detected when changing collation with INPLACE / NOCOPY algoritm

Details

    Description

      When creating a table with a case sensitive collation on a column with unique constraint, then trying to alter the collation to a case insensitive one, entries that only differ in case lead to a duplicate entry error, aborting the ALTER TABLE, when using the COPY algorithm, as expected:

      CREATE TABLE t1 (
        id INT PRIMARY KEY,
        msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin,
        UNIQUE(msg)
      ) ENGINE=INNODB;
       
      INSERT INTO t1 VALUES (1, 'aaa');
      INSERT INTO t1 VALUES (2, 'AAA');
       
      ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=COPY;
      

      This gives

      ERROR 1062 (23000): Duplicate entry 'AAA' for key 'msg'
      

      as expected.

      When trying the same with ALGORITHM=INPLACE or NOCOPY it works without detecting that we now have a duplicate entry though, so the new table version may now contain data violating the UNIQUE constraint.

      Attachments

        Issue Links

          Activity

            Looks like this issue is caused the following patch:

            commit 854c219a7f0e1878517d5a821992f650342380dd
            Author: Eugene Kosov <claprix@yandex.ru>
            Date:   Fri Jun 14 12:18:49 2019 +0300
             
                MDEV-17301 Change of COLLATE unnecessarily requires ALGORITHM=COPY
            

            thiru Thirunarayanan Balathandayuthapani added a comment - Looks like this issue is caused the following patch: commit 854c219a7f0e1878517d5a821992f650342380dd Author: Eugene Kosov <claprix@yandex.ru> Date: Fri Jun 14 12:18:49 2019 +0300   MDEV-17301 Change of COLLATE unnecessarily requires ALGORITHM=COPY
            bar Alexander Barkov added a comment - - edited

            This problem is also repeatable when altering between two _ci collations:

            SET NAMES utf8;
            CREATE OR REPLACE TABLE t1 (
              id INT PRIMARY KEY,
              msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci,
              UNIQUE(msg)
            ) ENGINE=INNODB;
            INSERT INTO t1 VALUES (1, 'aaa');
            INSERT INTO t1 VALUES (2, 'ååå');
            ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=COPY;
            

            ERROR 1062 (23000): Duplicate entry 'ååå' for key 'msg'
            

            The error is correct:

            • in utf8_swedish_ci, 'Ã¥' is a separate letter which is sorted after 'z'
            • in utf8_unicode_ci, 'Ã¥' is a variant of 'a'

            Now I use ALGORITHM=NOCOPY:

            ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY;
            

            Query OK, 0 rows affected (0.009 sec)
            Records: 0  Duplicates: 0  Warnings: 0
            

            Looks wrong.

            See:
            https://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html
            https://collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html

            bar Alexander Barkov added a comment - - edited This problem is also repeatable when altering between two _ci collations: SET NAMES utf8; CREATE OR REPLACE TABLE t1 ( id INT PRIMARY KEY , msg VARCHAR (100) CHARACTER SET utf8 COLLATE utf8_swedish_ci, UNIQUE (msg) ) ENGINE=INNODB; INSERT INTO t1 VALUES (1, 'aaa' ); INSERT INTO t1 VALUES (2, 'ååå' ); ALTER TABLE t1 MODIFY msg VARCHAR (100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=COPY; ERROR 1062 (23000): Duplicate entry 'ååå' for key 'msg' The error is correct: in utf8_swedish_ci, 'å' is a separate letter which is sorted after 'z' in utf8_unicode_ci, 'å' is a variant of 'a' Now I use ALGORITHM=NOCOPY : ALTER TABLE t1 MODIFY msg VARCHAR (100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY; Query OK, 0 rows affected (0.009 sec) Records: 0 Duplicates: 0 Warnings: 0 Looks wrong. See: https://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html https://collation-charts.org/mysql60/mysql604.utf8_unicode_ci.european.html

            I took a look at the given patch(https://github.com/MariaDB/server/commit/ab2414a6ceb376a8c03569a7d63d747ff215afdc).
            Problem with the patch is that InnoDB has to get the precise type list for row0merge.cc and row0log.cc.

            Moreover, column in the index points to old collation precise type
            and it could lead to crash in pageBulk::insert() whether the
            records are in ascending order.(MDEV-27280)

            Test case is

            --source include/have_innodb.inc
            CREATE TABLE t1 (
              id INT PRIMARY KEY,
              msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin,
              id_2 INT not null,
              unique index(msg, id_2)
            ) ENGINE=INNODB;
             
            INSERT INTO t1 VALUES (1, 'aaa', 2);
            INSERT INTO t1 VALUES (2, 'AAA', 3);
            ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY;
            DROP TABLE t1;
             
            Assert:
             
            #3  0x00007ffff5aef502 in __GI___assert_fail (assertion=0x555556ed72b0 "cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0", file=0x555556ed71c8 "/home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc", line=200, function=0x555556ed8280 <PageBulk::insert(unsigned char const*, unsigned short*)::__PRETTY_FUNCTION__> "void PageBulk::insert(const rec_t*, rec_offs*)") at assert.c:101
            #4  0x0000555556756183 in PageBulk::insert (this=0x7fffa005d120, rec=0x7fffa005d73f "AAA\200", offsets=0x7fffa005d750) at /home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc:200
            #5  0x0000555556758937 in BtrBulk::insert (this=0x7fffeae62710, tuple=0x7fffa05d66e0, level=0) at /home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc:988
            
            

            InnoDB DDL has to have the index which has changed column collation value. Then only we could avoid this assert and re-arrange the record depends on
            new collation.

            Based on the approach, current patch does create new column in inplace_alter heap and index points to the new column.
            It leads to reloading of table during commit_inplace_alter_table(). This approach has some drawbacks when
            we're dealing with concurrent DML. Newly created column won't have relation to the table and it could
            lead to many crashes for concurrent DML. Patch is in bb-10.6-MDEV-26294

            Correct approach would be that InnoDB DDL has to use newly created column (collation changed one) and
            concurrent DML has to use old column. By using this approach, InnoDB could achieve that
            nocopy + instant operation can be done in nocopy algorithm.

            thiru Thirunarayanan Balathandayuthapani added a comment - I took a look at the given patch( https://github.com/MariaDB/server/commit/ab2414a6ceb376a8c03569a7d63d747ff215afdc ). Problem with the patch is that InnoDB has to get the precise type list for row0merge.cc and row0log.cc. Moreover, column in the index points to old collation precise type and it could lead to crash in pageBulk::insert() whether the records are in ascending order.( MDEV-27280 ) Test case is --source include/have_innodb.inc CREATE TABLE t1 ( id INT PRIMARY KEY, msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_bin, id_2 INT not null, unique index(msg, id_2) ) ENGINE=INNODB;   INSERT INTO t1 VALUES (1, 'aaa', 2); INSERT INTO t1 VALUES (2, 'AAA', 3); ALTER TABLE t1 MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY; DROP TABLE t1;   Assert:   #3 0x00007ffff5aef502 in __GI___assert_fail (assertion=0x555556ed72b0 "cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0", file=0x555556ed71c8 "/home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc", line=200, function=0x555556ed8280 <PageBulk::insert(unsigned char const*, unsigned short*)::__PRETTY_FUNCTION__> "void PageBulk::insert(const rec_t*, rec_offs*)") at assert.c:101 #4 0x0000555556756183 in PageBulk::insert (this=0x7fffa005d120, rec=0x7fffa005d73f "AAA\200", offsets=0x7fffa005d750) at /home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc:200 #5 0x0000555556758937 in BtrBulk::insert (this=0x7fffeae62710, tuple=0x7fffa05d66e0, level=0) at /home/thiru/mariarepo/server/10.4/storage/innobase/btr/btr0bulk.cc:988 InnoDB DDL has to have the index which has changed column collation value. Then only we could avoid this assert and re-arrange the record depends on new collation. Based on the approach, current patch does create new column in inplace_alter heap and index points to the new column. It leads to reloading of table during commit_inplace_alter_table() . This approach has some drawbacks when we're dealing with concurrent DML. Newly created column won't have relation to the table and it could lead to many crashes for concurrent DML. Patch is in bb-10.6- MDEV-26294 Correct approach would be that InnoDB DDL has to use newly created column (collation changed one) and concurrent DML has to use old column. By using this approach, InnoDB could achieve that nocopy + instant operation can be done in nocopy algorithm.

            thiru, your fix looked correct to me. I took the liberty to improve some comments and simplify the code. Please check if it looks OK to you.

            marko Marko Mäkelä added a comment - thiru , your fix looked correct to me. I took the liberty to improve some comments and simplify the code. Please check if it looks OK to you.

            In MariaDB Server 10.4.26 and 10.5.7, the bug will be fixed by refusing native ALTER TABLE when the collation of a column is changed such that it remains or becomes indexed:

            CREATE TABLE t1 (
              id INT PRIMARY KEY,
              msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_swedish_ci UNIQUE
            ) ENGINE=INNODB;
            INSERT INTO t1 VALUES (1, 'aaa');
            INSERT INTO t1 VALUES (2, 'ååå');
            --error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
            ALTER TABLE t1
            MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
            ALGORITHM=NOCOPY;
            ALTER TABLE t1 DROP INDEX msg,
            MODIFY msg VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
            ALGORITHM=NOCOPY;
            

            In MariaDB Server 10.6.9, the collation change will remain allowed, while any affected secondary indexes will be correctly (re)built online.

            marko Marko Mäkelä added a comment - In MariaDB Server 10.4.26 and 10.5.7, the bug will be fixed by refusing native ALTER TABLE when the collation of a column is changed such that it remains or becomes indexed: CREATE TABLE t1 ( id INT PRIMARY KEY , msg VARCHAR (100) CHARACTER SET utf8 COLLATE utf8_swedish_ci UNIQUE ) ENGINE=INNODB; INSERT INTO t1 VALUES (1, 'aaa' ); INSERT INTO t1 VALUES (2, 'ååå' ); --error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON ALTER TABLE t1 MODIFY msg VARCHAR (100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY; ALTER TABLE t1 DROP INDEX msg, MODIFY msg VARCHAR (100) CHARACTER SET utf8 COLLATE utf8_unicode_ci, ALGORITHM=NOCOPY; In MariaDB Server 10.6.9, the collation change will remain allowed, while any affected secondary indexes will be correctly (re)built online.

            People

              marko Marko Mäkelä
              hholzgra Hartmut Holzgraefe
              Votes:
              2 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.