Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30046

wrong row targeted with "insert ... on duplicate" and "replace", leading to data corruption

Details

    Description

      (Also reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015293)

      Using the MySQL interface, these statements:

      DROP TABLE IF EXISTS t;
      CREATE TABLE t (s BLOB, n INT, UNIQUE (s));
      INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
      INSERT INTO t VALUES ('Hrecvx_0004mm-00',2) ON DUPLICATE KEY UPDATE n = VALUES (n);
      SELECT * FROM t;
      

      produce this output:

      s n
      Hrecvx_0004ln-00 2
      Hrecvx_0004mm-00 1

      So the latter "INSERT" updates the wrong row.

      This happens whether the first column is "BLOB" or "TEXT", but only
      with specific values. (In my actual use case with ~1 million rows,
      it happened a few dozen times, which might be consistent e.g. with
      collisions of a 32 bit hash or so.)

      Likewise, these statements:

      DROP TABLE IF EXISTS t;
      CREATE TABLE t (s BLOB, n INT, UNIQUE (s));
      INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
      REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2);
      SELECT * FROM t;
      

      give the error:

      ERROR 1062 (23000) at line 4: Duplicate entry 'Hrecvx_0004mm-00' for key 's'

      In my understanding, this error should actually be impossible with
      "REPLACE INTO".

      It might be the same issue, i.e. it tries to delete the wrong row
      before inserting the new one, so it's still duplicate.

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment - - edited

            Thanks for the report! I repeated as described on 10.4-10.11, InnoDB. Probably caused by MDEV-371

            --source include/have_innodb.inc 
            CREATE TABLE t (s BLOB, n INT, UNIQUE (s)) engine=innodb;
            INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
            REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2);
            

            10.4 0d586d62e5326053383

            CURRENT_TEST: main.1_my
            mysqltest: At line 9: query 'REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2)' failed: 1062: Duplicate entry 'Hrecvx_0004mm-00' for key 's'
            

            CREATE TABLE t (s BLOB, n INT, UNIQUE (s)) engine=innodb;
            INSERT INTO t VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
            INSERT INTO t VALUES ('Hrecvx_0004mm-00',2) ON DUPLICATE KEY UPDATE n = VALUES(n) ;
            SELECT * FROM t;
            

            SELECT * FROM t;
            s	n
            Hrecvx_0004ln-00	2
            Hrecvx_0004mm-00	1
            

            alice Alice Sherepa added a comment - - edited Thanks for the report! I repeated as described on 10.4-10.11, InnoDB. Probably caused by MDEV-371 --source include/have_innodb.inc CREATE TABLE t (s BLOB, n INT , UNIQUE (s)) engine=innodb; INSERT INTO t VALUES ( 'Hrecvx_0004ln-00' ,1), ( 'Hrecvx_0004mm-00' ,1); REPLACE INTO t VALUES ( 'Hrecvx_0004mm-00' ,2); 10.4 0d586d62e5326053383 CURRENT_TEST: main.1_my mysqltest: At line 9: query 'REPLACE INTO t VALUES ('Hrecvx_0004mm-00',2)' failed: 1062: Duplicate entry 'Hrecvx_0004mm-00' for key 's' CREATE TABLE t (s BLOB, n INT , UNIQUE (s)) engine=innodb; INSERT INTO t VALUES ( 'Hrecvx_0004ln-00' ,1), ( 'Hrecvx_0004mm-00' ,1); INSERT INTO t VALUES ( 'Hrecvx_0004mm-00' ,2) ON DUPLICATE KEY UPDATE n = VALUES (n) ; SELECT * FROM t; SELECT * FROM t; s n Hrecvx_0004ln-00 2 Hrecvx_0004mm-00 1

            Works fine with MyISAM:

            CREATE OR REPLACE TABLE t1 (s BLOB, n INT, UNIQUE (s)) engine=MyISAM;
            INSERT INTO t1 VALUES  ('Hrecvx_0004ln-00',1),
                                   ('Hrecvx_0004mm-00',1);
            REPLACE INTO t1 VALUES ('Hrecvx_0004mm-00',2);
            SELECT * FROM t1;
            

            +------------------+------+
            | s                | n    |
            +------------------+------+
            | Hrecvx_0004ln-00 |    1 |
            | Hrecvx_0004mm-00 |    2 |
            +------------------+------+
            

            CREATE OR REPLACE TABLE t1 (s BLOB, n INT, UNIQUE (s)) engine=MyISAM;
            INSERT INTO t1 VALUES ('Hrecvx_0004ln-00',1), ('Hrecvx_0004mm-00',1);
            INSERT INTO t1 VALUES ('Hrecvx_0004mm-00',2) ON DUPLICATE KEY UPDATE n = VALUE(n) ;
            SELECT * FROM t1;
            

            +------------------+------+
            | s                | n    |
            +------------------+------+
            | Hrecvx_0004ln-00 |    1 |
            | Hrecvx_0004mm-00 |    2 |
            +------------------+------+
            

            bar Alexander Barkov added a comment - Works fine with MyISAM: CREATE OR REPLACE TABLE t1 (s BLOB, n INT , UNIQUE (s)) engine=MyISAM; INSERT INTO t1 VALUES ( 'Hrecvx_0004ln-00' ,1), ( 'Hrecvx_0004mm-00' ,1); REPLACE INTO t1 VALUES ( 'Hrecvx_0004mm-00' ,2); SELECT * FROM t1; +------------------+------+ | s | n | +------------------+------+ | Hrecvx_0004ln-00 | 1 | | Hrecvx_0004mm-00 | 2 | +------------------+------+ CREATE OR REPLACE TABLE t1 (s BLOB, n INT , UNIQUE (s)) engine=MyISAM; INSERT INTO t1 VALUES ( 'Hrecvx_0004ln-00' ,1), ( 'Hrecvx_0004mm-00' ,1); INSERT INTO t1 VALUES ( 'Hrecvx_0004mm-00' ,2) ON DUPLICATE KEY UPDATE n = VALUE(n) ; SELECT * FROM t1; +------------------+------+ | s | n | +------------------+------+ | Hrecvx_0004ln-00 | 1 | | Hrecvx_0004mm-00 | 2 | +------------------+------+

            Thanks bar for the pointers, the bug with IDEMPOTENT replication was found with long uniques, which uses the same logic as REPLACE, and even contains the copy-paste from there. I guess it was made so to allocate and reuse the key buffer memory on the stack, saving from extra malloc. I think our priority now is to minimize stack usage vs extra mallocs, so it's not the point anymore.

            I made some refactoring and extracted common code. Also moved handler's RND search initialization to Write_rows_log_event::do_before_row_operations. This fixed many long unique bugs in relplication and optimized the use a little bit.

            https://github.com/MariaDB/server/commit/703e73e221a42638f2f05379124b35c57482da93

            One more refactoring should be done to generalize handler initialization (and de-initialization) across REPLACE, LOAD DATA, IDEMPOTENT replication, and improve memory usage.

            nikitamalyavin Nikita Malyavin added a comment - Thanks bar for the pointers, the bug with IDEMPOTENT replication was found with long uniques, which uses the same logic as REPLACE, and even contains the copy-paste from there. I guess it was made so to allocate and reuse the key buffer memory on the stack, saving from extra malloc. I think our priority now is to minimize stack usage vs extra mallocs, so it's not the point anymore. I made some refactoring and extracted common code. Also moved handler's RND search initialization to Write_rows_log_event::do_before_row_operations . This fixed many long unique bugs in relplication and optimized the use a little bit. https://github.com/MariaDB/server/commit/703e73e221a42638f2f05379124b35c57482da93 One more refactoring should be done to generalize handler initialization (and de-initialization) across REPLACE, LOAD DATA, IDEMPOTENT replication, and improve memory usage.

            The commit https://github.com/MariaDB/server/commit/703e73e221a42638f2f05379124b35c57482da93 was never in a pull request nor included on any branch, and `git log -S MDEV-30046` does not yield any results from 10.5 branch, so I assume this issue is still open (and thus will also continue to keep https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015293 open).

            otto Otto Kekäläinen added a comment - The commit https://github.com/MariaDB/server/commit/703e73e221a42638f2f05379124b35c57482da93 was never in a pull request nor included on any branch, and `git log -S MDEV-30046 ` does not yield any results from 10.5 branch, so I assume this issue is still open (and thus will also continue to keep https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1015293 open).

            The refactoring took a significant amount of time, but I think it will be very beneficial.

            We had two implementations of the replace behavior, one is usual, and another one is an idempotent insert. The latter was many fixes behind the primary one.

            First, the implementations should have been joined. I didn't stop only on that, though.
            I also reworked the implementation to a better readability, as well as, as I believe, maintainability.
            Also I have optimized it in several places.

            The commit message will bring the better detail.

            serg, please review commits the following commits

            013fbe19 MDEV-30046 Refactor write_record and fix idempotent replication
            aae554b1 MDEV-30046 wrong row targeted with "insert ... on duplicate" and "replace"
            

            Or take a look at the same-titled commits at branch bb-10.5-nikita-MDEV-30046.

            They are best to view with commit

            9f2ef5d4 MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY
            

            as it further simplifies the REPLACE implementation, however it goes for review to midenok. Anyway, comments are welcome.

            nikitamalyavin Nikita Malyavin added a comment - The refactoring took a significant amount of time, but I think it will be very beneficial. We had two implementations of the replace behavior, one is usual, and another one is an idempotent insert. The latter was many fixes behind the primary one. First, the implementations should have been joined. I didn't stop only on that, though. I also reworked the implementation to a better readability, as well as, as I believe, maintainability. Also I have optimized it in several places. The commit message will bring the better detail. serg , please review commits the following commits 013fbe19 MDEV-30046 Refactor write_record and fix idempotent replication aae554b1 MDEV-30046 wrong row targeted with "insert ... on duplicate" and "replace" Or take a look at the same-titled commits at branch bb-10.5-nikita- MDEV-30046 . They are best to view with commit 9f2ef5d4 MDEV-15990 REPLACE on a precise-versioned table returns ER_DUP_ENTRY as it further simplifies the REPLACE implementation, however it goes for review to midenok . Anyway, comments are welcome.

            https://github.com/MariaDB/server/commit/972f6941df8b09b523f7f14d2465cee39a98fecb is ok to push, preferably with a clearer comment in sql_insert.cc.

            The refactoring commit is generally moving in the right direction, but it won't be ready before the release, so I don't think the actual bug fix should wait for it.

            serg Sergei Golubchik added a comment - https://github.com/MariaDB/server/commit/972f6941df8b09b523f7f14d2465cee39a98fecb is ok to push, preferably with a clearer comment in sql_insert.cc . The refactoring commit is generally moving in the right direction, but it won't be ready before the release, so I don't think the actual bug fix should wait for it.

            For the record, this was merged in as https://github.com/MariaDB/server/commit/72429cad7f5448ad680d8df7b55aa055462d5964 (not visible in this Jira)

            otto Otto Kekäläinen added a comment - For the record, this was merged in as https://github.com/MariaDB/server/commit/72429cad7f5448ad680d8df7b55aa055462d5964 (not visible in this Jira)

            People

              nikitamalyavin Nikita Malyavin
              fh Frank Heckenbach
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.