Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32898

Phantom rows caused by UPDATE of PRIMARY KEY

Details

    Description

      Under REPEATABLE-READ isolation level,an UPDATE statement which update the value of primary key caused phantom rows in another transaction.
      How to repeat:

      /* init */ CREATE TABLE t(a INT PRIMARY KEY, b INT);
      /* init */ INSERT INTO t VALUES (1, 1);
      /* init */ INSERT INTO t VALUES (2, 2);
      /* t1 */ SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      /* t2 */ SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
       
      /* t1 */ BEGIN;
      /* t1 */ SELECT * FROM t LOCK IN SHARE MODE;
      /* t2 */ BEGIN;
      /* t2 */ SELECT * FROM t;  -- [(1, 1), (2, 2)]
      /* t1 */ UPDATE t SET a=3 WHERE b = 2;
      /* t1 */ COMMIT;
      /* t2 */ UPDATE t SET b=3;
      /* t2 */ SELECT * FROM t; -- [(1, 3), (2, 2), (3, 3)] 
      /* t2 */ COMMIT;
      

      It appears that a phantom row (2, 2) showed up in the second consistent read of T2. And if you commit the second transaction, the phantom row will disappear. I'm not sure whether this is a new bug or a duplicate one. From the user's perspective, I haven't inserted a new row, updating existing rows should not result in phantom rows.

      Attachments

        Issue Links

          Activity

            zhuangliu Zhuang Liu added a comment -

            This bug can also be reproduced in MySQL8.2.0 and has been verified by MySQL Verification Team. mysql# #113228

            zhuangliu Zhuang Liu added a comment - This bug can also be reproduced in MySQL8.2.0 and has been verified by MySQL Verification Team. mysql# #113228

            Thank you. Here is my mtr version of this:

            --source include/have_innodb.inc
            CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
            INSERT INTO t VALUES (1,1),(2,2);
            --connect con1,localhost,root
            BEGIN; SELECT * FROM t LOCK IN SHARE MODE;
            --connection default
            BEGIN;
            SELECT * FROM t;
            --connection con1
            UPDATE t SET a=3 WHERE b=2;
            COMMIT;
            --disconnect con1
            --connection default
            UPDATE t SET b=3;
            SELECT * FROM t;
            COMMIT;
            DROP TABLE t;
            

            10.6 b3a628c7d4ec5b765768bd374d83beea59880522

            CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
            INSERT INTO t VALUES (1,1),(2,2);
            connect con1,localhost,root;
            BEGIN;
            SELECT * FROM t LOCK IN SHARE MODE;
            a	b
            1	1
            2	2
            connection default;
            BEGIN;
            SELECT * FROM t;
            a	b
            1	1
            2	2
            connection con1;
            UPDATE t SET a=3 WHERE b=2;
            COMMIT;
            disconnect con1;
            connection default;
            UPDATE t SET b=3;
            SELECT * FROM t;
            a	b
            1	3
            2	2
            3	3
            COMMIT;
            DROP TABLE t;
            

            I got the same result on the 10.4 branch as well.

            I did not explicitly change the transaction isolation level, because REPEATABLE READ is the default. As far as I remember, some people who worked in the MySQL Falcon project back when Innobase Oy was an Oracle subsidiary but MySQL AB wasn’t, called this InnoDB anomaly the WRITE COMMITTED isolation level. That is, locking reads in InnoDB always were essentially READ COMMITTED. All we want is to successfully acquire a lock.

            I had discussed a possible way of fixing this in MDEV-14589. If I change the test to use the SERIALIZABLE isolation level in both connections, then the first UPDATE will be blocked.

            I think that this bug has been reported earlier, and finally I found MDEV-26642 and MDEV-26643, which are very similar.

            marko Marko Mäkelä added a comment - Thank you. Here is my mtr version of this: --source include/have_innodb.inc CREATE TABLE t(a INT PRIMARY KEY , b INT ) ENGINE=InnoDB; INSERT INTO t VALUES (1,1),(2,2); --connect con1,localhost,root BEGIN ; SELECT * FROM t LOCK IN SHARE MODE; --connection default BEGIN ; SELECT * FROM t; --connection con1 UPDATE t SET a=3 WHERE b=2; COMMIT ; --disconnect con1 --connection default UPDATE t SET b=3; SELECT * FROM t; COMMIT ; DROP TABLE t; 10.6 b3a628c7d4ec5b765768bd374d83beea59880522 CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t VALUES (1,1),(2,2); connect con1,localhost,root; BEGIN; SELECT * FROM t LOCK IN SHARE MODE; a b 1 1 2 2 connection default; BEGIN; SELECT * FROM t; a b 1 1 2 2 connection con1; UPDATE t SET a=3 WHERE b=2; COMMIT; disconnect con1; connection default; UPDATE t SET b=3; SELECT * FROM t; a b 1 3 2 2 3 3 COMMIT; DROP TABLE t; I got the same result on the 10.4 branch as well. I did not explicitly change the transaction isolation level, because REPEATABLE READ is the default. As far as I remember, some people who worked in the MySQL Falcon project back when Innobase Oy was an Oracle subsidiary but MySQL AB wasn’t, called this InnoDB anomaly the WRITE COMMITTED isolation level. That is, locking reads in InnoDB always were essentially READ COMMITTED . All we want is to successfully acquire a lock. I had discussed a possible way of fixing this in MDEV-14589 . If I change the test to use the SERIALIZABLE isolation level in both connections, then the first UPDATE will be blocked. I think that this bug has been reported earlier, and finally I found MDEV-26642 and MDEV-26643 , which are very similar.

            Does this report look like a duplicate of MDEV-26642 or MDEV-26643 to you?

            marko Marko Mäkelä added a comment - Does this report look like a duplicate of MDEV-26642 or MDEV-26643 to you?
            zhuangliu Zhuang Liu added a comment -

            I agree that they are duplicated, they appear to be different phenomena caused by the same bug, but I am still puzzled by the occurrence of the phantom row. It seems that the first UPDATE statement actually deletes a row (2, 2) and than inserts a new row (3, 2). By the way, the statement 'SELECT * FROM t LOCK IN SHARE MODE' is unnecessary for reproducing the bug.

            zhuangliu Zhuang Liu added a comment - I agree that they are duplicated, they appear to be different phenomena caused by the same bug, but I am still puzzled by the occurrence of the phantom row. It seems that the first UPDATE statement actually deletes a row (2, 2) and than inserts a new row (3, 2). By the way, the statement 'SELECT * FROM t LOCK IN SHARE MODE' is unnecessary for reproducing the bug.

            I was able to fix my version with a simple patch:

            10.6 d06b6de3050180ec2f96ef00963d1beab8e1b47a with patch

            mysqltest: At line 14: query 'UPDATE t SET b=3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction
            

            The patch will flag an error when an attempt is made to lock a PRIMARY KEY record that is not visible in the current read view. For now, I used an existing error code for this.

            diff --git a/storage/innobase/include/row0row.h b/storage/innobase/include/row0row.h
            index a1350740e2a..7056c77f2e6 100644
            --- a/storage/innobase/include/row0row.h
            +++ b/storage/innobase/include/row0row.h
            @@ -370,6 +370,12 @@ row_search_index_entry(
             	mtr_t*		mtr)	/*!< in: mtr */
             	MY_ATTRIBUTE((nonnull, warn_unused_result));
             
            +/** Get the byte offset of the DB_TRX_ID column
            +@param[in]	rec	clustered index record
            +@param[in]	index	clustered index
            +@return	the byte offset of DB_TRX_ID, from the start of rec */
            +ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index);
            +
             #define ROW_COPY_DATA		1
             #define ROW_COPY_POINTERS	2
             
            diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
            index c9072998e66..3cc4954eafc 100644
            --- a/storage/innobase/lock/lock0lock.cc
            +++ b/storage/innobase/lock/lock0lock.cc
            @@ -5924,6 +5924,12 @@ lock_clust_rec_read_check_and_lock(
             		return DB_SUCCESS;
             	}
             
            +	if (heap_no > PAGE_HEAP_NO_SUPREMUM && trx->read_view.is_open()
            +	    && !trx->read_view.changes_visible(
            +		trx_read_trx_id(rec + row_trx_id_offset(rec, index)))) {
            +		return DB_DEADLOCK;
            +	}
            +
             	dberr_t err = lock_rec_lock(false, gap_mode | mode,
             				    block, heap_no, index, thr);
             
            diff --git a/storage/innobase/row/row0umod.cc b/storage/innobase/row/row0umod.cc
            index a01eaea516a..52f54443911 100644
            --- a/storage/innobase/row/row0umod.cc
            +++ b/storage/innobase/row/row0umod.cc
            @@ -190,7 +190,7 @@ row_undo_mod_clust_low(
             @param[in]	rec	clustered index record
             @param[in]	index	clustered index
             @return	the byte offset of DB_TRX_ID, from the start of rec */
            -static ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index)
            +ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index)
             {
             	ut_ad(index->n_uniq <= MAX_REF_PARTS);
             	ulint trx_id_offset = index->trx_id_offset;
            
            

            This patch will cause 6 existing regression tests to fail. I will have to review each failure carefully to see if the patch could be improved, or if the tests need to be adjusted.
            gcol.innodb_virtual_debug_purge
            innodb.innodb-isolation
            innodb.innodb_bug14007649
            innodb.innodb_mysql
            innodb.innodb_timeout_rollback
            main.deadlock_innodb

            marko Marko Mäkelä added a comment - I was able to fix my version with a simple patch: 10.6 d06b6de3050180ec2f96ef00963d1beab8e1b47a with patch mysqltest: At line 14: query 'UPDATE t SET b=3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The patch will flag an error when an attempt is made to lock a PRIMARY KEY record that is not visible in the current read view. For now, I used an existing error code for this. diff --git a/storage/innobase/include/row0row.h b/storage/innobase/include/row0row.h index a1350740e2a..7056c77f2e6 100644 --- a/storage/innobase/include/row0row.h +++ b/storage/innobase/include/row0row.h @@ -370,6 +370,12 @@ row_search_index_entry( mtr_t* mtr) /*!< in: mtr */ MY_ATTRIBUTE((nonnull, warn_unused_result)); +/** Get the byte offset of the DB_TRX_ID column +@param[in] rec clustered index record +@param[in] index clustered index +@return the byte offset of DB_TRX_ID, from the start of rec */ +ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index); + #define ROW_COPY_DATA 1 #define ROW_COPY_POINTERS 2 diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index c9072998e66..3cc4954eafc 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ -5924,6 +5924,12 @@ lock_clust_rec_read_check_and_lock( return DB_SUCCESS; } + if (heap_no > PAGE_HEAP_NO_SUPREMUM && trx->read_view.is_open() + && !trx->read_view.changes_visible( + trx_read_trx_id(rec + row_trx_id_offset(rec, index)))) { + return DB_DEADLOCK; + } + dberr_t err = lock_rec_lock(false, gap_mode | mode, block, heap_no, index, thr); diff --git a/storage/innobase/row/row0umod.cc b/storage/innobase/row/row0umod.cc index a01eaea516a..52f54443911 100644 --- a/storage/innobase/row/row0umod.cc +++ b/storage/innobase/row/row0umod.cc @@ -190,7 +190,7 @@ row_undo_mod_clust_low( @param[in] rec clustered index record @param[in] index clustered index @return the byte offset of DB_TRX_ID, from the start of rec */ -static ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index) +ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index) { ut_ad(index->n_uniq <= MAX_REF_PARTS); ulint trx_id_offset = index->trx_id_offset; This patch will cause 6 existing regression tests to fail. I will have to review each failure carefully to see if the patch could be improved, or if the tests need to be adjusted. gcol.innodb_virtual_debug_purge innodb.innodb-isolation innodb.innodb_bug14007649 innodb.innodb_mysql innodb.innodb_timeout_rollback main.deadlock_innodb

            I checked the test failures:

            CURRENT_TEST: innodb.innodb_bug14007649
            mysqltest: At line 42: query 'update t1 set f2 = 6 where f1 = 1 and f2 is null' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction
            

            The record had been inserted by connection b at line 29, insert into t1 values (3, 1, null). The transaction was started and committed after between start transaction with consistent snapshot and the failure in connection a. This test needs to be adjusted, or maybe removed.

            CURRENT_TEST: innodb.innodb_timeout_rollback
            mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": 
            included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_timeout_rollback.test at line 3:
            At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...
            

            The INSERT in con2 was not committed yet, so it will not be in the read view of con1. We could avoid this test failure if the patch didn’t validate the visibility before waiting for the lock (which would time out). Better, we could simply remove the read view creation from con1 to make the test pass:

            diff --git a/mysql-test/include/innodb_rollback_on_timeout.inc b/mysql-test/include/innodb_rollback_on_timeout.inc
            index 274bbe12566..883b0820589 100644
            --- a/mysql-test/include/innodb_rollback_on_timeout.inc
            +++ b/mysql-test/include/innodb_rollback_on_timeout.inc
            @@ -22,7 +22,6 @@ select * from t1;
             connection con1;
             begin work;
             insert into t1 values (5);
            -select * from t1;
             # Lock wait timeout set to 2 seconds in <THIS TEST>-master.opt; this
             # statement will time out; in 5.0.13+, it will not roll back transaction.
             --error ER_LOCK_WAIT_TIMEOUT
            

            CURRENT_TEST: gcol.innodb_virtual_debug_purge
            mysqltest: At line 151: query 'DELETE FROM t1 WHERE a = 3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction
            

            The record had been modified by DELETE FROM t1 WHERE a = 1 in connection default in line 139. The fix is simple: we do not actually need a read view in this transaction:

            diff --git a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test
            index 09fba0285c7..7966953535c 100644
            --- a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test
            +++ b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test
            @@ -131,9 +131,8 @@ CREATE TABLE t1 (a INT, b INT, c INT GENERATED ALWAYS AS(a+b));
             
             INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
             
            -connection con1;
            ---echo # disable purge
            -BEGIN; SELECT * FROM t0;
            +connect (stop_purge,localhost,root,,);
            +START TRANSACTION WITH CONSISTENT SNAPSHOT;
             
             connection default;
             DELETE FROM t1 WHERE a = 1;
            @@ -148,13 +147,14 @@ send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE;
             connection con1;
             SET DEBUG_SYNC= 'now WAIT_FOR uncommitted';
             
            +BEGIN;
             DELETE FROM t1 WHERE a = 3;
             
             UPDATE t1 SET a = 7, b = 7 WHERE a = 4;
             
             INSERT INTO t1(a, b) VALUES (8, 8);
             
            ---echo # enable purge
            +disconnect stop_purge;
             COMMIT;
             
             --echo # wait for purge to process the deleted/updated records.
            

            CURRENT_TEST: innodb.innodb-isolation
            mysqltest: At line 156: query 'SELECT c1, c2 FROM t1 WHERE c1 < ((SELECT COUNT(*) FROM t1) / 2) FOR UPDATE' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...
            

            The record had been modified by UPDATE t1 SET c2 = c2 * 3 WHERE c1 = 1 in line 51 in the default connection. This transaction is still open. This test would need to be shrunk or removed to adjust for the code change.

            CURRENT_TEST: main.deadlock_innodb
            mysqltest: In included file "./include/deadlock.inc": 
            included from /mariadb/10.6/mysql-test/main/deadlock_innodb.test at line 17:
            At line 117: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction
            

            The record had been modified by update t1 set x=1 where id = 0 in line 112. The fix is to avoid creating a read view:

            diff --git a/mysql-test/include/deadlock.inc b/mysql-test/include/deadlock.inc
            index abf217aea75..362d456e3f2 100644
            --- a/mysql-test/include/deadlock.inc
            +++ b/mysql-test/include/deadlock.inc
            @@ -103,7 +103,6 @@ connection con2;
             
             # The following query should hang because con1 is locking the record
             update t2 set a=2 where b = 0;
            -select * from t2;
             --send
             update t1 set x=2 where id = 0;
             --sleep 2
            

            CURRENT_TEST: innodb.innodb_mysql
            mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": 
            included from ./include/mix1.inc at line 482:
            included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_mysql.test at line 19:
            At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...
            

            This is fixed by removing a read view creation (the same fix as innodb.innodb_timeout_rollback).

            I pushed this for some initial testing. If everything goes as expected, pre-built packages of this change should be available in https://ci.mariadb.org/42468/ within an hour or two.

            marko Marko Mäkelä added a comment - I checked the test failures: CURRENT_TEST: innodb.innodb_bug14007649 mysqltest: At line 42: query 'update t1 set f2 = 6 where f1 = 1 and f2 is null' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The record had been inserted by connection b at line 29, insert into t1 values (3, 1, null) . The transaction was started and committed after between start transaction with consistent snapshot and the failure in connection a . This test needs to be adjusted, or maybe removed. CURRENT_TEST: innodb.innodb_timeout_rollback mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_timeout_rollback.test at line 3: At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)... The INSERT in con2 was not committed yet, so it will not be in the read view of con1 . We could avoid this test failure if the patch didn’t validate the visibility before waiting for the lock (which would time out). Better, we could simply remove the read view creation from con1 to make the test pass: diff --git a/mysql-test/include/innodb_rollback_on_timeout.inc b/mysql-test/include/innodb_rollback_on_timeout.inc index 274bbe12566..883b0820589 100644 --- a/mysql-test/include/innodb_rollback_on_timeout.inc +++ b/mysql-test/include/innodb_rollback_on_timeout.inc @@ -22,7 +22,6 @@ select * from t1; connection con1; begin work; insert into t1 values (5); -select * from t1; # Lock wait timeout set to 2 seconds in <THIS TEST>-master.opt; this # statement will time out; in 5.0.13+, it will not roll back transaction. --error ER_LOCK_WAIT_TIMEOUT CURRENT_TEST: gcol.innodb_virtual_debug_purge mysqltest: At line 151: query 'DELETE FROM t1 WHERE a = 3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The record had been modified by DELETE FROM t1 WHERE a = 1 in connection default in line 139. The fix is simple: we do not actually need a read view in this transaction: diff --git a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test index 09fba0285c7..7966953535c 100644 --- a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test +++ b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test @@ -131,9 +131,8 @@ CREATE TABLE t1 (a INT, b INT, c INT GENERATED ALWAYS AS(a+b)); INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4); -connection con1; ---echo # disable purge -BEGIN; SELECT * FROM t0; +connect (stop_purge,localhost,root,,); +START TRANSACTION WITH CONSISTENT SNAPSHOT; connection default; DELETE FROM t1 WHERE a = 1; @@ -148,13 +147,14 @@ send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE; connection con1; SET DEBUG_SYNC= 'now WAIT_FOR uncommitted'; +BEGIN; DELETE FROM t1 WHERE a = 3; UPDATE t1 SET a = 7, b = 7 WHERE a = 4; INSERT INTO t1(a, b) VALUES (8, 8); ---echo # enable purge +disconnect stop_purge; COMMIT; --echo # wait for purge to process the deleted/updated records. CURRENT_TEST: innodb.innodb-isolation mysqltest: At line 156: query 'SELECT c1, c2 FROM t1 WHERE c1 < ((SELECT COUNT(*) FROM t1) / 2) FOR UPDATE' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)... The record had been modified by UPDATE t1 SET c2 = c2 * 3 WHERE c1 = 1 in line 51 in the default connection. This transaction is still open. This test would need to be shrunk or removed to adjust for the code change. CURRENT_TEST: main.deadlock_innodb mysqltest: In included file "./include/deadlock.inc": included from /mariadb/10.6/mysql-test/main/deadlock_innodb.test at line 17: At line 117: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The record had been modified by update t1 set x=1 where id = 0 in line 112. The fix is to avoid creating a read view: diff --git a/mysql-test/include/deadlock.inc b/mysql-test/include/deadlock.inc index abf217aea75..362d456e3f2 100644 --- a/mysql-test/include/deadlock.inc +++ b/mysql-test/include/deadlock.inc @@ -103,7 +103,6 @@ connection con2; # The following query should hang because con1 is locking the record update t2 set a=2 where b = 0; -select * from t2; --send update t1 set x=2 where id = 0; --sleep 2 CURRENT_TEST: innodb.innodb_mysql mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": included from ./include/mix1.inc at line 482: included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_mysql.test at line 19: At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)... This is fixed by removing a read view creation (the same fix as innodb.innodb_timeout_rollback ). I pushed this for some initial testing . If everything goes as expected, pre-built packages of this change should be available in https://ci.mariadb.org/42468/ within an hour or two.

            My code change seems to also fix MDEV-26642 but not MDEV-26643.

            marko Marko Mäkelä added a comment - My code change seems to also fix MDEV-26642 but not MDEV-26643 .

            Come to think of it, I think that reporting the somewhat surprising ER_LOCK_DEADLOCK might be acceptable. To avoid any breakage of existing applications, we might want to introduce a Boolean global variable innodb_strict_isolation (default OFF).

            marko Marko Mäkelä added a comment - Come to think of it, I think that reporting the somewhat surprising ER_LOCK_DEADLOCK might be acceptable. To avoid any breakage of existing applications, we might want to introduce a Boolean global variable innodb_strict_isolation (default OFF ).

            I filed MDEV-33332 for the SIGSEGV that mleich reported inside buf_read_ahead_linear(). It is unrelated to this change.

            marko Marko Mäkelä added a comment - I filed MDEV-33332 for the SIGSEGV that mleich reported inside buf_read_ahead_linear() . It is unrelated to this change.

            The idea of the fix looks correct to me. The summary of what we already discussed in slack and what is also described in commit message:

            1. It would be good to have new error code and error message to avoid users confusing. We should also take care about replication events replaying on slaves when the error happens(see slave_transaction_retry_errors[] array).

            2. We need to introduce additional option, which would switch on the functionality, and add combination with and without the option for the failed mtr test cases.

            vlad.lesin Vladislav Lesin added a comment - The idea of the fix looks correct to me. The summary of what we already discussed in slack and what is also described in commit message: 1. It would be good to have new error code and error message to avoid users confusing. We should also take care about replication events replaying on slaves when the error happens(see slave_transaction_retry_errors[] array). 2. We need to introduce additional option, which would switch on the functionality, and add combination with and without the option for the failed mtr test cases.

            I found that there actually exists an error code that is applicable to exactly this scenario: HA_ERR_RECORD_CHANGED and ER_CHECKREAD. It had been previously implemented in ENGINE=Memory, ENGINE=MyISAM, ENGINE=Aria. None of these storage engines support transaction rollback; ENGINE=Aria does it on crash recovery only.

            The best parameter name that I could come up for enabling these and possible similar future checks is innodb_strict_isolation.

            marko Marko Mäkelä added a comment - I found that there actually exists an error code that is applicable to exactly this scenario: HA_ERR_RECORD_CHANGED and ER_CHECKREAD . It had been previously implemented in ENGINE=Memory , ENGINE=MyISAM , ENGINE=Aria . None of these storage engines support transaction rollback; ENGINE=Aria does it on crash recovery only. The best parameter name that I could come up for enabling these and possible similar future checks is innodb_strict_isolation .
            vlad.lesin Vladislav Lesin added a comment - https://github.com/MariaDB/server/pull/3067#pullrequestreview-1947147879

            serg has requested the performance for both values of innodb_snapshot_isolation to be tested, compared to the baseline. I do not expect much difference; in fact, I tested innodb_snapshot_isolation=OFF back in January and did not notice any change of throughput. For the case innodb_snapshot_isolation=ON we would be interested in the ‘optimistic’ case where transactions are not being rolled back due to the new error ER_CHECKREAD.

            marko Marko Mäkelä added a comment - serg has requested the performance for both values of innodb_snapshot_isolation to be tested, compared to the baseline. I do not expect much difference; in fact, I tested innodb_snapshot_isolation=OFF back in January and did not notice any change of throughput. For the case innodb_snapshot_isolation=ON we would be interested in the ‘optimistic’ case where transactions are not being rolled back due to the new error ER_CHECKREAD .
            axel Axel Schwenke added a comment -

            The results are somewhat surprising. The baseline and innodb_snapshot_isolation=on show about the same performance where innodb_snapshot_isolation=off falls back a bit. The averages are ~5750 tps and ~5600 tps.

            I double checked the configuration and plotted even the counters for Handler_commit and Handler_rollback. But the results are real. As expected we get more rollbacks for innodb_snapshot_isolation=on.

            Attached: tpcc1.pdf

            axel Axel Schwenke added a comment - The results are somewhat surprising. The baseline and innodb_snapshot_isolation=on show about the same performance where innodb_snapshot_isolation=off falls back a bit. The averages are ~5750 tps and ~5600 tps. I double checked the configuration and plotted even the counters for Handler_commit and Handler_rollback . But the results are real. As expected we get more rollbacks for innodb_snapshot_isolation=on . Attached: tpcc1.pdf

            So, 3x more rollbacks (and Monty said it'll be ~3x), as expected.

            axel, how was your tps calculated? Are rollbacks count towards tps? Are rolled back transactions re-applied?

            serg Sergei Golubchik added a comment - So, 3x more rollbacks (and Monty said it'll be ~3x), as expected. axel , how was your tps calculated? Are rollbacks count towards tps? Are rolled back transactions re-applied?
            axel Axel Schwenke added a comment -

            how was your tps calculated? Are rollbacks count towards tps?

            The TPS numbers come from sysbench. It's simply the number of executions of the event() function in the LUA script. Which for TPC-C calls one of 5 functions for one of 5 transactions. Of those only the NEW ORDER transaction does an explicit ROLLBACK. And AFAICS that rollback is not connected to an SQL error, but to an empty result.

            I have now added plots of Com_commit and Com_rollback counters (in tpcc1.pdf) . If I compare the TPS plot with Com_commit the latter is a bit lower. That would mean a rollback is counted towards TPS. But this changes nothing of the throughput comparison of the 3 commits/configurations.

            The numbers for Handler_rollback are higher than those for Com_rollback - but only for the case innodb_snapshot_isolation=on. Those rollbacks must be implicit then and are ignored by sysbench. However sysbench reports those errors independently.

            For innodb_snapshot_isolation=off:

            SQL statistics:
                queries performed:
                    read:                            262257215
                    write:                           272207592
                    other:                           40437558
                    total:                           574902365
                transactions:                        20218683 (5616.26 per sec.)
                queries:                             574902365 (159693.89 per sec.)
                ignored errors:                      87982  (24.44 per sec.)
                reconnects:                          0      (0.00 per sec.)
            

            And innodb_snapshot_isolation=on:

            SQL statistics:
                queries performed:
                    read:                            268903843
                    write:                           279039940
                    other:                           41682778
                    total:                           589626561
                transactions:                        20730792 (5758.51 per sec.)
                queries:                             589626561 (163783.92 per sec.)
                ignored errors:                      200264 (55.63 per sec.)
                reconnects:                          0      (0.00 per sec.)
            

            Are rolled back transactions re-applied?

            I can see that nowhere in the LUA code.

            axel Axel Schwenke added a comment - how was your tps calculated? Are rollbacks count towards tps? The TPS numbers come from sysbench. It's simply the number of executions of the event() function in the LUA script. Which for TPC-C calls one of 5 functions for one of 5 transactions. Of those only the NEW ORDER transaction does an explicit ROLLBACK. And AFAICS that rollback is not connected to an SQL error, but to an empty result. I have now added plots of Com_commit and Com_rollback counters (in tpcc1.pdf ) . If I compare the TPS plot with Com_commit the latter is a bit lower. That would mean a rollback is counted towards TPS. But this changes nothing of the throughput comparison of the 3 commits/configurations. The numbers for Handler_rollback are higher than those for Com_rollback - but only for the case innodb_snapshot_isolation=on . Those rollbacks must be implicit then and are ignored by sysbench. However sysbench reports those errors independently. For innodb_snapshot_isolation=off : SQL statistics: queries performed: read: 262257215 write: 272207592 other: 40437558 total: 574902365 transactions: 20218683 (5616.26 per sec.) queries: 574902365 (159693.89 per sec.) ignored errors: 87982 (24.44 per sec.) reconnects: 0 (0.00 per sec.) And innodb_snapshot_isolation=on : SQL statistics: queries performed: read: 268903843 write: 279039940 other: 41682778 total: 589626561 transactions: 20730792 (5758.51 per sec.) queries: 589626561 (163783.92 per sec.) ignored errors: 200264 (55.63 per sec.) reconnects: 0 (0.00 per sec.) Are rolled back transactions re-applied? I can see that nowhere in the LUA code.

            I cannot explain why the number of commits stayed the same even though the number of rollbacks increased 3x

            serg Sergei Golubchik added a comment - I cannot explain why the number of commits stayed the same even though the number of rollbacks increased 3x

            For what it is worth, in InnoDB a rollback is like a commit, with the added step of undoing all changes. It seems to me that Com_commit is defined in com_status_vars as the following entry:

              {"commit",               STMT_STATUS(SQLCOM_COMMIT)},
            

            This is something that would be updated outside InnoDB. Maybe the Lua scripts keep executing COMMIT statements after ignoring ER_CHECKREAD errors, or maybe autocommit transactions that are in fact rolled back due to an error will be counted as committed.

            marko Marko Mäkelä added a comment - For what it is worth, in InnoDB a rollback is like a commit, with the added step of undoing all changes. It seems to me that Com_commit is defined in com_status_vars as the following entry: { "commit" , STMT_STATUS(SQLCOM_COMMIT)}, This is something that would be updated outside InnoDB. Maybe the Lua scripts keep executing COMMIT statements after ignoring ER_CHECKREAD errors, or maybe autocommit transactions that are in fact rolled back due to an error will be counted as committed.

            People

              axel Axel Schwenke
              zhuangliu Zhuang Liu
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.