[MDEV-32898] Phantom rows caused by UPDATE of PRIMARY KEY - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 11.2.2, 10.4(EOL), 10.5, 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.11.8, 10.6.18, 11.0.6, 11.1.5, 11.2.4, 11.4.2
Component/s: Storage Engine - InnoDB
Labels:
- innodb
Environment:
Ubuntu 22.04

Description

Under REPEATABLE-READ isolation level，an UPDATE statement which update the value of primary key caused phantom rows in another transaction.
How to repeat:

/* init */ CREATE TABLE t(a INT PRIMARY KEY, b INT);

/* init */ INSERT INTO t VALUES (1, 1);

/* init */ INSERT INTO t VALUES (2, 2);

/* t1 */ SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

/* t2 */ SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

/* t1 */ BEGIN;

/* t1 */ SELECT * FROM t LOCK IN SHARE MODE;

/* t2 */ BEGIN;

/* t2 */ SELECT * FROM t;  -- [(1, 1), (2, 2)]

/* t1 */ UPDATE t SET a=3 WHERE b = 2;

/* t1 */ COMMIT;

/* t2 */ UPDATE t SET b=3;

/* t2 */ SELECT * FROM t; -- [(1, 3), (2, 2), (3, 3)]

/* t2 */ COMMIT;

It appears that a phantom row (2, 2) showed up in the second consistent read of T2. And if you commit the second transaction, the phantom row will disappear. I'm not sure whether this is a new bug or a duplicate one. From the user's perspective, I haven't inserted a new row, updating existing rows should not result in phantom rows.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

tpcc1.pdf
191 kB
2024-03-23 16:31

Issue Links

blocks

MDEV-33802 Weird read view after ROLLBACK of other transactions.

Closed

is duplicated by

MDEV-34496 select wrong value after update statement when another session update the same row

Closed

relates to

MDEV-33762 Document innodb_snapshot_isolation

Closed

MDEV-35124 Set innodb_snapshot_isolation=ON by default

Closed

MDEV-14589 InnoDB should not lock a delete-marked record

Closed

MDEV-26642 Weird SELECT view when a record is modified to the same value by two transactions

Closed

MDEV-26643 Inconsistent behaviors of UPDATE under RU & RC isolation level

Closed

MDEV-33455 skip modified rows in parallel XA replication, don't wait

Open

MDEV-35140 Support innodb-snapshot-isolation in Galera cluster

Stalled

(4 relates to)

Activity

Ascending order - Click to sort in descending order

Zhuang Liu added a comment - 2023-11-28 07:08

This bug can also be reproduced in MySQL8.2.0 and has been verified by MySQL Verification Team. mysql# #113228

Zhuang Liu added a comment - 2023-11-28 07:08 This bug can also be reproduced in MySQL8.2.0 and has been verified by MySQL Verification Team. mysql# #113228

Marko Mäkelä added a comment - 2023-11-30 14:08

Thank you. Here is my mtr version of this:

--source include/have_innodb.inc

CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;

INSERT INTO t VALUES (1,1),(2,2);

--connect con1,localhost,root

BEGIN; SELECT * FROM t LOCK IN SHARE MODE;

--connection default

BEGIN;

SELECT * FROM t;

--connection con1

UPDATE t SET a=3 WHERE b=2;

COMMIT;

--disconnect con1

--connection default

UPDATE t SET b=3;

SELECT * FROM t;

COMMIT;

DROP TABLE t;

10.6 b3a628c7d4ec5b765768bd374d83beea59880522
CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t VALUES (1,1),(2,2);
connect con1,localhost,root;
BEGIN;
SELECT * FROM t LOCK IN SHARE MODE;
a b
1 1
2 2
connection default;
BEGIN;
SELECT * FROM t;
a b
1 1
2 2
connection con1;
UPDATE t SET a=3 WHERE b=2;
COMMIT;
disconnect con1;
connection default;
UPDATE t SET b=3;
SELECT * FROM t;
a b
1 3
2 2
3 3
COMMIT;
DROP TABLE t;

I got the same result on the 10.4 branch as well.

I did not explicitly change the transaction isolation level, because REPEATABLE READ is the default. As far as I remember, some people who worked in the MySQL Falcon project back when Innobase Oy was an Oracle subsidiary but MySQL AB wasn’t, called this InnoDB anomaly the WRITE COMMITTED isolation level. That is, locking reads in InnoDB always were essentially READ COMMITTED. All we want is to successfully acquire a lock.

I had discussed a possible way of fixing this in ~~MDEV-14589~~. If I change the test to use the SERIALIZABLE isolation level in both connections, then the first UPDATE will be blocked.

I think that this bug has been reported earlier, and finally I found ~~MDEV-26642~~ and ~~MDEV-26643~~, which are very similar.

Marko Mäkelä added a comment - 2023-11-30 14:08 Thank you. Here is my mtr version of this: --source include/have_innodb.inc CREATE TABLE t(a INT PRIMARY KEY , b INT ) ENGINE=InnoDB; INSERT INTO t VALUES (1,1),(2,2); --connect con1,localhost,root BEGIN ; SELECT * FROM t LOCK IN SHARE MODE; --connection default BEGIN ; SELECT * FROM t; --connection con1 UPDATE t SET a=3 WHERE b=2; COMMIT ; --disconnect con1 --connection default UPDATE t SET b=3; SELECT * FROM t; COMMIT ; DROP TABLE t; 10.6 b3a628c7d4ec5b765768bd374d83beea59880522 CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB; INSERT INTO t VALUES (1,1),(2,2); connect con1,localhost,root; BEGIN; SELECT * FROM t LOCK IN SHARE MODE; a b 1 1 2 2 connection default; BEGIN; SELECT * FROM t; a b 1 1 2 2 connection con1; UPDATE t SET a=3 WHERE b=2; COMMIT; disconnect con1; connection default; UPDATE t SET b=3; SELECT * FROM t; a b 1 3 2 2 3 3 COMMIT; DROP TABLE t; I got the same result on the 10.4 branch as well. I did not explicitly change the transaction isolation level, because REPEATABLE READ is the default. As far as I remember, some people who worked in the MySQL Falcon project back when Innobase Oy was an Oracle subsidiary but MySQL AB wasn’t, called this InnoDB anomaly the WRITE COMMITTED isolation level. That is, locking reads in InnoDB always were essentially READ COMMITTED . All we want is to successfully acquire a lock. I had discussed a possible way of fixing this in MDEV-14589 . If I change the test to use the SERIALIZABLE isolation level in both connections, then the first UPDATE will be blocked. I think that this bug has been reported earlier, and finally I found MDEV-26642 and MDEV-26643 , which are very similar.

Marko Mäkelä added a comment - 2023-11-30 14:09

Does this report look like a duplicate of ~~MDEV-26642~~ or ~~MDEV-26643~~ to you?

Marko Mäkelä added a comment - 2023-11-30 14:09 Does this report look like a duplicate of MDEV-26642 or MDEV-26643 to you?

Zhuang Liu added a comment - 2023-11-30 14:51

I agree that they are duplicated, they appear to be different phenomena caused by the same bug, but I am still puzzled by the occurrence of the phantom row. It seems that the first UPDATE statement actually deletes a row (2, 2) and than inserts a new row (3, 2). By the way, the statement 'SELECT * FROM t LOCK IN SHARE MODE' is unnecessary for reproducing the bug.

Zhuang Liu added a comment - 2023-11-30 14:51 I agree that they are duplicated, they appear to be different phenomena caused by the same bug, but I am still puzzled by the occurrence of the phantom row. It seems that the first UPDATE statement actually deletes a row (2, 2) and than inserts a new row (3, 2). By the way, the statement 'SELECT * FROM t LOCK IN SHARE MODE' is unnecessary for reproducing the bug.

Marko Mäkelä added a comment - 2024-01-16 09:35

I was able to fix my version with a simple patch:

10.6 d06b6de3050180ec2f96ef00963d1beab8e1b47a with patch
mysqltest: At line 14: query 'UPDATE t SET b=3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The patch will flag an error when an attempt is made to lock a PRIMARY KEY record that is not visible in the current read view. For now, I used an existing error code for this.

diff --git a/storage/innobase/include/row0row.h b/storage/innobase/include/row0row.h

index a1350740e2a..7056c77f2e6 100644

--- a/storage/innobase/include/row0row.h

+++ b/storage/innobase/include/row0row.h

@@ -370,6 +370,12 @@ row_search_index_entry(

 	mtr_t*		mtr)	/*!< in: mtr */

 	MY_ATTRIBUTE((nonnull, warn_unused_result));

+/** Get the byte offset of the DB_TRX_ID column

+@param[in]	rec	clustered index record

+@param[in]	index	clustered index

+@return	the byte offset of DB_TRX_ID, from the start of rec */

+ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index);

 #define ROW_COPY_DATA		1

 #define ROW_COPY_POINTERS	2

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc

index c9072998e66..3cc4954eafc 100644

--- a/storage/innobase/lock/lock0lock.cc

+++ b/storage/innobase/lock/lock0lock.cc

@@ -5924,6 +5924,12 @@ lock_clust_rec_read_check_and_lock(

 		return DB_SUCCESS;

+	if (heap_no > PAGE_HEAP_NO_SUPREMUM && trx->read_view.is_open()

+	    && !trx->read_view.changes_visible(

+		trx_read_trx_id(rec + row_trx_id_offset(rec, index)))) {

+		return DB_DEADLOCK;

+	}

 	dberr_t err = lock_rec_lock(false, gap_mode | mode,

 				    block, heap_no, index, thr);

diff --git a/storage/innobase/row/row0umod.cc b/storage/innobase/row/row0umod.cc

index a01eaea516a..52f54443911 100644

--- a/storage/innobase/row/row0umod.cc

+++ b/storage/innobase/row/row0umod.cc

@@ -190,7 +190,7 @@ row_undo_mod_clust_low(

 @param[in]	rec	clustered index record

 @param[in]	index	clustered index

 @return	the byte offset of DB_TRX_ID, from the start of rec */

-static ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index)

+ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index)

 	ut_ad(index->n_uniq <= MAX_REF_PARTS);

 	ulint trx_id_offset = index->trx_id_offset;

This patch will cause 6 existing regression tests to fail. I will have to review each failure carefully to see if the patch could be improved, or if the tests need to be adjusted.
gcol.innodb_virtual_debug_purge
innodb.innodb-isolation
innodb.innodb_bug14007649
innodb.innodb_mysql
innodb.innodb_timeout_rollback
main.deadlock_innodb

Marko Mäkelä added a comment - 2024-01-16 09:35 I was able to fix my version with a simple patch: 10.6 d06b6de3050180ec2f96ef00963d1beab8e1b47a with patch mysqltest: At line 14: query 'UPDATE t SET b=3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The patch will flag an error when an attempt is made to lock a PRIMARY KEY record that is not visible in the current read view. For now, I used an existing error code for this. diff --git a/storage/innobase/include/row0row.h b/storage/innobase/include/row0row.h index a1350740e2a..7056c77f2e6 100644 --- a/storage/innobase/include/row0row.h +++ b/storage/innobase/include/row0row.h @@ -370,6 +370,12 @@ row_search_index_entry( mtr_t* mtr) /*!< in: mtr */ MY_ATTRIBUTE((nonnull, warn_unused_result)); +/** Get the byte offset of the DB_TRX_ID column +@param[in] rec clustered index record +@param[in] index clustered index +@return the byte offset of DB_TRX_ID, from the start of rec */ +ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index); + #define ROW_COPY_DATA 1 #define ROW_COPY_POINTERS 2 diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index c9072998e66..3cc4954eafc 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ -5924,6 +5924,12 @@ lock_clust_rec_read_check_and_lock( return DB_SUCCESS; } + if (heap_no > PAGE_HEAP_NO_SUPREMUM && trx->read_view.is_open() + && !trx->read_view.changes_visible( + trx_read_trx_id(rec + row_trx_id_offset(rec, index)))) { + return DB_DEADLOCK; + } + dberr_t err = lock_rec_lock(false, gap_mode | mode, block, heap_no, index, thr); diff --git a/storage/innobase/row/row0umod.cc b/storage/innobase/row/row0umod.cc index a01eaea516a..52f54443911 100644 --- a/storage/innobase/row/row0umod.cc +++ b/storage/innobase/row/row0umod.cc @@ -190,7 +190,7 @@ row_undo_mod_clust_low( @param[in] rec clustered index record @param[in] index clustered index @return the byte offset of DB_TRX_ID, from the start of rec */ -static ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index) +ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index) { ut_ad(index->n_uniq <= MAX_REF_PARTS); ulint trx_id_offset = index->trx_id_offset; This patch will cause 6 existing regression tests to fail. I will have to review each failure carefully to see if the patch could be improved, or if the tests need to be adjusted. gcol.innodb_virtual_debug_purge innodb.innodb-isolation innodb.innodb_bug14007649 innodb.innodb_mysql innodb.innodb_timeout_rollback main.deadlock_innodb

Marko Mäkelä added a comment - 2024-01-16 12:37

I checked the test failures:

CURRENT_TEST: innodb.innodb_bug14007649

mysqltest: At line 42: query 'update t1 set f2 = 6 where f1 = 1 and f2 is null' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The record had been inserted by connection b at line 29, insert into t1 values (3, 1, null). The transaction was started and committed after between start transaction with consistent snapshot and the failure in connection a. This test needs to be adjusted, or maybe removed.

CURRENT_TEST: innodb.innodb_timeout_rollback

mysqltest: In included file "./include/innodb_rollback_on_timeout.inc":

included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_timeout_rollback.test at line 3:

At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...

The INSERT in con2 was not committed yet, so it will not be in the read view of con1. We could avoid this test failure if the patch didn’t validate the visibility before waiting for the lock (which would time out). Better, we could simply remove the read view creation from con1 to make the test pass:

diff --git a/mysql-test/include/innodb_rollback_on_timeout.inc b/mysql-test/include/innodb_rollback_on_timeout.inc

index 274bbe12566..883b0820589 100644

--- a/mysql-test/include/innodb_rollback_on_timeout.inc

+++ b/mysql-test/include/innodb_rollback_on_timeout.inc

@@ -22,7 +22,6 @@ select * from t1;

 connection con1;

 begin work;

 insert into t1 values (5);

-select * from t1;

 # Lock wait timeout set to 2 seconds in <THIS TEST>-master.opt; this

 # statement will time out; in 5.0.13+, it will not roll back transaction.

 --error ER_LOCK_WAIT_TIMEOUT

CURRENT_TEST: gcol.innodb_virtual_debug_purge

mysqltest: At line 151: query 'DELETE FROM t1 WHERE a = 3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The record had been modified by DELETE FROM t1 WHERE a = 1 in connection default in line 139. The fix is simple: we do not actually need a read view in this transaction:

diff --git a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test

index 09fba0285c7..7966953535c 100644

--- a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test

+++ b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test

@@ -131,9 +131,8 @@ CREATE TABLE t1 (a INT, b INT, c INT GENERATED ALWAYS AS(a+b));

 INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4);

-connection con1;

---echo # disable purge

-BEGIN; SELECT * FROM t0;

+connect (stop_purge,localhost,root,,);

+START TRANSACTION WITH CONSISTENT SNAPSHOT;

 connection default;

 DELETE FROM t1 WHERE a = 1;

@@ -148,13 +147,14 @@ send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE;

 connection con1;

 SET DEBUG_SYNC= 'now WAIT_FOR uncommitted';

+BEGIN;

 DELETE FROM t1 WHERE a = 3;

 UPDATE t1 SET a = 7, b = 7 WHERE a = 4;

 INSERT INTO t1(a, b) VALUES (8, 8);

---echo # enable purge

+disconnect stop_purge;

 COMMIT;

 --echo # wait for purge to process the deleted/updated records.

CURRENT_TEST: innodb.innodb-isolation

mysqltest: At line 156: query 'SELECT c1, c2 FROM t1 WHERE c1 < ((SELECT COUNT(*) FROM t1) / 2) FOR UPDATE' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...

The record had been modified by UPDATE t1 SET c2 = c2 * 3 WHERE c1 = 1 in line 51 in the default connection. This transaction is still open. This test would need to be shrunk or removed to adjust for the code change.

CURRENT_TEST: main.deadlock_innodb

mysqltest: In included file "./include/deadlock.inc":

included from /mariadb/10.6/mysql-test/main/deadlock_innodb.test at line 17:

At line 117: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The record had been modified by update t1 set x=1 where id = 0 in line 112. The fix is to avoid creating a read view:

diff --git a/mysql-test/include/deadlock.inc b/mysql-test/include/deadlock.inc

index abf217aea75..362d456e3f2 100644

--- a/mysql-test/include/deadlock.inc

+++ b/mysql-test/include/deadlock.inc

@@ -103,7 +103,6 @@ connection con2;

 # The following query should hang because con1 is locking the record

 update t2 set a=2 where b = 0;

-select * from t2;

 --send

 update t1 set x=2 where id = 0;

 --sleep 2

CURRENT_TEST: innodb.innodb_mysql

mysqltest: In included file "./include/innodb_rollback_on_timeout.inc":

included from ./include/mix1.inc at line 482:

included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_mysql.test at line 19:

At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...

This is fixed by removing a read view creation (the same fix as innodb.innodb_timeout_rollback).

I pushed this for some initial testing. If everything goes as expected, pre-built packages of this change should be available in https://ci.mariadb.org/42468/ within an hour or two.

Marko Mäkelä added a comment - 2024-01-16 12:37 I checked the test failures: CURRENT_TEST: innodb.innodb_bug14007649 mysqltest: At line 42: query 'update t1 set f2 = 6 where f1 = 1 and f2 is null' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The record had been inserted by connection b at line 29, insert into t1 values (3, 1, null) . The transaction was started and committed after between start transaction with consistent snapshot and the failure in connection a . This test needs to be adjusted, or maybe removed. CURRENT_TEST: innodb.innodb_timeout_rollback mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_timeout_rollback.test at line 3: At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)... The INSERT in con2 was not committed yet, so it will not be in the read view of con1 . We could avoid this test failure if the patch didn’t validate the visibility before waiting for the lock (which would time out). Better, we could simply remove the read view creation from con1 to make the test pass: diff --git a/mysql-test/include/innodb_rollback_on_timeout.inc b/mysql-test/include/innodb_rollback_on_timeout.inc index 274bbe12566..883b0820589 100644 --- a/mysql-test/include/innodb_rollback_on_timeout.inc +++ b/mysql-test/include/innodb_rollback_on_timeout.inc @@ -22,7 +22,6 @@ select * from t1; connection con1; begin work; insert into t1 values (5); -select * from t1; # Lock wait timeout set to 2 seconds in <THIS TEST>-master.opt; this # statement will time out; in 5.0.13+, it will not roll back transaction. --error ER_LOCK_WAIT_TIMEOUT CURRENT_TEST: gcol.innodb_virtual_debug_purge mysqltest: At line 151: query 'DELETE FROM t1 WHERE a = 3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The record had been modified by DELETE FROM t1 WHERE a = 1 in connection default in line 139. The fix is simple: we do not actually need a read view in this transaction: diff --git a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test index 09fba0285c7..7966953535c 100644 --- a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test +++ b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test @@ -131,9 +131,8 @@ CREATE TABLE t1 (a INT, b INT, c INT GENERATED ALWAYS AS(a+b)); INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4); -connection con1; ---echo # disable purge -BEGIN; SELECT * FROM t0; +connect (stop_purge,localhost,root,,); +START TRANSACTION WITH CONSISTENT SNAPSHOT; connection default; DELETE FROM t1 WHERE a = 1; @@ -148,13 +147,14 @@ send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE; connection con1; SET DEBUG_SYNC= 'now WAIT_FOR uncommitted'; +BEGIN; DELETE FROM t1 WHERE a = 3; UPDATE t1 SET a = 7, b = 7 WHERE a = 4; INSERT INTO t1(a, b) VALUES (8, 8); ---echo # enable purge +disconnect stop_purge; COMMIT; --echo # wait for purge to process the deleted/updated records. CURRENT_TEST: innodb.innodb-isolation mysqltest: At line 156: query 'SELECT c1, c2 FROM t1 WHERE c1 < ((SELECT COUNT(*) FROM t1) / 2) FOR UPDATE' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)... The record had been modified by UPDATE t1 SET c2 = c2 * 3 WHERE c1 = 1 in line 51 in the default connection. This transaction is still open. This test would need to be shrunk or removed to adjust for the code change. CURRENT_TEST: main.deadlock_innodb mysqltest: In included file "./include/deadlock.inc": included from /mariadb/10.6/mysql-test/main/deadlock_innodb.test at line 17: At line 117: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction The record had been modified by update t1 set x=1 where id = 0 in line 112. The fix is to avoid creating a read view: diff --git a/mysql-test/include/deadlock.inc b/mysql-test/include/deadlock.inc index abf217aea75..362d456e3f2 100644 --- a/mysql-test/include/deadlock.inc +++ b/mysql-test/include/deadlock.inc @@ -103,7 +103,6 @@ connection con2; # The following query should hang because con1 is locking the record update t2 set a=2 where b = 0; -select * from t2; --send update t1 set x=2 where id = 0; --sleep 2 CURRENT_TEST: innodb.innodb_mysql mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": included from ./include/mix1.inc at line 482: included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_mysql.test at line 19: At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)... This is fixed by removing a read view creation (the same fix as innodb.innodb_timeout_rollback ). I pushed this for some initial testing . If everything goes as expected, pre-built packages of this change should be available in https://ci.mariadb.org/42468/ within an hour or two.

Marko Mäkelä added a comment - 2024-01-16 13:10

My code change seems to also fix ~~MDEV-26642~~ but not ~~MDEV-26643~~.

Marko Mäkelä added a comment - 2024-01-16 13:10 My code change seems to also fix MDEV-26642 but not MDEV-26643 .

Marko Mäkelä added a comment - 2024-01-16 13:37

Come to think of it, I think that reporting the somewhat surprising ER_LOCK_DEADLOCK might be acceptable. To avoid any breakage of existing applications, we might want to introduce a Boolean global variable innodb_strict_isolation (default OFF).

Marko Mäkelä added a comment - 2024-01-16 13:37 Come to think of it, I think that reporting the somewhat surprising ER_LOCK_DEADLOCK might be acceptable. To avoid any breakage of existing applications, we might want to introduce a Boolean global variable innodb_strict_isolation (default OFF ).

Marko Mäkelä added a comment - 2024-01-30 16:23

I filed ~~MDEV-33332~~ for the SIGSEGV that mleich reported inside buf_read_ahead_linear(). It is unrelated to this change.

Marko Mäkelä added a comment - 2024-01-30 16:23 I filed MDEV-33332 for the SIGSEGV that mleich reported inside buf_read_ahead_linear() . It is unrelated to this change.

Vladislav Lesin added a comment - 2024-02-01 16:03

The idea of the fix looks correct to me. The summary of what we already discussed in slack and what is also described in commit message:

1. It would be good to have new error code and error message to avoid users confusing. We should also take care about replication events replaying on slaves when the error happens(see slave_transaction_retry_errors[] array).

2. We need to introduce additional option, which would switch on the functionality, and add combination with and without the option for the failed mtr test cases.

Vladislav Lesin added a comment - 2024-02-01 16:03 The idea of the fix looks correct to me. The summary of what we already discussed in slack and what is also described in commit message: 1. It would be good to have new error code and error message to avoid users confusing. We should also take care about replication events replaying on slaves when the error happens(see slave_transaction_retry_errors[] array). 2. We need to introduce additional option, which would switch on the functionality, and add combination with and without the option for the failed mtr test cases.

Marko Mäkelä added a comment - 2024-02-15 12:20

I found that there actually exists an error code that is applicable to exactly this scenario: HA_ERR_RECORD_CHANGED and ER_CHECKREAD. It had been previously implemented in ENGINE=Memory, ENGINE=MyISAM, ENGINE=Aria. None of these storage engines support transaction rollback; ENGINE=Aria does it on crash recovery only.

The best parameter name that I could come up for enabling these and possible similar future checks is innodb_strict_isolation.

Marko Mäkelä added a comment - 2024-02-15 12:20 I found that there actually exists an error code that is applicable to exactly this scenario: HA_ERR_RECORD_CHANGED and ER_CHECKREAD . It had been previously implemented in ENGINE=Memory , ENGINE=MyISAM , ENGINE=Aria . None of these storage engines support transaction rollback; ENGINE=Aria does it on crash recovery only. The best parameter name that I could come up for enabling these and possible similar future checks is innodb_strict_isolation .

Vladislav Lesin added a comment - 2024-03-20 06:21

https://github.com/MariaDB/server/pull/3067#pullrequestreview-1947147879

Vladislav Lesin added a comment - 2024-03-20 06:21 https://github.com/MariaDB/server/pull/3067#pullrequestreview-1947147879

Marko Mäkelä added a comment - 2024-03-21 14:09

serg has requested the performance for both values of innodb_snapshot_isolation to be tested, compared to the baseline. I do not expect much difference; in fact, I tested innodb_snapshot_isolation=OFF back in January and did not notice any change of throughput. For the case innodb_snapshot_isolation=ON we would be interested in the ‘optimistic’ case where transactions are not being rolled back due to the new error ER_CHECKREAD.

Marko Mäkelä added a comment - 2024-03-21 14:09 serg has requested the performance for both values of innodb_snapshot_isolation to be tested, compared to the baseline. I do not expect much difference; in fact, I tested innodb_snapshot_isolation=OFF back in January and did not notice any change of throughput. For the case innodb_snapshot_isolation=ON we would be interested in the ‘optimistic’ case where transactions are not being rolled back due to the new error ER_CHECKREAD .

Axel Schwenke added a comment - 2024-03-22 18:03

The results are somewhat surprising. The baseline and innodb_snapshot_isolation=on show about the same performance where innodb_snapshot_isolation=off falls back a bit. The averages are ~5750 tps and ~5600 tps.

I double checked the configuration and plotted even the counters for Handler_commit and Handler_rollback. But the results are real. As expected we get more rollbacks for innodb_snapshot_isolation=on.

Attached: tpcc1.pdf

Axel Schwenke added a comment - 2024-03-22 18:03 The results are somewhat surprising. The baseline and innodb_snapshot_isolation=on show about the same performance where innodb_snapshot_isolation=off falls back a bit. The averages are ~5750 tps and ~5600 tps. I double checked the configuration and plotted even the counters for Handler_commit and Handler_rollback . But the results are real. As expected we get more rollbacks for innodb_snapshot_isolation=on . Attached: tpcc1.pdf

Sergei Golubchik added a comment - 2024-03-23 13:52

So, 3x more rollbacks (and Monty said it'll be ~3x), as expected.

axel, how was your tps calculated? Are rollbacks count towards tps? Are rolled back transactions re-applied?

Sergei Golubchik added a comment - 2024-03-23 13:52 So, 3x more rollbacks (and Monty said it'll be ~3x), as expected. axel , how was your tps calculated? Are rollbacks count towards tps? Are rolled back transactions re-applied?

Axel Schwenke added a comment - 2024-03-23 17:09

how was your tps calculated? Are rollbacks count towards tps?

The TPS numbers come from sysbench. It's simply the number of executions of the event() function in the LUA script. Which for TPC-C calls one of 5 functions for one of 5 transactions. Of those only the NEW ORDER transaction does an explicit ROLLBACK. And AFAICS that rollback is not connected to an SQL error, but to an empty result.

I have now added plots of Com_commit and Com_rollback counters (in tpcc1.pdf) . If I compare the TPS plot with Com_commit the latter is a bit lower. That would mean a rollback is counted towards TPS. But this changes nothing of the throughput comparison of the 3 commits/configurations.

The numbers for Handler_rollback are higher than those for Com_rollback - but only for the case innodb_snapshot_isolation=on. Those rollbacks must be implicit then and are ignored by sysbench. However sysbench reports those errors independently.

For innodb_snapshot_isolation=off:

SQL statistics:

    queries performed:

        read:                            262257215

        write:                           272207592

        other:                           40437558

        total:                           574902365

    transactions:                        20218683 (5616.26 per sec.)

    queries:                             574902365 (159693.89 per sec.)

    ignored errors:                      87982  (24.44 per sec.)

    reconnects:                          0      (0.00 per sec.)

And innodb_snapshot_isolation=on:

SQL statistics:

    queries performed:

        read:                            268903843

        write:                           279039940

        other:                           41682778

        total:                           589626561

    transactions:                        20730792 (5758.51 per sec.)

    queries:                             589626561 (163783.92 per sec.)

    ignored errors:                      200264 (55.63 per sec.)

    reconnects:                          0      (0.00 per sec.)

Are rolled back transactions re-applied?

I can see that nowhere in the LUA code.

Axel Schwenke added a comment - 2024-03-23 17:09 how was your tps calculated? Are rollbacks count towards tps? The TPS numbers come from sysbench. It's simply the number of executions of the event() function in the LUA script. Which for TPC-C calls one of 5 functions for one of 5 transactions. Of those only the NEW ORDER transaction does an explicit ROLLBACK. And AFAICS that rollback is not connected to an SQL error, but to an empty result. I have now added plots of Com_commit and Com_rollback counters (in tpcc1.pdf ) . If I compare the TPS plot with Com_commit the latter is a bit lower. That would mean a rollback is counted towards TPS. But this changes nothing of the throughput comparison of the 3 commits/configurations. The numbers for Handler_rollback are higher than those for Com_rollback - but only for the case innodb_snapshot_isolation=on . Those rollbacks must be implicit then and are ignored by sysbench. However sysbench reports those errors independently. For innodb_snapshot_isolation=off : SQL statistics: queries performed: read: 262257215 write: 272207592 other: 40437558 total: 574902365 transactions: 20218683 (5616.26 per sec.) queries: 574902365 (159693.89 per sec.) ignored errors: 87982 (24.44 per sec.) reconnects: 0 (0.00 per sec.) And innodb_snapshot_isolation=on : SQL statistics: queries performed: read: 268903843 write: 279039940 other: 41682778 total: 589626561 transactions: 20730792 (5758.51 per sec.) queries: 589626561 (163783.92 per sec.) ignored errors: 200264 (55.63 per sec.) reconnects: 0 (0.00 per sec.) Are rolled back transactions re-applied? I can see that nowhere in the LUA code.

Sergei Golubchik added a comment - 2024-03-24 13:43

I cannot explain why the number of commits stayed the same even though the number of rollbacks increased 3x

Sergei Golubchik added a comment - 2024-03-24 13:43 I cannot explain why the number of commits stayed the same even though the number of rollbacks increased 3x

Marko Mäkelä added a comment - 2024-03-25 08:28

For what it is worth, in InnoDB a rollback is like a commit, with the added step of undoing all changes. It seems to me that Com_commit is defined in com_status_vars as the following entry:

  {"commit",               STMT_STATUS(SQLCOM_COMMIT)},

This is something that would be updated outside InnoDB. Maybe the Lua scripts keep executing COMMIT statements after ignoring ER_CHECKREAD errors, or maybe autocommit transactions that are in fact rolled back due to an error will be counted as committed.

Marko Mäkelä added a comment - 2024-03-25 08:28 For what it is worth, in InnoDB a rollback is like a commit, with the added step of undoing all changes. It seems to me that Com_commit is defined in com_status_vars as the following entry: { "commit" , STMT_STATUS(SQLCOM_COMMIT)}, This is something that would be updated outside InnoDB. Maybe the Lua scripts keep executing COMMIT statements after ignoring ER_CHECKREAD errors, or maybe autocommit transactions that are in fact rolled back due to an error will be counted as committed.

People

Assignee:: Axel Schwenke

Reporter:: Zhuang Liu

Votes:: 1 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 2023-11-28 07:06

Updated:: 2025-01-28 16:01

Resolved:: 2024-03-22 18:04

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration