[MDEV-32898] Phantom rows caused by UPDATE of PRIMARY KEY Created: 2023-11-28  Updated: 2024-02-01

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.2.2
Fix Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4

Type: Bug Priority: Critical
Reporter: Zhuang Liu Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 1
Labels: innodb
Environment:

Ubuntu 22.04


Issue Links:
Relates
relates to MDEV-14589 InnoDB should not lock a delete-marke... Closed
relates to MDEV-26642 Weird SELECT view when a record is mo... Confirmed
relates to MDEV-26643 Inconsistent behaviors of UPDATE unde... Open

 Description   

Under REPEATABLE-READ isolation level,an UPDATE statement which update the value of primary key caused phantom rows in another transaction.
How to repeat:

/* init */ CREATE TABLE t(a INT PRIMARY KEY, b INT);
/* init */ INSERT INTO t VALUES (1, 1);
/* init */ INSERT INTO t VALUES (2, 2);
/* t1 */ SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
/* t2 */ SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 
/* t1 */ BEGIN;
/* t1 */ SELECT * FROM t LOCK IN SHARE MODE;
/* t2 */ BEGIN;
/* t2 */ SELECT * FROM t;  -- [(1, 1), (2, 2)]
/* t1 */ UPDATE t SET a=3 WHERE b = 2;
/* t1 */ COMMIT;
/* t2 */ UPDATE t SET b=3;
/* t2 */ SELECT * FROM t; -- [(1, 3), (2, 2), (3, 3)] 
/* t2 */ COMMIT;

It appears that a phantom row (2, 2) showed up in the second consistent read of T2. And if you commit the second transaction, the phantom row will disappear. I'm not sure whether this is a new bug or a duplicate one. From the user's perspective, I haven't inserted a new row, updating existing rows should not result in phantom rows.



 Comments   
Comment by Zhuang Liu [ 2023-11-28 ]

This bug can also be reproduced in MySQL8.2.0 and has been verified by MySQL Verification Team. mysql# #113228

Comment by Marko Mäkelä [ 2023-11-30 ]

Thank you. Here is my mtr version of this:

--source include/have_innodb.inc
CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t VALUES (1,1),(2,2);
--connect con1,localhost,root
BEGIN; SELECT * FROM t LOCK IN SHARE MODE;
--connection default
BEGIN;
SELECT * FROM t;
--connection con1
UPDATE t SET a=3 WHERE b=2;
COMMIT;
--disconnect con1
--connection default
UPDATE t SET b=3;
SELECT * FROM t;
COMMIT;
DROP TABLE t;

10.6 b3a628c7d4ec5b765768bd374d83beea59880522

CREATE TABLE t(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t VALUES (1,1),(2,2);
connect con1,localhost,root;
BEGIN;
SELECT * FROM t LOCK IN SHARE MODE;
a	b
1	1
2	2
connection default;
BEGIN;
SELECT * FROM t;
a	b
1	1
2	2
connection con1;
UPDATE t SET a=3 WHERE b=2;
COMMIT;
disconnect con1;
connection default;
UPDATE t SET b=3;
SELECT * FROM t;
a	b
1	3
2	2
3	3
COMMIT;
DROP TABLE t;

I got the same result on the 10.4 branch as well.

I did not explicitly change the transaction isolation level, because REPEATABLE READ is the default. As far as I remember, some people who worked in the MySQL Falcon project back when Innobase Oy was an Oracle subsidiary but MySQL AB wasn’t, called this InnoDB anomaly the WRITE COMMITTED isolation level. That is, locking reads in InnoDB always were essentially READ COMMITTED. All we want is to successfully acquire a lock.

I had discussed a possible way of fixing this in MDEV-14589. If I change the test to use the SERIALIZABLE isolation level in both connections, then the first UPDATE will be blocked.

I think that this bug has been reported earlier, and finally I found MDEV-26642 and MDEV-26643, which are very similar.

Comment by Marko Mäkelä [ 2023-11-30 ]

Does this report look like a duplicate of MDEV-26642 or MDEV-26643 to you?

Comment by Zhuang Liu [ 2023-11-30 ]

I agree that they are duplicated, they appear to be different phenomena caused by the same bug, but I am still puzzled by the occurrence of the phantom row. It seems that the first UPDATE statement actually deletes a row (2, 2) and than inserts a new row (3, 2). By the way, the statement 'SELECT * FROM t LOCK IN SHARE MODE' is unnecessary for reproducing the bug.

Comment by Marko Mäkelä [ 2024-01-16 ]

I was able to fix my version with a simple patch:

10.6 d06b6de3050180ec2f96ef00963d1beab8e1b47a with patch

mysqltest: At line 14: query 'UPDATE t SET b=3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The patch will flag an error when an attempt is made to lock a PRIMARY KEY record that is not visible in the current read view. For now, I used an existing error code for this.

diff --git a/storage/innobase/include/row0row.h b/storage/innobase/include/row0row.h
index a1350740e2a..7056c77f2e6 100644
--- a/storage/innobase/include/row0row.h
+++ b/storage/innobase/include/row0row.h
@@ -370,6 +370,12 @@ row_search_index_entry(
 	mtr_t*		mtr)	/*!< in: mtr */
 	MY_ATTRIBUTE((nonnull, warn_unused_result));
 
+/** Get the byte offset of the DB_TRX_ID column
+@param[in]	rec	clustered index record
+@param[in]	index	clustered index
+@return	the byte offset of DB_TRX_ID, from the start of rec */
+ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index);
+
 #define ROW_COPY_DATA		1
 #define ROW_COPY_POINTERS	2
 
diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index c9072998e66..3cc4954eafc 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -5924,6 +5924,12 @@ lock_clust_rec_read_check_and_lock(
 		return DB_SUCCESS;
 	}
 
+	if (heap_no > PAGE_HEAP_NO_SUPREMUM && trx->read_view.is_open()
+	    && !trx->read_view.changes_visible(
+		trx_read_trx_id(rec + row_trx_id_offset(rec, index)))) {
+		return DB_DEADLOCK;
+	}
+
 	dberr_t err = lock_rec_lock(false, gap_mode | mode,
 				    block, heap_no, index, thr);
 
diff --git a/storage/innobase/row/row0umod.cc b/storage/innobase/row/row0umod.cc
index a01eaea516a..52f54443911 100644
--- a/storage/innobase/row/row0umod.cc
+++ b/storage/innobase/row/row0umod.cc
@@ -190,7 +190,7 @@ row_undo_mod_clust_low(
 @param[in]	rec	clustered index record
 @param[in]	index	clustered index
 @return	the byte offset of DB_TRX_ID, from the start of rec */
-static ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index)
+ulint row_trx_id_offset(const rec_t* rec, const dict_index_t* index)
 {
 	ut_ad(index->n_uniq <= MAX_REF_PARTS);
 	ulint trx_id_offset = index->trx_id_offset;

This patch will cause 6 existing regression tests to fail. I will have to review each failure carefully to see if the patch could be improved, or if the tests need to be adjusted.
gcol.innodb_virtual_debug_purge
innodb.innodb-isolation
innodb.innodb_bug14007649
innodb.innodb_mysql
innodb.innodb_timeout_rollback
main.deadlock_innodb

Comment by Marko Mäkelä [ 2024-01-16 ]

I checked the test failures:

CURRENT_TEST: innodb.innodb_bug14007649
mysqltest: At line 42: query 'update t1 set f2 = 6 where f1 = 1 and f2 is null' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The record had been inserted by connection b at line 29, insert into t1 values (3, 1, null). The transaction was started and committed after between start transaction with consistent snapshot and the failure in connection a. This test needs to be adjusted, or maybe removed.

CURRENT_TEST: innodb.innodb_timeout_rollback
mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": 
included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_timeout_rollback.test at line 3:
At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...

The INSERT in con2 was not committed yet, so it will not be in the read view of con1. We could avoid this test failure if the patch didn’t validate the visibility before waiting for the lock (which would time out). Better, we could simply remove the read view creation from con1 to make the test pass:

diff --git a/mysql-test/include/innodb_rollback_on_timeout.inc b/mysql-test/include/innodb_rollback_on_timeout.inc
index 274bbe12566..883b0820589 100644
--- a/mysql-test/include/innodb_rollback_on_timeout.inc
+++ b/mysql-test/include/innodb_rollback_on_timeout.inc
@@ -22,7 +22,6 @@ select * from t1;
 connection con1;
 begin work;
 insert into t1 values (5);
-select * from t1;
 # Lock wait timeout set to 2 seconds in <THIS TEST>-master.opt; this
 # statement will time out; in 5.0.13+, it will not roll back transaction.
 --error ER_LOCK_WAIT_TIMEOUT

CURRENT_TEST: gcol.innodb_virtual_debug_purge
mysqltest: At line 151: query 'DELETE FROM t1 WHERE a = 3' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The record had been modified by DELETE FROM t1 WHERE a = 1 in connection default in line 139. The fix is simple: we do not actually need a read view in this transaction:

diff --git a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test
index 09fba0285c7..7966953535c 100644
--- a/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test
+++ b/mysql-test/suite/gcol/t/innodb_virtual_debug_purge.test
@@ -131,9 +131,8 @@ CREATE TABLE t1 (a INT, b INT, c INT GENERATED ALWAYS AS(a+b));
 
 INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
 
-connection con1;
---echo # disable purge
-BEGIN; SELECT * FROM t0;
+connect (stop_purge,localhost,root,,);
+START TRANSACTION WITH CONSISTENT SNAPSHOT;
 
 connection default;
 DELETE FROM t1 WHERE a = 1;
@@ -148,13 +147,14 @@ send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE;
 connection con1;
 SET DEBUG_SYNC= 'now WAIT_FOR uncommitted';
 
+BEGIN;
 DELETE FROM t1 WHERE a = 3;
 
 UPDATE t1 SET a = 7, b = 7 WHERE a = 4;
 
 INSERT INTO t1(a, b) VALUES (8, 8);
 
---echo # enable purge
+disconnect stop_purge;
 COMMIT;
 
 --echo # wait for purge to process the deleted/updated records.

CURRENT_TEST: innodb.innodb-isolation
mysqltest: At line 156: query 'SELECT c1, c2 FROM t1 WHERE c1 < ((SELECT COUNT(*) FROM t1) / 2) FOR UPDATE' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...

The record had been modified by UPDATE t1 SET c2 = c2 * 3 WHERE c1 = 1 in line 51 in the default connection. This transaction is still open. This test would need to be shrunk or removed to adjust for the code change.

CURRENT_TEST: main.deadlock_innodb
mysqltest: In included file "./include/deadlock.inc": 
included from /mariadb/10.6/mysql-test/main/deadlock_innodb.test at line 17:
At line 117: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

The record had been modified by update t1 set x=1 where id = 0 in line 112. The fix is to avoid creating a read view:

diff --git a/mysql-test/include/deadlock.inc b/mysql-test/include/deadlock.inc
index abf217aea75..362d456e3f2 100644
--- a/mysql-test/include/deadlock.inc
+++ b/mysql-test/include/deadlock.inc
@@ -103,7 +103,6 @@ connection con2;
 
 # The following query should hang because con1 is locking the record
 update t2 set a=2 where b = 0;
-select * from t2;
 --send
 update t1 set x=2 where id = 0;
 --sleep 2

CURRENT_TEST: innodb.innodb_mysql
mysqltest: In included file "./include/innodb_rollback_on_timeout.inc": 
included from ./include/mix1.inc at line 482:
included from /mariadb/10.6/mysql-test/suite/innodb/t/innodb_mysql.test at line 19:
At line 29: query 'insert into t1 values (2)' failed with wrong errno ER_LOCK_DEADLOCK (1213): 'Deadlock found when trying to get lock; try restarting transaction', instead of ER_LOCK_WAIT_TIMEOUT (1205)...

This is fixed by removing a read view creation (the same fix as innodb.innodb_timeout_rollback).

I pushed this for some initial testing. If everything goes as expected, pre-built packages of this change should be available in https://ci.mariadb.org/42468/ within an hour or two.

Comment by Marko Mäkelä [ 2024-01-16 ]

My code change seems to also fix MDEV-26642 but not MDEV-26643.

Comment by Marko Mäkelä [ 2024-01-16 ]

Come to think of it, I think that reporting the somewhat surprising ER_LOCK_DEADLOCK might be acceptable. To avoid any breakage of existing applications, we might want to introduce a Boolean global variable innodb_strict_isolation (default OFF).

Comment by Marko Mäkelä [ 2024-01-30 ]

I filed MDEV-33332 for the SIGSEGV that mleich reported inside buf_read_ahead_linear(). It is unrelated to this change.

Comment by Vladislav Lesin [ 2024-02-01 ]

The idea of the fix looks correct to me. The summary of what we already discussed in slack and what is also described in commit message:

1. It would be good to have new error code and error message to avoid users confusing. We should also take care about replication events replaying on slaves when the error happens(see slave_transaction_retry_errors[] array).

2. We need to introduce additional option, which would switch on the functionality, and add combination with and without the option for the failed mtr test cases.

Generated at Thu Feb 08 10:34:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.