[MDEV-32898] Phantom rows caused by UPDATE of PRIMARY KEY Created: 2023-11-28 Updated: 2024-02-01 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.2.2 |
| Fix Version/s: | 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Zhuang Liu | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | innodb | ||
| Environment: |
Ubuntu 22.04 |
||
| Issue Links: |
|
||||||||||||||||
| Description |
|
Under REPEATABLE-READ isolation level,an UPDATE statement which update the value of primary key caused phantom rows in another transaction.
It appears that a phantom row (2, 2) showed up in the second consistent read of T2. And if you commit the second transaction, the phantom row will disappear. I'm not sure whether this is a new bug or a duplicate one. From the user's perspective, I haven't inserted a new row, updating existing rows should not result in phantom rows. |
| Comments |
| Comment by Zhuang Liu [ 2023-11-28 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This bug can also be reproduced in MySQL8.2.0 and has been verified by MySQL Verification Team. mysql# #113228 | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you. Here is my mtr version of this:
I got the same result on the 10.4 branch as well. I did not explicitly change the transaction isolation level, because REPEATABLE READ is the default. As far as I remember, some people who worked in the MySQL Falcon project back when Innobase Oy was an Oracle subsidiary but MySQL AB wasn’t, called this InnoDB anomaly the WRITE COMMITTED isolation level. That is, locking reads in InnoDB always were essentially READ COMMITTED. All we want is to successfully acquire a lock. I had discussed a possible way of fixing this in I think that this bug has been reported earlier, and finally I found MDEV-26642 and MDEV-26643, which are very similar. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-11-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Does this report look like a duplicate of MDEV-26642 or MDEV-26643 to you? | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Zhuang Liu [ 2023-11-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I agree that they are duplicated, they appear to be different phenomena caused by the same bug, but I am still puzzled by the occurrence of the phantom row. It seems that the first UPDATE statement actually deletes a row (2, 2) and than inserts a new row (3, 2). By the way, the statement 'SELECT * FROM t LOCK IN SHARE MODE' is unnecessary for reproducing the bug. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I was able to fix my version with a simple patch:
The patch will flag an error when an attempt is made to lock a PRIMARY KEY record that is not visible in the current read view. For now, I used an existing error code for this.
This patch will cause 6 existing regression tests to fail. I will have to review each failure carefully to see if the patch could be improved, or if the tests need to be adjusted. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I checked the test failures:
The record had been inserted by connection b at line 29, insert into t1 values (3, 1, null). The transaction was started and committed after between start transaction with consistent snapshot and the failure in connection a. This test needs to be adjusted, or maybe removed.
The INSERT in con2 was not committed yet, so it will not be in the read view of con1. We could avoid this test failure if the patch didn’t validate the visibility before waiting for the lock (which would time out). Better, we could simply remove the read view creation from con1 to make the test pass:
The record had been modified by DELETE FROM t1 WHERE a = 1 in connection default in line 139. The fix is simple: we do not actually need a read view in this transaction:
The record had been modified by UPDATE t1 SET c2 = c2 * 3 WHERE c1 = 1 in line 51 in the default connection. This transaction is still open. This test would need to be shrunk or removed to adjust for the code change.
The record had been modified by update t1 set x=1 where id = 0 in line 112. The fix is to avoid creating a read view:
This is fixed by removing a read view creation (the same fix as innodb.innodb_timeout_rollback). I pushed this for some initial testing. If everything goes as expected, pre-built packages of this change should be available in https://ci.mariadb.org/42468/ within an hour or two. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
My code change seems to also fix MDEV-26642 but not MDEV-26643. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-16 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Come to think of it, I think that reporting the somewhat surprising ER_LOCK_DEADLOCK might be acceptable. To avoid any breakage of existing applications, we might want to introduce a Boolean global variable innodb_strict_isolation (default OFF). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-30 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I filed MDEV-33332 for the SIGSEGV that mleich reported inside buf_read_ahead_linear(). It is unrelated to this change. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2024-02-01 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The idea of the fix looks correct to me. The summary of what we already discussed in slack and what is also described in commit message: 1. It would be good to have new error code and error message to avoid users confusing. We should also take care about replication events replaying on slaves when the error happens(see slave_transaction_retry_errors[] array). 2. We need to introduce additional option, which would switch on the functionality, and add combination with and without the option for the failed mtr test cases. |