[MDEV-18706] ER_LOCK_DEADLOCK on concurrent read and insert into already locked gap Created: 2019-02-23  Updated: 2024-01-16

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 5.5, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Vladislav Lesin
Resolution: Unresolved Votes: 1
Labels: lock, performance

Attachments: Text File failing_thread.txt     Text File tampering_thread.txt     File v.test    
Issue Links:
PartOf
includes MDEV-22475 versioning.update failed in buildbot ... Closed
Relates
relates to MDEV-13697 DB_TRX_ID is not always reset when th... Closed
relates to MDEV-14829 Assertion `0' failed in Protocol::end... Closed
relates to MDEV-16406 Refactor the InnoDB record locks Open
relates to MDEV-17073 INSERT…ON DUPLICATE KEY UPDATE became... Closed
relates to MDEV-18392 Getting DEADLOCKS frequently on Versi... Closed
relates to MDEV-20605 Awaken transaction can miss inserted ... Closed
relates to MDEV-20612 Improve InnoDB lock_sys scalability Closed
relates to MDEV-24738 Improve the InnoDB deadlock checker Closed
relates to MDEV-10962 Deadlock with 3 concurrent DELETEs by... Closed
relates to MDEV-14066 [Draft] Assertion failed: rec_get_del... Closed
relates to MDEV-20274 Deadlock on implicit locking and para... Open

 Description   

mysql_ha_read: Got error 149 when reading table and ER_LOCK_DEADLOCK on concurrent reading/updating versioned partitioned table

Notes:

  • The test case in the given form fails for in the first few attempts, but it can vary on different machines and builds; run with --repeat=N.
  • There are a couple of lines in the middle of the test which are unimportant for the error, but they help with the timing, without them it fails less reliably. The can probably be replaced with a proper synchronization.
  • I am not sure if the deadlock and the error in the log always come together. Please not that the error in the log alone does not make an MTR test fail, because it's currently suppressed in include/mtr_warnings.sql. We are considering removing the suppression, but first we need to check how many of existing tests would be affected. Meanwhile, if you need the test to fail upon the log error, remove the suppression from the file locally.
  • The problem might be not related to versioning, but I couldn't remove either versioning or partitioning from the test case.

--source include/have_innodb.inc
--source include/have_partition.inc
 
CREATE TABLE t1 (
  pk INT,
  c CHAR(32),
  PRIMARY KEY (pk)
) ENGINE=InnoDB WITH SYSTEM VERSIONING PARTITION BY HASH(pk) PARTITIONS 2;
 
INSERT INTO t1 VALUES (1,'foo'),(2,'bar');
HANDLER t1 OPEN AS h;
 
--connect (con1,localhost,root,,test)
--send
  UPDATE t1 SET c = 4;
 
--connection default
 
# Two statements below are not required for the scenario,
# but apparently they help with the timing, it fails better with them
--error ER_UNKNOWN_TABLE
HANDLER unknown_handler CLOSE;
--error ER_NO_SUCH_TABLE
HANDLER unknown_table OPEN;
 
HANDLER h READ `PRIMARY` PREV;
 
# Cleanup
 
--connection con1
--reap
--disconnect con1
--connection default
HANDLER h CLOSE;
DROP TABLE t1;

10.3 4946eb7b

mysqltest: At line 26: query 'HANDLER h READ `PRIMARY` PREV' failed: 1213: Deadlock found when trying to get lock; try restarting transaction

CURRENT_TEST: bug.error149b
2019-02-23 21:55:43 19 [ERROR] mysql_ha_read: Got error 149 when reading table 'h'



 Comments   
Comment by Aleksey Midenkov [ 2019-08-06 ]

Reproducible without System versioning:

--source include/have_innodb.inc
 
create or replace table t1 (pk int primary key, x int) engine innodb;
insert into t1 values (3, 0);
start transaction;
update t1 set x= 1;
connect (con1,localhost,root,,test);
send select * from t1 for update;
connection default;
--let $wait_condition= select count(*) from information_schema.innodb_lock_waits
--source include/wait_condition.inc
insert into t1 values (2, 1);
commit;
connection con1;
--echo # Unexpected ER_LOCK_DEADLOCK here
reap;
connection default;
disconnect con1;
drop table t1;

Cause

No real deadlock, but incomplete conflict detection:

1. T1-S-W waits on T2-X lock;
2. T2-II-W has to wait T1-S-W and here conflict resolution starts;
3. Conflict resolution goes through a tree of locks:
a. (T2-II-W, T2-X): ok
b. (T2-II-W, T1-S-W): conflict, traversing from T1-S-W
c. (T1-S-W, T2-X): conflict, found cycle (because resolution started from T2).

The weak point here is 2. (lock_rec_has_to_wait()), because T2-II-W doesn't have to wait T1: T2 already got X next-key lock that also holds the gap before the record.

Fix

Allow lock in lock_rec_other_has_conflicting() if there is already non-waiting stronger or equal lock in this transaction (lock_rec_has_expl()).

Analysis

Comment by Marko Mäkelä [ 2019-09-09 ]

I can repeat the deadlock on the latest 5.5, 10.1, 10.2 as well. The bug could exist since MySQL 4.0 or even earlier.

Comment by Marko Mäkelä [ 2019-09-09 ]

Sorry for the delay. I agree that the deadlock looks bogus, but I would like the fix to be a little cleaner, to reduce the potential of correctness regressions around locking reads.

Comment by Marko Mäkelä [ 2019-10-18 ]

I made some similar observations when debugging a test case for MDEV-20605. A locking SELECT (which can also be part of UPDATE or DELETE) is unnecessarily locking too many records. It might be better to attempt to fix MDEV-20605 first. That fix might end up fixing this case as well.

Comment by Aleksey Midenkov [ 2020-05-12 ]

By info provided in MDEV-20605 I can judge that the problems are different. MDEV-20605 is about lost gap locks after deleting records, this task is about spurious conflict detection.

Comment by Marko Mäkelä [ 2020-08-12 ]

Can you please rebase the fix to a recent main branch? I’d review it after a successful CI run.

Comment by Aleksey Midenkov [ 2020-08-25 ]

bb-10.2-midenok

Comment by Marko Mäkelä [ 2020-11-05 ]

I updated bb-10.2-midenok-innodb today with the latest 10.2. The old results were no longer available in the grid view, but in the cross-reference I can see that the test galera.MW-328D was failing also for earlier versions of the branch. That test last failed in any other 10.2-based branch in 2018. So, it looks like this (or MDEV-10962, which was present in the same branch) will need some more work.

Comment by Matthias Leich [ 2020-11-05 ]

Results of some rather short (errors are too frequent) testing with RQG
# mysqld: /Server/bb-10.2-midenok-innodb/storage/innobase/lock/lock0lock.cc:5262: ulint lock_rec_queue_validate(ulint, const buf_block_t*, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `lock_get_wait(other_lock)' failed.
# mysqld: /Server/bb-10.2-midenok-innodb/storage/innobase/lock/lock0lock.cc:5263: ulint lock_rec_queue_validate(ulint, const buf_block_t*, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `lock_rec_has_expl(LOCK_X | 1024U, block, heap_no, impl_trx)' failed.
# mysqld: /Server/bb-10.2-midenok-innodb/storage/innobase/lock/lock0lock.cc:5309: ulint lock_rec_queue_validate(ulint, const buf_block_t*, const rec_t*, const dict_index_t*, const rec_offs*): Assertion `!other_lock' failed.
 
sdp:/RQG/storage/1604583752   with masses of archives including rr traces

Comment by Aleksey Midenkov [ 2021-06-17 ]

Rebased bb-10.3-midenok-MDEV-18706 with the latest 10.3 revision.
mleich Can you please add at least one broken test case into the ticket (cleanup for the test case is not required)? Or ideally 3 test cases, one per each failure type you listed.

Comment by Marko Mäkelä [ 2021-06-17 ]

midenok, I think that it would be much more useful to test this in a 10.6-based branch, where the locking subsystem was improved in MDEV-20612, MDEV-24738 and others. After you have done that and checked that the results on the CI systems are fine (they are not available for your 10.3-based update yet), you can request review and additional testing.

Comment by Aleksey Midenkov [ 2021-07-05 ]

Updated bb-10.6-midenok-MDEV-18706

Please review and do additional testing.

Comment by Matthias Leich [ 2021-07-08 ]

bb-10.6-midenok-MDEV-18706 did not behave bad during RQG testing except I hit
mysqld: /data/Server/bb-10.6-midenok-MDEV-18706/storage/innobase/row/row0umod.cc:385: dberr_t row_undo_mod_clust(undo_node_t*, que_thr_t*): Assertion `rec_get_deleted_flag(btr_pcur_get_rec(pcur), (node->table)->not_redundant())' failed.
The simplified RQG grammar is
 
rand_int:
    { $my_int = 1 } |
    { $my_int = $prng->int( 65, 512) } ;
 
dml:
    UPDATE t1 SET a = rand_int WHERE a = rand_int OR a = rand_int |
    INSERT INTO t1 (a) VALUES ( rand_int) , ( rand_int) |
    INSERT INTO t1 (a) VALUES ( rand_int) , ( rand_int) ;
 
thread1:
    dml ;
 
thread1_connect:
    ;
 
thread1_init:
    CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE = InnoDB ;
 
thread2:
    dml ;
 
thread2_connect:
    ;
 
thread2_init:
    ;
 
That basically means that we need only two sessions which run some simple INSERT/UPDATE mix concurrent on the same simple table.
My RQG batch tool started every second one additional RQG run.
The first replay happened 83s after the start of the batch tool.
In that moment were ~ 130 concurrent RQG tests around.
Replaying this problem under "rr" was possible but needs serious more attempts.
 
No replay of the problem on
origin/10.6 ae62f115beea8cb7c532ef1a72c359548316df9c 2021-07-20T10:55:03+03:00
with 400 finished attempts.

Comment by Aleksey Midenkov [ 2021-07-12 ]

The failure seems to be caused by MDEV-13697:

365             btr_pcur_commit_specify_mtr(pcur, &mtr);
366             DEBUG_SYNC_C("rollback_undo_pk");
367
368             if (err != DB_SUCCESS) {
369                     goto func_exit;
370             }
371
372             /* FIXME: Perform the below operations in the above
373             mini-transaction when possible. */
374
375             if (node->rec_type == TRX_UNDO_UPD_DEL_REC) {
376                     /* In delete-marked records, DB_TRX_ID must
377                     always refer to an existing update_undo log record. */
378                     ut_ad(node->new_trx_id);
379
380                     mtr.start();
381                     if (!btr_pcur_restore_position(BTR_MODIFY_LEAF, pcur, &mtr)) {
382                             goto mtr_commit_exit;
383                     }
384
385                     ut_ad(rec_get_deleted_flag(btr_pcur_get_rec(pcur),
386                                                dict_table_is_comp(node->table)));

Latch is released at 365, then another thread changes the record and assertion fails at 385. See failing_thread.txt and tampering_thread.txt for stack traces.

Comment by Marko Mäkelä [ 2021-07-20 ]

mleich says that the simplified RQG grammar does not fail for him on the latest 10.6 (without the code changes) but fails very easily with the code changes.

Locking forms up to 3 cornerstones of ACID. Records in transactions are protected in 2 means:

  • implicit locks: DB_TRX_ID on a record points to an open transaction; until MDEV-16232 is fixed, this only applies to INSERT, never to UPDATE or DELETE.
  • explicit locks: a record lock is explicitly registered in lock_sys.rec_hash either by the lock holder thread, or as a result of another thread requesting a record lock while an implicit lock exists (implicit-to-explicit conversion).

ROLLBACK operations are always supposed to be covered by an exclusive record lock of either type. The correctness of rollback must not rely on page latches. A failure in the highlighted code suggests that the being-rolled-back transaction is not holding a lock on the record that is being modified. This strongly suggests that there is an error in the suggested code change.

Comment by Marko Mäkelä [ 2021-07-20 ]

I quickly read the patch. I noticed an uncovered DEBUG_SYNC_C("row_search_lock_wait"). Also, the purpose and the correctness of tweaking the restored persistent cursor position in row_search_mvcc() is not clear to me. We must also keep in mind that in 10.3, MDEV-11369 introduced an optional hidden metadata record at the start of the clustered index.

Given that this change deals with gap locks, it should be better to address MDEV-20605 first, to make gap locks perform correctly.

Comment by Marko Mäkelä [ 2021-07-20 ]

In which way would this fix differ from the following MySQL change?
Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY

Comment by Aleksey Midenkov [ 2021-07-21 ]

In which way would this fix differ from the following MySQL change?
Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY

This is similar but more complex approach to solve MDEV-10962. I guess it doesn't solve MDEV-18706.

Comment by Aleksey Midenkov [ 2021-07-21 ]

ROLLBACK operations are always supposed to be covered by an exclusive record lock of either type. The correctness of rollback must not rely on page latches. A failure in the highlighted code suggests that the being-rolled-back transaction is not holding a lock on the record that is being modified. This strongly suggests that there is an error in the suggested code change.

I suspect the situation is more complex: race condition between undo, purge and optimistic insert.

While row_undo_mod_clust() does mtr commit, purge thread frees the record and sets PAGE_FREE field to point to that record (heap_no 23). Then 3rd thread gets into the scene and inserts into that record.

That 3rd thread first tries row_ins_clust_index_entry_by_modify() but fails to lock heap_no 23: undo thread holds S-lock (maybe X-lock as well, but I've seen S-lock).

After that 3rd thread does btr_cur_optimistic_insert() and checks the gap at heap_no 2. The gap is free to insert (LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION).

Optimistic insert tries to insert after infimum: that's why it locks heap_no 2 instead of 23. But when it does page_cur_insert_rec_low() and finds free record via PAGE_FREE field it gets record 23.

Meanwhile undo thread is still at that mtr commit call.

In short words: insert thread tried to insert by update but failed to lock as expected, then it tried optimistic insert and succeeded to write into the very same record where it failed in the first attempt.

Comment by Aleksey Midenkov [ 2021-07-26 ]

Reproduced by v.test.

New lock X|GAP|INSERT_INTENTION was considered weaker than X|GAP. That
was wrong because GAP locks without INSERT_INTENTION don't wait for
anything: that is the weakest locks.

Fixed by denying pure GAP to be stronger than GAP|INSERT_INTENTION.

Comment by Aleksey Midenkov [ 2021-07-26 ]

mleich Can you please test again bb-10.6-midenok-MDEV-18706 ?

Comment by Marko Mäkelä [ 2021-09-29 ]

I think that this is best reviewed (and initially tested) by vlad.lesin, after he is done with the gap lock fixes that were observed in MDEV-20605.

Comment by Vladislav Lesin [ 2023-08-08 ]

Why the case is considered as a bug?

1. trx 1 executes "update t1 set x= 1;" and X-locks (3,1) record
2. trx 2 executes "select * from t1 for update", it requests next-key lock on (3,1) records and suspends as trx 1 owns conflicting lock
3. trx 1 executes "insert into t1 values (2, 1);", it requests conflicting with trx 2 lock, deadlock is detected, as trx 1 should wait for trx 2, which should wait for trx 1. trx 1 has more weight, as it contains more undo log records, that's why trx 2 is chosen as a victim
4. trx 2 is killed with "deadlock" error

It behaves as expected.

If trx 1 did not conflict with waiting trx 2 lock on step 3, it would be ACID violation, as trx 2 is executed in "repeatable read" isolation level.

See also this comment and MDEV-27992.

Comment by Vladislav Lesin [ 2023-08-08 ]

It looks like MDEV-27025. midenok, if you don't have any objections, I would mark it as MDEV-27025 duplicate and close.

Comment by Aleksey Midenkov [ 2023-08-11 ]

vlad.lesin Transactions in 2. and 3. are coming in parallel. Nobody defined 2. comes before 3. except the system itself. System made wrong choice since transaction 2. is cancelled. Think of it as transaction 3. comes before 2.

I remember I checked this, but I rechecked:

--- ./mysql-test/suite/innodb/r/innodb-lock.result	2023-08-11 21:19:40.365663652 +0300
+++ ./mysql-test/suite/innodb/r/innodb-lock.reject	2023-08-11 22:16:43.070838144 +0300
@@ -176,15 +176,11 @@
 insert into t1 values (2, 1);
 insert into t1 values (4, 1);
 insert into t1 values (5, 1);
-commit;
+rollback;
 connection con1;
 # No ER_LOCK_DEADLOCK; all rows returned
 pk	x
-1	1
-2	1
-3	1
-4	1
-5	1
+3	0
 connection default;
 disconnect con1;
 drop table t1;
@@ -200,15 +196,12 @@
 insert into t1 values (2, 2);
 insert into t1 values (3, 2);
 insert into t1 values (5, 2);
-commit;
+rollback;
 connection con1;
 # No ER_LOCK_DEADLOCK; all rows returned
 pk	x
 1	2
-2	2
-3	2
-4	2
-5	2
+4	0
 connection default;
 disconnect con1;
 drop table t1;
@@ -224,15 +217,11 @@
 insert into t1 values (2, 3);
 insert into t1 values (4, 3);
 insert into t1 values (5, 3);
-commit;
+rollback;
 connection con1;
 # No ER_LOCK_DEADLOCK; all rows returned
 pk	x
-5	3
-4	3
-3	3
-2	3
-1	3
+3	0
 connection default;
 disconnect con1;
 drop table t1;
@@ -248,15 +237,12 @@
 insert into t1 values (1, 4);
 insert into t1 values (3, 4);
 insert into t1 values (4, 4);
-commit;
+rollback;
 connection con1;
 # No ER_LOCK_DEADLOCK; all rows returned
 pk	x
 5	4
-4	4
-3	4
-2	4
-1	4
+2	0
 connection default;
 disconnect con1;
 drop table t1;
@@ -272,17 +258,13 @@
 insert into t1 values (2, 5);
 insert into t1 values (4, 5);
 insert into t1 values (5, 5);
-commit;
+rollback;
 connection con1;
 # No ER_LOCK_DEADLOCK; all rows returned
 connection default;
 select * from t1;
 pk	x
-1	15
-2	15
-3	15
-4	15
-5	15
+3	10
 disconnect con1;
 drop table t1;
 # F) Same as B) but UPDATE instead SELECT
@@ -297,17 +279,14 @@
 insert into t1 values (2, 6);
 insert into t1 values (3, 6);
 insert into t1 values (5, 6);
-commit;
+rollback;
 connection con1;
 # No ER_LOCK_DEADLOCK; all rows returned
 connection default;
 select * from t1;
 pk	x
 1	16
-2	16
-3	16
-4	16
-5	16
+4	10
 disconnect con1;
 drop table t1;
 # G) Insert into an already passed unlocked gap (15)
@@ -327,7 +306,7 @@
 insert into t1 values (5, 3);
 insert into t1 values (12, 3);
 insert into t1 values (14, 3);
-commit;
+rollback;
 connection con1;
 # Unsolvable task, so fail with deadlock:
 ERROR 40001: Deadlock found when trying to get lock; try restarting transaction

AFAICS no inconsistency on rollback. Probably this should be added to test too. vlad.lesin maybe I'm missing something?

Comment by Vladislav Lesin [ 2023-08-15 ]

midenok,

> Think of it as transaction 3. comes before 2.

There are only 2 active transactions in the scenario. I suppose you meant steps 2 and 3, not transactions. If so, ok suppose trx 1 is requesting ii-lock on (2,1) record on step 3, executing "insert into t1 values (2, 1);", before trx 2 locked it on step 2, executing "select * from t1 for update". Then trx 1 should acquire the lock, and trx 2 should be blocked until trx 1 is finished. But you insist that trx 2 is killed by deadlock resolver. If so, do you have any test which would confirm it? The test does not suite, see wait condition there. Please correct me if I did not understand your argument.

> System made wrong choice since transaction 2. is cancelled.

What exactly subsystem do you mean? Deadlock resolver counts transaction weights. The more locks and undo log records transaction holds, the more its weight. Transaction with less weight is being chosen as deadlock resolver victim. So the weight of trx 1 is greater then the one of trx2, that's why trx2 is cancelled. Or you mean something else?

By ACID violation I meant the following. trx 2 is executed on RR isolation level. Trx 2 locked some range before trx 1 is inserted something in the range(see wait condition in the test), what means trx 2 must not see any records produced by trx 1 in that range. As the read is locking, MVCC is not used. If trx 1 inserts something in the locked by trx 2 gap, and trx 2 re-reads the containing the gap range, it will get record which must not be seen for trx 2. That's why trx 1 ii-lock conflicts with waiting trx2 ordinary(record + preceding gap) lock. I.e. when persistent cursor position is restored after trx 2 is awakened (see sel_restore_position_for_mysql() call after row_mysql_handle_errors() in row_search_mvcc()), it will be set on (3,1) record, and "select * from t1 for update;" will miss (2,1) record. If trx 2 is not auto-committed, and then executes the same "select * from t1 for update;" once more, it will see (2,1) record, what is isolation level violation. And cancelling trx2 with deadlock error is less evil then isolation violation.

Comment by Aleksey Midenkov [ 2023-08-15 ]

Yes, transaction 1. and 3. is the same transaction trx 2. I didn't write "from step" for brevity. vlad.lesin I don't insist anything, please see task description. You can test it yourself, it does cancel SELECT.

> What exactly subsystem do you mean?

I don't mean exact subsystem. I mean user POV.

Comment by Aleksey Midenkov [ 2023-08-15 ]

vlad.lesin Can you show the test case with ACID violation, please?

Comment by Vladislav Lesin [ 2023-08-15 ]

midenok, ok, here is modified test with different locking order:

--source include/have_innodb.inc                                                
--source include/have_debug.inc                                                 
--source include/have_debug_sync.inc                                            
                                                                                
create or replace table t1 (pk int primary key, x int) engine innodb;           
insert into t1 values (3, 0);                                                   
start transaction; # trx 1                                                      
update t1 set x= 1;                                                             
insert into t1 values (2, 1);                                                   
                                                                                
connect (con1,localhost,root,,test);                                            
SET DEBUG_SYNC="lock_wait_before_suspend SIGNAL trx2_suspended";                
send select * from t1 for update; # trx 2                                       
                                                                                
connection default;                                                             
SET DEBUG_SYNC="now WAIT_FOR trx2_suspended";                                   
commit;                                                                         
                                                                                
connection con1;                                                                
--echo # There is no ER_LOCK_DEADLOCK here, as expected                         
reap;                                                                           
                                                                                
connection default;                                                             
disconnect con1;                                                                
drop table t1; 

And there is no deadlock error, as expected. As well as there is deadlock error in the original test. What is also expected(I explained it in details in the my previous comment).

Comment by Vladislav Lesin [ 2023-08-15 ]

midenok,
> Can you show the test case with ACID violation, please?
Ah, I see you did some workaround by moving cursor before its storing and after it's restoring. I need some time to understand how it works with MDEV-20605 fix.

Generated at Thu Feb 08 08:46:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.