[MDEV-24738] Improve the InnoDB deadlock checker Created: 2021-01-29  Updated: 2023-08-17  Resolved: 2021-02-17

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.6.0

Type: Task Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: performance

Issue Links:
Blocks
is blocked by MDEV-24671 Assertion failure in lock_wait_table_... Closed
is blocked by MDEV-24731 Excessive mutex contention in Deadloc... Closed
Problem/Incident
causes MDEV-25750 Assertion `wait_lock->is_waiting()' i... Closed
Relates
relates to MDEV-24948 thd_need_wait_reports() hurts perform... Open
relates to MDEV-29187 Deadlock output in InnoDB status alwa... Closed
relates to MDEV-18706 ER_LOCK_DEADLOCK on concurrent read a... Stalled
relates to MDEV-25594 Crash in deadlock checker under high ... Closed
relates to MDEV-27025 insert-intention lock conflicts with ... Closed
relates to MDEV-30973 Enhance output of innodb deadlocks (S... Stalled

 Description   

The InnoDB deadlock checker is holding lock_sys.mutex for a long time. After MDEV-24671 and MDEV-24731, we could almost perform the check by using a combination of lock_sys.wait_mutex and individual trx_t::mutex to protect the reads of trx->lock.wait_lock (the field may change from a non-null pointer to another non-null pointer without holding lock_sys.wait_mutex).

We can do even better: use Brent’s cycle detection algorithm on trx->lock.wait_trx.. (a new field protected by lock_sys.wait_mutex), and then release lock_sys.wait_mutex, acquire lock_sys.mutex and reacquire lock_sys.wait_mutex to dump information on all transactions that participate in the the deadlock, as well as to choose the victim transaction to be the one with the smallest weight. No dynamic memory allocation is needed, except possibly for some character strings that are part of the output.

There may be at most 8×innodb_page_size active transactions (524,288 when using innodb_page_size=64k). It is probably easiest to treat a waits-for path that exceeds 255 steps as a deadlock. In the old implementation, the cut-off was 200.

The old deadlock reporter only displayed information about 2 transactions, even though a deadlock may involve multiple transactions.



 Comments   
Comment by Marko Mäkelä [ 2021-02-04 ]

I overlooked that a single trx->lock.wait_trx is not enough after all, because a waiting lock request may conflict with multiple pre-existing locks:

trx1: READ(A) (granted)
trx2: READ(A) (granted)
trx3: WRITE(B) (granted)
trx2: READ(B) (waiting)
trx3: WRITE(A) (waiting)

We could have trx3->lock.wait_trx==trx1 because trx1 (and trx2) is holding a conflicting lock. trx1 itself is not waiting for anything. But, the other holder of a conflicting lock is: trx2->lock.wait_trx==trx3. So, there already is a deadlock between trx2 and trx3, but we would fail to notice that trx3 is also waiting for trx2, which is a deadlock.

As far as I can tell, Brent’s cycle detection algorithm is only applicable to graphs that are not branching.

The new deadlock checker in MySQL 8.0.18 seems to suffer from the same problem. As far as I can tell, it is an intentional approximation to allow the deadlock checks to run fast. Compared to earlier, some transactions would fail to report an immediate deadlock, if it is ‘hidden’ behind other lock holders. The innodb_lock_wait_timeout could come to their rescue if the original conflicting locks are not released quickly enough to expose the hidden deadlocks.

With a small adjustment, my deadlock checker should become equivalent to the MySQL one. Whenever the trx->lock.wait_trx is changed to point to another conflicting lock holder, we can ensure that a deadlock check will be scheduled for the new trx->lock.wait_trx.

Generated at Thu Feb 08 09:32:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.