[MDEV-17512] Deadlock after locking all rows in table - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.2.18, 10.3.10
Fix Version/s: 10.2(EOL)
Component/s: Locking, Storage Engine - InnoDB
Labels:
- performance
Environment:
Windows 7 Pro x64, Ubuntu Xenial

Description

This is a surprisingly simple scenario leading to a (to me) surprising deadlock. Since I'm not an expert on InnoDB I have already tried other ways to find out whether it is a bug (see https://dba.stackexchange.com/questions/216450/is-this-surprising-deadlock-reasonable-or-a-mariadb-bug), but unfortunately didn't get any helpful responses, so I'm filing it now.

Here is the setup. The isolation level is left at its default (REPEATABLE_READ):

create database deadlock;
use deadlock;
create table foo(id int primary key) engine=InnoDB;
insert into foo values (10);

Now we can start. Let's open two connections A and B to the database.

-- Connection A
begin;
select * from foo for update;

-- Connection B
begin;
select * from foo for update;

At this point, the select issued by B is waiting. This is as expected because A is holding the relevant locks. Note that the StackExchange question used "delete from foo;" instead in both transactions - it makes no difference to the result, but a select seems easier to analyze.

-- Connection A
insert into foo values (9);

And at this point, mariadb kills the waiting query of B due to a deadlock. This is very surprising to me, because it would mean that B has already managed to acquire some locks.

As an additional point of information, if you use the id 11 instead of 9 in the last statement, there is no deadlock.

This test is a simplification of a production scenario where this deadlock kept occurring - there was a transaction that first deleted "invalid" records before inserting a new one. Whether a record was "invalid" was determined in a way that did not use an index, so the entire table had to be scanned. I did not understand how the deadlock could occur, since my mental model was that the delete would essentially get an exclusive lock on everything relevant in the table, so a second transaction running the same queries would have to wait before being able to acquire any locks, and thus could not possibly contribute to a deadlock.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

last deadlock information.txt
2 kB
2018-10-21 08:31

Issue Links

relates to

MDEV-16406 Refactor the InnoDB record locks

Open

MDEV-24813 Locking full table scan fails to use table-level locking

In Review

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2018-12-29 13:58

Here is the last deadlock info for the test case that you described. It is basically the same as the one you attached, but the attachment is for DELETE, so I'll get another one, to avoid confusion.

------------------------

LATEST DETECTED DEADLOCK

------------------------

2018-12-29 15:53:36 0x7ffb08d3c700

*** (1) TRANSACTION:

TRANSACTION 58, ACTIVE 15 sec starting index read

mysql tables in use 1, locked 1

LOCK WAIT 2 lock struct(s), heap size 1160, 1 row lock(s)

MySQL thread id 9, OS thread handle 140716161918720, query id 54 localhost ::1 root Sending data

select * from foo for update

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 6 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 58 lock_mode X waiting

Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

 0: len 4; hex 8000000a; asc     ;;

 1: len 6; hex 000000000000; asc       ;;

 2: len 7; hex 80000000000000; asc        ;;

*** (2) TRANSACTION:

TRANSACTION 57, ACTIVE 19 sec inserting

mysql tables in use 1, locked 1

3 lock struct(s), heap size 1160, 3 row lock(s)

MySQL thread id 10, OS thread handle 140716161615616, query id 57 localhost ::1 root Update

insert into foo values (9)

*** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 6 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 57 lock_mode X

Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0

 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

 0: len 4; hex 8000000a; asc     ;;

 1: len 6; hex 000000000000; asc       ;;

 2: len 7; hex 80000000000000; asc        ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 6 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 57 lock_mode X locks gap before rec insert intention waiting

Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0

 0: len 4; hex 8000000a; asc     ;;

 1: len 6; hex 000000000000; asc       ;;

 2: len 7; hex 80000000000000; asc        ;;

*** WE ROLL BACK TRANSACTION (1)

Here Connection A is TRANSACTION(2), and Connection B is TRANSACTION(1)

So, to my understanding, in your case Connection A, by running SELECT .. FOR UPDATE, acquired an exclusive lock on the index record, and also a shared gap lock.
Connection B, by starting SELECT .. FOR UPDATE, also acquired a shared gap lock (nothing prevents it from doing so), and requested an exclusive lock on the index record, but couldn't get the record lock, hence waiting.
Then, Connection A, by starting INSERT, requested an exclusive gap lock, but couldn't get it, because Connection B is holding the shared one:

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 32882 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 481731 lock_mode X locks gap before rec i

Hence the deadlock.

It's a bit inconvenient that there is no clear indication of gap lock holding in the LAST DEADLOCK info (or in TRANSACTIONS info, for that matter), so I'll assign it to marko to confirm that's what is happening here, and if so, clarify if it's possible to see acquired gap locks in innodb status (apart from counting rows and seeing that there is one too many in "row locks").

Elena Stepanova added a comment - 2018-12-29 13:58 Here is the last deadlock info for the test case that you described. It is basically the same as the one you attached, but the attachment is for DELETE , so I'll get another one, to avoid confusion. ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2018-12-29 15:53:36 0x7ffb08d3c700 *** (1) TRANSACTION: TRANSACTION 58, ACTIVE 15 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 1160, 1 row lock(s) MySQL thread id 9, OS thread handle 140716161918720, query id 54 localhost ::1 root Sending data select * from foo for update *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 6 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 58 lock_mode X waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 8000000a; asc ;; 1: len 6; hex 000000000000; asc ;; 2: len 7; hex 80000000000000; asc ;; *** (2) TRANSACTION: TRANSACTION 57, ACTIVE 19 sec inserting mysql tables in use 1, locked 1 3 lock struct(s), heap size 1160, 3 row lock(s) MySQL thread id 10, OS thread handle 140716161615616, query id 57 localhost ::1 root Update insert into foo values (9) *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 6 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 57 lock_mode X Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0 0: len 8; hex 73757072656d756d; asc supremum;; Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 8000000a; asc ;; 1: len 6; hex 000000000000; asc ;; 2: len 7; hex 80000000000000; asc ;; *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 6 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 57 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0 0: len 4; hex 8000000a; asc ;; 1: len 6; hex 000000000000; asc ;; 2: len 7; hex 80000000000000; asc ;; *** WE ROLL BACK TRANSACTION (1) Here Connection A is TRANSACTION(2) , and Connection B is TRANSACTION(1) So, to my understanding, in your case Connection A , by running SELECT .. FOR UPDATE , acquired an exclusive lock on the index record, and also a shared gap lock. Connection B , by starting SELECT .. FOR UPDATE , also acquired a shared gap lock (nothing prevents it from doing so), and requested an exclusive lock on the index record, but couldn't get the record lock, hence waiting. Then, Connection A , by starting INSERT , requested an exclusive gap lock, but couldn't get it, because Connection B is holding the shared one: *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 32882 page no 3 n bits 72 index PRIMARY of table `deadlock`.`foo` trx id 481731 lock_mode X locks gap before rec i Hence the deadlock. It's a bit inconvenient that there is no clear indication of gap lock holding in the LAST DEADLOCK info (or in TRANSACTIONS info, for that matter), so I'll assign it to marko to confirm that's what is happening here, and if so, clarify if it's possible to see acquired gap locks in innodb status (apart from counting rows and seeing that there is one too many in "row locks").

Marko Mäkelä added a comment - 2018-12-31 09:04 - edited

According to what I have written in MDEV-16406 half a year ago, SELECT…FOR UPDATE indeed acquires a shared gap lock.

The test case is very similar to my test case in MDEV-16406. The ‘insert intention’ lock that will conflict with the SELECT * FOR UPDATE actually is an exclusive gap lock.

It surely feels counter-intuitive that FOR UPDATE is only acquiring a shared gap lock, and not an exclusive one. The motivation could have been to minimize locking conflicts. I don’t think that this behaviour directly contradicts the FOR UPDATE request, because it is only INSERT that causes the deadlock.

Marko Mäkelä added a comment - 2018-12-31 09:04 - edited According to what I have written in MDEV-16406 half a year ago, SELECT…FOR UPDATE indeed acquires a shared gap lock. The test case is very similar to my test case in MDEV-16406 . The ‘insert intention’ lock that will conflict with the SELECT * FOR UPDATE actually is an exclusive gap lock. It surely feels counter-intuitive that FOR UPDATE is only acquiring a shared gap lock, and not an exclusive one. The motivation could have been to minimize locking conflicts. I don’t think that this behaviour directly contradicts the FOR UPDATE request, because it is only INSERT that causes the deadlock.

Marko Mäkelä added a comment - 2021-04-13 15:40

The terminology around gap locks is confusing. There are not actually "shared" or "exclusive" gap locks, but we could call them "read gap" locks and "write gap" locks (a.k.a. insert intention locks). I have tried to clarify this in MDEV-16406.

MDEV-24813 has been filed for making a locking full table scan to lock the entire table. That should improve performance and avoid this type of deadlock.

Marko Mäkelä added a comment - 2021-04-13 15:40 The terminology around gap locks is confusing. There are not actually "shared" or "exclusive" gap locks, but we could call them "read gap" locks and "write gap" locks (a.k.a. insert intention locks). I have tried to clarify this in MDEV-16406 . MDEV-24813 has been filed for making a locking full table scan to lock the entire table. That should improve performance and avoid this type of deadlock.

MariaDB Server

Deadlock after locking all rows in table

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration