[MDEV-23560] Deadlock detected on SELECT when only one record being processed Created: 2020-08-24  Updated: 2020-10-17

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.4.14
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Steve Millington Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Vanilla Docker image of MariaDb


Attachments: File mariabug.tar.gz    
Issue Links:
Relates
relates to MDEV-10962 Deadlock with 3 concurrent DELETEs by... Closed

 Description   

I have a complex situation that I have boiled down to a relatively simple testcase. I have been unable to use any MySQL Unit Test infrastructure as the issue involves having 3 simultaneous clients. I am attaching a JDBC program which can be used as described below to demonstrate the problem. I have confirmed this is not a JDBC issue as I have an equivalent ODBC program that exhibits the same behaviour.

I have one client session that has successfully done a 'SELECT ... FOR UPDATE' on a record (with autocommit off), and I have two other sessions (with autocommit off) that are doing the same select on the same key. These two other sessions are blocked, waiting for a lock. The session that has the lock, then does a delete of the selected record (in JDBC, it does a resultSet.deleteRow(), in ODBC it does a SQLSetPos(..., SQL_DELETE)). That is fine. Finally, that session does a commit. At this point the two blocked sessions come to life. I am expecting both of the 'SELECT' statements to return successfully with 'no rows found'. The bug is that instead of both doing that, one of them returns successfully with 'no rows found', but the other one produces a 'Deadlock found when trying to get lock; try restarting transaction'.
The session that produces the 'deadlock' error is always the second item in the waiter queue.
[Please see at the bottom for more specifics about when the problem occurs and when it doesn't]

Setup:
====
I have a simple table defined as

MariaDB [clouddb]> describe BUG_DEMO;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| C1    | int(11) | NO   | PRI | NULL    |       |
| C2    | int(11) | NO   | UNI | NULL    |       |
+-------+---------+------+-----+---------+-------+
2 rows in set (0.001 sec)

created using 'CREATE TABLE BUG_DEMO (C1 INTEGER NOT NULL PRIMARY KEY, C2 INTEGER NOT NULL UNIQUE KEY)'

I then insert a single row into this table using for example "INSERT INTO BUG_DEMO(C1,C2) VALUES(12345, 12345)"

(These two operations can be done with my attached program by running 'make create' and 'make insert'.

Reproduce:
=======
Create 3 client Unix terminal windows. In each of them type 'make delete' and press enter. The 3 windows each connect to the database and await user action. In all three windows, now press enter. This issues the 'SELECT' in all 3 sessions. The first will execute immediately, and the other two will block. The first of the windows now prompts for user action; and you should press enter. This then does a delete and does another prompt. Notice the other windows remain unchanged. Then pressing enter on the active window causes a commit() to occur. At this point the two 'block'ed windows come to like and exhibit the undesired behaviour.

Problem Specifics:
The code in the given reproduces the problem. Some slight variations do change things.
a) The datatypes of the two columns seems to have no effect on the issue.
b) Having more columns does not seem to affect the issue
c) If I remove the 'UNIQUE KEY' from C2, the problem does not occur.
d) If the select is done on C1 (the primary key) instead of C2, the problem does not occur.
So, it seems to be specific to when a SELECT is done on a UNIQUE KEYed column that is NOT the primary key.



 Comments   
Comment by Marko Mäkelä [ 2020-08-24 ]

Where is the client program? Could you repeat this with something like

yes 'SELECT * FROM BUG_DEMO …;'|mysql -uroot test &
yes 'INSERT …;'|mysql -uroot test &

If you are using InnoDB, then this might be working as designed. InnoDB does not have a concept of ‘table row locks’, but instead it has ‘index record’ locks. If the SELECT is accessing the secondary UNIQUE index first, it can easily deadlock with an INSERT that would first lock (and modify) the primary key index before locking the secondary indexes, one by one. This report could actually be a duplicate of MDEV-10962. As you may know, both DELETE and UPDATE internally invoke a locking read first, and such deadlocks are possible when using a secondary index in the locking read.

Comment by Steve Millington [ 2020-08-24 ]

Sorry for the delay in putting my test program on - my network died. .. it is there now.

Comment by Steve Millington [ 2020-08-24 ]

Marko,

Thanks for getting back.

I think that 'If the SELECT is accessing the secondary UNIQUE index first, it can easily deadlock with an INSERT that would first lock (and modify) the primary key index before locking the secondary indexes, one by one.' is not the situation I am in. My situation is purely with SELECT ... / DELETE. The only INSERT is prior to the issue setting up the database.

This does look a bit similar to 10962, but it is really hard for me to tell.

Comment by Steve Millington [ 2020-08-24 ]

It is intuitively odd that the exact same select issues by two clients would produce deadlocks. If any locks were taken, one would expect them to be taken in the same order and therefore no deadlock to occur! However. I am aware that my intuition is not necessarily what would occur!

Comment by Steve Millington [ 2020-08-24 ]

Here is the output of innodb status

MariaDB [(none)]> show engine innodb status\G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2020-08-24 15:53:29 0x7fab642ee700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 23 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1 srv_active, 0 srv_shutdown, 7975 srv_idle
srv_master_thread log flush and writes: 7975
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 52
OS WAIT ARRAY INFO: signal count 35
RW-shared spins 55, rounds 1336, OS waits 44
RW-excl spins 0, rounds 0, OS waits 0
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 24.29 RW-shared, 0.00 RW-excl, 0.00 RW-sx
------------------------
LATEST DETECTED DEADLOCK
------------------------
2020-08-24 14:47:47 0x7fab64258700
*** (1) TRANSACTION:
TRANSACTION 69, ACTIVE 4 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1128, 1 row lock(s)
MySQL thread id 30, OS thread handle 140374096008960, query id 112 192.168.7.158 vsamdso Statistics
SELECT C1,C2 FROM BUG_DEMO WHERE C2 = 12345 FOR UPDATE WAIT 30
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 4 n bits 72 index C2 of table `clouddb`.`BUG_DEMO` trx id 69 lock_mode X locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80003039; asc   09;;
 1: len 4; hex 80003039; asc   09;;
 
*** (2) TRANSACTION:
TRANSACTION 68, ACTIVE 5 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 28, OS thread handle 140374096316160, query id 111 192.168.7.158 vsamdso Statistics
SELECT C1,C2 FROM BUG_DEMO WHERE C2 = 12345 FOR UPDATE WAIT 30
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 6 page no 4 n bits 72 index C2 of table `clouddb`.`BUG_DEMO` trx id 68 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80003039; asc   09;;
 1: len 4; hex 80003039; asc   09;;
 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 4 n bits 72 index C2 of table `clouddb`.`BUG_DEMO` trx id 68 lock_mode X waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 2; compact format; info bits 32
 0: len 4; hex 80003039; asc   09;;
 1: len 4; hex 80003039; asc   09;;
 
*** WE ROLL BACK TRANSACTION (1)
------------
TRANSACTIONS
------------
Trx id counter 75
Purge done for trx's n:o < 75 undo n:o < 0 state: running
History list length 31
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421849074450776, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421849074446520, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
192 OS file reads, 350 OS file writes, 131 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
Hash table size 69257, node heap has 0 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 86093
Log flushed up to   86093
Pages flushed up to 86093
Last checkpoint at  86084
0 pending log flushes, 0 pending chkp writes
87 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 285212672
Dictionary memory allocated 32968
Buffer pool size   16046
Free buffers       15711
Database pages     335
Old database pages 0
Modified db pages  0
Percent of dirty pages(LRU & free pages): 0.000
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 176, created 159, written 256
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 335, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=1, Main thread ID=140373336721152, state: sleeping
Number of rows inserted 4, updated 0, deleted 3, read 6
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
 
1 row in set (0.001 sec)

Comment by Steve Millington [ 2020-08-24 ]

I know nothing of this, but to my unknowing eyes, it looks like the deadlock is with itself!

Generated at Thu Feb 08 09:23:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.