[MDEV-10962] Deadlock with 3 concurrent DELETEs by unique key - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Fix Version/s: 10.4.31, 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1
Component/s: Storage Engine - InnoDB
Labels:
- upstream

Description

As explained in upstream bug reports (that has all the details on the test case):

http://bugs.mysql.com/bug.php?id=82127
https://bugs.launchpad.net/percona-server/+bug/1598822

there is a deadlock scenario with 3 concurrent DELETEs by UNIQUE key that can not be explained by the manual:

CREATE TABLE `tu`(`id` int(11), `a` int(11) DEFAULT NULL, `b` varchar(10) DEFAULT NULL, `c` varchar(10) DEFAULT NULL, PRIMARY KEY(`id`), UNIQUE KEY `u`(`a`,`b`)) ENGINE=InnoDB DEFAULT CHARSET=latin1 STATS_PERSISTENT=0;

insert into tu values(1,1,'a','a'),(2,9999,'xxxx','x'),(3,10000,'b','b'),(4,4,'c','c');

mysqlslap -uroot --concurrency=3 --create-schema=test --no-drop --number-of-queries=1000 --query="delete from tu where a = 9999 and b = 'xxxx'"

mysqlslap: Cannot run query delete from tu where a = 9999 and b = 'xxxx' ERROR : Deadlock found when trying to get lock; try restarting transaction

Deadlock happens both with triggers mentioned in that bug reports and without them (just less often).

The problem was originally noted by customer on MariaDB 5.5.24, but affects all released versions up to those based on InnoDB from 5.7.x for sure.

As there is no visible progress on upstream bugs, I create this bug report for MariaDB to decide if there is anything to fix here or to document clearly in the knowledge base.

Attachments

Issue Links

is blocked by

MDEV-27025 insert-intention lock conflicts with waiting ORDINARY lock

Closed

relates to

MDEV-14589 InnoDB should not lock a delete-marked record

Closed

MDEV-16232 Use fewer mini-transactions

Stalled

MDEV-16402 Support Index Condition Pushdown for clustered PK scans

Confirmed

MDEV-16406 Refactor the InnoDB record locks

Open

MDEV-18706 ER_LOCK_DEADLOCK on concurrent read and insert into already locked gap

In Review

MDEV-31833 replication breaks when using optimistic replication and replica is a galera node

Closed

MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration

Closed

MDEV-22698 Deadlock on concurrent acquisition from multiple indexes

Open

MDEV-23560 Deadlock detected on SELECT when only one record being processed

Open

MDEV-28800 SIGABRT due to running out of memory for InnoDB locks

Closed

links to

Bug #82127 Deadlock with 3 concurrent DELETEs by UNIQUE key

lp:1598822 Deadlock with 3 concurrent DELETEs by unique key and triggers reading the same row

mentioned in: Page Loading...

(6 relates to, 2 links to, 1 mentioned in)

Activity

Ascending order - Click to sort in descending order

View 14 older comments

Vladislav Lesin added a comment - 2023-06-27 08:38 - edited

Can't repeat it with the following test:

--source include/have_innodb.inc

--source include/have_debug.inc

--source include/have_debug_sync.inc

--connect(dont_purge, localhost,root,,)

START TRANSACTION WITH CONSISTENT SNAPSHOT;

--connection default

# There are various scenarious in which a transaction already holds "half"

# of a record lock (for example, a lock on the record but not on the gap)

# and wishes to "upgrade it" to a full lock (i.e. on both gap and record).

# This is often a cause for a deadlock, if there is another transaction

# which is already waiting for the lock being blocked by us:

# 1. our granted lock for one half

# 2. her waiting lock for the same half

# 3. our waiting lock for the whole

# SCENARIO 1

# In this scenario, three different threads try to delete the same row,

# identified by a secondary index key.

# This kind of operation (besides LOCK_IX on a table) requires

# an LOCK_REC_NOT_GAP|LOCK_REC|LOCK_X lock on a secondary index

# 1. `deleter` is the first to get the required lock

# 2. `holder` enqueues a waiting lock

# 3. `waiter` enqueues right after `holder`

# 4. `deleter` commits, releasing the lock, and granting it to `holder`

# 5. `holder` now observes that the row was deleted, so it needs to

#    "seal the gap", by obtaining a LOCK_X|LOCK_REC, but..

# 6. this causes a deadlock between `holder` and `waiter`

CREATE TABLE `t`(

  `id` INT,

  `a` INT DEFAULT NULL,

  PRIMARY KEY(`id`),

  UNIQUE KEY `u`(`a`)

) ENGINE=InnoDB;

INSERT INTO t (`id`,`a`) VALUES

  (1,1),

  (2,9999),

  (3,10000);

--connect(deleter,localhost,root,,)

--connect(holder,localhost,root,,)

--connect(waiter,localhost,root,,)

--connection deleter

  SET DEBUG_SYNC =

    'lock_sec_rec_read_check_and_lock_has_locked

      SIGNAL deleter_has_locked

      WAIT_FOR waiter_has_locked';

  --send DELETE FROM t WHERE a = 9999

--connection holder

  SET DEBUG_SYNC=

    'now WAIT_FOR deleter_has_locked';

  SET DEBUG_SYNC=

    'lock_sec_rec_read_check_and_lock_has_locked SIGNAL holder_has_locked';

  --send DELETE FROM t WHERE a = 9999

--connection waiter

  SET DEBUG_SYNC=

    'now WAIT_FOR holder_has_locked';

  SET DEBUG_SYNC=

    'lock_sec_rec_read_check_and_lock_has_locked SIGNAL waiter_has_locked';

  --send DELETE FROM t WHERE a = 9999

--connection deleter

  --reap

--connection holder

  --reap

--connection waiter

  --reap

--connection default

--disconnect deleter

--disconnect holder

--disconnect waiter

--disconnect dont_purge

DROP TABLE `t`;

SET DEBUG_SYNC='reset';

and the following sync point:

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc

index 26388ad95e2..bc59d824ff2 100644

--- a/storage/innobase/lock/lock0lock.cc

+++ b/storage/innobase/lock/lock0lock.cc

@@ -5763,6 +5763,8 @@ lock_sec_rec_modify_check_and_lock(

        return(err);

+#include "scope.h"

 /*********************************************************************//**

 Like lock_clust_rec_read_check_and_lock(), but reads a

 secondary index record.

@@ -5791,6 +5793,10 @@ lock_sec_rec_read_check_and_lock(

        dberr_t err;

        ulint   heap_no;

+        SCOPE_EXIT([]() {

+          DEBUG_SYNC_C("lock_sec_rec_read_check_and_lock_has_locked");

+        });

        ut_ad(!dict_index_is_clust(index));

        ut_ad(!dict_index_is_online_ddl(index));

        ut_ad(block->frame == page_align(rec));

from

commit bfba840dfa7794b988c59c94658920dbe556075d

Author: Jakub Łopuszański <jakub.lopuszanski@oracle.com>

Date:   Tue Jun 11 12:36:53 2019 +0200

    Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY

on the latest 10.4, as well as with the sequence of steps described in https://bugs.mysql.com/bug.php?id=82127.

Have not understood yet why.

Vladislav Lesin added a comment - 2023-06-27 08:38 - edited Can't repeat it with the following test: --source include/have_innodb.inc --source include/have_debug.inc --source include/have_debug_sync.inc --connect(dont_purge, localhost,root,,) START TRANSACTION WITH CONSISTENT SNAPSHOT; --connection default # There are various scenarious in which a transaction already holds "half" # of a record lock ( for example, a lock on the record but not on the gap) # and wishes to "upgrade it" to a full lock (i.e. on both gap and record). # This is often a cause for a deadlock, if there is another transaction # which is already waiting for the lock being blocked by us: # 1 . our granted lock for one half # 2 . her waiting lock for the same half # 3 . our waiting lock for the whole # # SCENARIO 1 # # In this scenario, three different threads try to delete the same row, # identified by a secondary index key. # This kind of operation (besides LOCK_IX on a table) requires # an LOCK_REC_NOT_GAP|LOCK_REC|LOCK_X lock on a secondary index # 1 . `deleter` is the first to get the required lock # 2 . `holder` enqueues a waiting lock # 3 . `waiter` enqueues right after `holder` # 4 . `deleter` commits, releasing the lock, and granting it to `holder` # 5 . `holder` now observes that the row was deleted, so it needs to # "seal the gap" , by obtaining a LOCK_X|LOCK_REC, but.. # 6 . this causes a deadlock between `holder` and `waiter` CREATE TABLE `t`( `id` INT, `a` INT DEFAULT NULL, PRIMARY KEY(`id`), UNIQUE KEY `u`(`a`) ) ENGINE=InnoDB; INSERT INTO t (`id`,`a`) VALUES ( 1 , 1 ), ( 2 , 9999 ), ( 3 , 10000 ); --connect(deleter,localhost,root,,) --connect(holder,localhost,root,,) --connect(waiter,localhost,root,,) --connection deleter SET DEBUG_SYNC = 'lock_sec_rec_read_check_and_lock_has_locked SIGNAL deleter_has_locked WAIT_FOR waiter_has_locked'; --send DELETE FROM t WHERE a = 9999 --connection holder SET DEBUG_SYNC= 'now WAIT_FOR deleter_has_locked' ; SET DEBUG_SYNC= 'lock_sec_rec_read_check_and_lock_has_locked SIGNAL holder_has_locked' ; --send DELETE FROM t WHERE a = 9999 --connection waiter SET DEBUG_SYNC= 'now WAIT_FOR holder_has_locked' ; SET DEBUG_SYNC= 'lock_sec_rec_read_check_and_lock_has_locked SIGNAL waiter_has_locked' ; --send DELETE FROM t WHERE a = 9999 --connection deleter --reap --connection holder --reap --connection waiter --reap --connection default --disconnect deleter --disconnect holder --disconnect waiter --disconnect dont_purge DROP TABLE `t`; SET DEBUG_SYNC= 'reset' ; and the following sync point: diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index 26388ad95e2..bc59d824ff2 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ - 5763 , 6 + 5763 , 8 @@ lock_sec_rec_modify_check_and_lock( return (err); } +#include "scope.h" + /*********************************************************************/ /** Like lock_clust_rec_read_check_and_lock(), but reads a secondary index record. @@ - 5791 , 6 + 5793 , 10 @@ lock_sec_rec_read_check_and_lock( dberr_t err; ulint heap_no; + SCOPE_EXIT([]() { + DEBUG_SYNC_C( "lock_sec_rec_read_check_and_lock_has_locked" ); + }); + ut_ad(!dict_index_is_clust(index)); ut_ad(!dict_index_is_online_ddl(index)); ut_ad(block->frame == page_align(rec)); from commit bfba840dfa7794b988c59c94658920dbe556075d Author: Jakub Łopuszański <jakub.lopuszanski@oracle.com> Date: Tue Jun 11 12:36:53 2019 +0200 Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY on the latest 10.4, as well as with the sequence of steps described in https://bugs.mysql.com/bug.php?id=82127 . Have not understood yet why.

Vladislav Lesin added a comment - 2023-06-27 12:42 - edited

The cause of why I can't reproduce it on 10.4.28 is ~~MDEV-30225~~ fix.

Vladislav Lesin added a comment - 2023-06-27 12:42 - edited The cause of why I can't reproduce it on 10.4.28 is MDEV-30225 fix.

Vladislav Lesin added a comment - 2023-06-27 17:53

~~MDEV-30225~~ does not fix the bug, but just hides it. If we take a look the test above, the 'holder' does not "seal the gap" after 'deleter' was committed because it was initially sealed, as after ~~MDEV-30225~~ fix the 'holder' initially requests next-key lock.

The following test from bfba840dfa7794b988c59c94658920dbe556075d mysql commit shows the issue:

# SCENARIO 2

# Here, we form a situation in which con1 has LOCK_REC_NOT_GAP on rows 1 and 2

# con2 waits for lock on row 1, and then con1 wants to upgrade the lock on row 1,

# which might cause a deadlock, unless con1 properly notices that even though the

# lock on row 1 can not be upgraded, a separate LOCK_GAP can be obtaied easily.

CREATE TABLE `t`(

  `id` INT NOT NULL PRIMARY KEY

) ENGINE=InnoDB;

INSERT INTO t (`id`) VALUES (1), (2);

--connect(holder,localhost,root,,)

--connect(waiter,localhost,root,,)

--connection holder

  BEGIN;

  SELECT id FROM t WHERE id=1 FOR UPDATE;

  SELECT id FROM t WHERE id=2 FOR UPDATE;

--connection waiter

  SET DEBUG_SYNC=

    'lock_wait_suspend_thread_enter SIGNAL waiter_will_wait';

  --send SELECT id FROM t WHERE id = 1 FOR UPDATE

--connection holder

  SET DEBUG_SYNC=

    'now WAIT_FOR waiter_will_wait';

  SELECT * FROM t FOR UPDATE;

  COMMIT;

--connection waiter

  --reap

--connection default

--disconnect holder

--disconnect waiter

DROP TABLE `t`;

Vladislav Lesin added a comment - 2023-06-27 17:53 MDEV-30225 does not fix the bug, but just hides it. If we take a look the test above, the 'holder' does not "seal the gap" after 'deleter' was committed because it was initially sealed, as after MDEV-30225 fix the 'holder' initially requests next-key lock. The following test from bfba840dfa7794b988c59c94658920dbe556075d mysql commit shows the issue: # SCENARIO 2 # # Here, we form a situation in which con1 has LOCK_REC_NOT_GAP on rows 1 and 2 # con2 waits for lock on row 1 , and then con1 wants to upgrade the lock on row 1 , # which might cause a deadlock, unless con1 properly notices that even though the # lock on row 1 can not be upgraded, a separate LOCK_GAP can be obtaied easily. CREATE TABLE `t`( `id` INT NOT NULL PRIMARY KEY ) ENGINE=InnoDB; INSERT INTO t (`id`) VALUES ( 1 ), ( 2 ); --connect(holder,localhost,root,,) --connect(waiter,localhost,root,,) --connection holder BEGIN; SELECT id FROM t WHERE id= 1 FOR UPDATE; SELECT id FROM t WHERE id= 2 FOR UPDATE; --connection waiter SET DEBUG_SYNC= 'lock_wait_suspend_thread_enter SIGNAL waiter_will_wait' ; --send SELECT id FROM t WHERE id = 1 FOR UPDATE --connection holder SET DEBUG_SYNC= 'now WAIT_FOR waiter_will_wait' ; SELECT * FROM t FOR UPDATE; COMMIT; --connection waiter --reap --connection default --disconnect holder --disconnect waiter DROP TABLE `t`;

Vladislav Lesin added a comment - 2023-06-30 11:06

This commit suffers from ~~MDEV-27992~~, see this comment for details.

Vladislav Lesin added a comment - 2023-06-30 11:06 This commit suffers from MDEV-27992 , see this comment for details.

Marko Mäkelä added a comment - 2023-07-05 14:23

Thank you, this looks good to me.

Marko Mäkelä added a comment - 2023-07-05 14:23 Thank you, this looks good to me.

MariaDB Server

Deadlock with 3 concurrent DELETEs by unique key

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration