Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10962

Deadlock with 3 concurrent DELETEs by unique key

Details

    Description

      As explained in upstream bug reports (that has all the details on the test case):

      http://bugs.mysql.com/bug.php?id=82127
      https://bugs.launchpad.net/percona-server/+bug/1598822

      there is a deadlock scenario with 3 concurrent DELETEs by UNIQUE key that can not be explained by the manual:

      CREATE TABLE `tu`(`id` int(11), `a` int(11) DEFAULT NULL, `b` varchar(10) DEFAULT NULL, `c` varchar(10) DEFAULT NULL, PRIMARY KEY(`id`), UNIQUE KEY `u`(`a`,`b`)) ENGINE=InnoDB DEFAULT CHARSET=latin1 STATS_PERSISTENT=0;
       
      insert into tu values(1,1,'a','a'),(2,9999,'xxxx','x'),(3,10000,'b','b'),(4,4,'c','c');
       
      mysqlslap -uroot --concurrency=3 --create-schema=test --no-drop --number-of-queries=1000 --query="delete from tu where a = 9999 and b = 'xxxx'"
      mysqlslap: Cannot run query delete from tu where a = 9999 and b = 'xxxx' ERROR : Deadlock found when trying to get lock; try restarting transaction
      

      Deadlock happens both with triggers mentioned in that bug reports and without them (just less often).

      The problem was originally noted by customer on MariaDB 5.5.24, but affects all released versions up to those based on InnoDB from 5.7.x for sure.

      As there is no visible progress on upstream bugs, I create this bug report for MariaDB to decide if there is anything to fix here or to document clearly in the knowledge base.

      Attachments

        Issue Links

          Activity

            vlad.lesin Vladislav Lesin added a comment - - edited

            Can't repeat it with the following test:

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
             
            --connect(dont_purge, localhost,root,,)
            START TRANSACTION WITH CONSISTENT SNAPSHOT;
             
            --connection default
            # There are various scenarious in which a transaction already holds "half"
            # of a record lock (for example, a lock on the record but not on the gap)
            # and wishes to "upgrade it" to a full lock (i.e. on both gap and record).
            # This is often a cause for a deadlock, if there is another transaction
            # which is already waiting for the lock being blocked by us:
            # 1. our granted lock for one half
            # 2. her waiting lock for the same half
            # 3. our waiting lock for the whole
             
            #
            # SCENARIO 1
            #
            # In this scenario, three different threads try to delete the same row,
            # identified by a secondary index key.
            # This kind of operation (besides LOCK_IX on a table) requires
            # an LOCK_REC_NOT_GAP|LOCK_REC|LOCK_X lock on a secondary index
            # 1. `deleter` is the first to get the required lock
            # 2. `holder` enqueues a waiting lock
            # 3. `waiter` enqueues right after `holder`
            # 4. `deleter` commits, releasing the lock, and granting it to `holder`
            # 5. `holder` now observes that the row was deleted, so it needs to
            #    "seal the gap", by obtaining a LOCK_X|LOCK_REC, but..
            # 6. this causes a deadlock between `holder` and `waiter`
             
            CREATE TABLE `t`(
              `id` INT,
              `a` INT DEFAULT NULL,
              PRIMARY KEY(`id`),
              UNIQUE KEY `u`(`a`)
            ) ENGINE=InnoDB;
             
            INSERT INTO t (`id`,`a`) VALUES
              (1,1),
              (2,9999),
              (3,10000);
             
            --connect(deleter,localhost,root,,)
            --connect(holder,localhost,root,,)
            --connect(waiter,localhost,root,,)
             
             
            --connection deleter
              SET DEBUG_SYNC =
                'lock_sec_rec_read_check_and_lock_has_locked
                  SIGNAL deleter_has_locked
                  WAIT_FOR waiter_has_locked';
              --send DELETE FROM t WHERE a = 9999
             
            --connection holder
              SET DEBUG_SYNC=
                'now WAIT_FOR deleter_has_locked';
              SET DEBUG_SYNC=
                'lock_sec_rec_read_check_and_lock_has_locked SIGNAL holder_has_locked';
              --send DELETE FROM t WHERE a = 9999
             
            --connection waiter
              SET DEBUG_SYNC=
                'now WAIT_FOR holder_has_locked';
              SET DEBUG_SYNC=
                'lock_sec_rec_read_check_and_lock_has_locked SIGNAL waiter_has_locked';
              --send DELETE FROM t WHERE a = 9999
             
            --connection deleter
              --reap
             
            --connection holder
              --reap
             
            --connection waiter
              --reap
             
            --connection default
             
            --disconnect deleter
            --disconnect holder
            --disconnect waiter
            --disconnect dont_purge
             
             
            DROP TABLE `t`;
            SET DEBUG_SYNC='reset';
            

            and the following sync point:

            diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
            index 26388ad95e2..bc59d824ff2 100644
            --- a/storage/innobase/lock/lock0lock.cc
            +++ b/storage/innobase/lock/lock0lock.cc
            @@ -5763,6 +5763,8 @@ lock_sec_rec_modify_check_and_lock(
                    return(err);
             }
             
            +#include "scope.h"
            +
             /*********************************************************************//**
             Like lock_clust_rec_read_check_and_lock(), but reads a
             secondary index record.
            @@ -5791,6 +5793,10 @@ lock_sec_rec_read_check_and_lock(
                    dberr_t err;
                    ulint   heap_no;
             
            +        SCOPE_EXIT([]() {
            +          DEBUG_SYNC_C("lock_sec_rec_read_check_and_lock_has_locked");
            +        });
            +
                    ut_ad(!dict_index_is_clust(index));
                    ut_ad(!dict_index_is_online_ddl(index));
                    ut_ad(block->frame == page_align(rec));
            
            

            from

            commit bfba840dfa7794b988c59c94658920dbe556075d
            Author: Jakub Łopuszański <jakub.lopuszanski@oracle.com>
            Date:   Tue Jun 11 12:36:53 2019 +0200
             
                Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY
            

            on the latest 10.4, as well as with the sequence of steps described in https://bugs.mysql.com/bug.php?id=82127.

            Have not understood yet why.

            vlad.lesin Vladislav Lesin added a comment - - edited Can't repeat it with the following test: --source include/have_innodb.inc --source include/have_debug.inc --source include/have_debug_sync.inc   --connect(dont_purge, localhost,root,,) START TRANSACTION WITH CONSISTENT SNAPSHOT;   --connection default # There are various scenarious in which a transaction already holds "half" # of a record lock ( for example, a lock on the record but not on the gap) # and wishes to "upgrade it" to a full lock (i.e. on both gap and record). # This is often a cause for a deadlock, if there is another transaction # which is already waiting for the lock being blocked by us: # 1 . our granted lock for one half # 2 . her waiting lock for the same half # 3 . our waiting lock for the whole   # # SCENARIO 1 # # In this scenario, three different threads try to delete the same row, # identified by a secondary index key. # This kind of operation (besides LOCK_IX on a table) requires # an LOCK_REC_NOT_GAP|LOCK_REC|LOCK_X lock on a secondary index # 1 . `deleter` is the first to get the required lock # 2 . `holder` enqueues a waiting lock # 3 . `waiter` enqueues right after `holder` # 4 . `deleter` commits, releasing the lock, and granting it to `holder` # 5 . `holder` now observes that the row was deleted, so it needs to # "seal the gap" , by obtaining a LOCK_X|LOCK_REC, but.. # 6 . this causes a deadlock between `holder` and `waiter`   CREATE TABLE `t`( `id` INT, `a` INT DEFAULT NULL, PRIMARY KEY(`id`), UNIQUE KEY `u`(`a`) ) ENGINE=InnoDB;   INSERT INTO t (`id`,`a`) VALUES ( 1 , 1 ), ( 2 , 9999 ), ( 3 , 10000 );   --connect(deleter,localhost,root,,) --connect(holder,localhost,root,,) --connect(waiter,localhost,root,,)     --connection deleter SET DEBUG_SYNC = 'lock_sec_rec_read_check_and_lock_has_locked SIGNAL deleter_has_locked WAIT_FOR waiter_has_locked'; --send DELETE FROM t WHERE a = 9999   --connection holder SET DEBUG_SYNC= 'now WAIT_FOR deleter_has_locked' ; SET DEBUG_SYNC= 'lock_sec_rec_read_check_and_lock_has_locked SIGNAL holder_has_locked' ; --send DELETE FROM t WHERE a = 9999   --connection waiter SET DEBUG_SYNC= 'now WAIT_FOR holder_has_locked' ; SET DEBUG_SYNC= 'lock_sec_rec_read_check_and_lock_has_locked SIGNAL waiter_has_locked' ; --send DELETE FROM t WHERE a = 9999   --connection deleter --reap   --connection holder --reap   --connection waiter --reap   --connection default   --disconnect deleter --disconnect holder --disconnect waiter --disconnect dont_purge     DROP TABLE `t`; SET DEBUG_SYNC= 'reset' ; and the following sync point: diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc index 26388ad95e2..bc59d824ff2 100644 --- a/storage/innobase/lock/lock0lock.cc +++ b/storage/innobase/lock/lock0lock.cc @@ - 5763 , 6 + 5763 , 8 @@ lock_sec_rec_modify_check_and_lock( return (err); } +#include "scope.h" + /*********************************************************************/ /** Like lock_clust_rec_read_check_and_lock(), but reads a secondary index record. @@ - 5791 , 6 + 5793 , 10 @@ lock_sec_rec_read_check_and_lock( dberr_t err; ulint heap_no; + SCOPE_EXIT([]() { + DEBUG_SYNC_C( "lock_sec_rec_read_check_and_lock_has_locked" ); + }); + ut_ad(!dict_index_is_clust(index)); ut_ad(!dict_index_is_online_ddl(index)); ut_ad(block->frame == page_align(rec)); from commit bfba840dfa7794b988c59c94658920dbe556075d Author: Jakub Łopuszański <jakub.lopuszanski@oracle.com> Date: Tue Jun 11 12:36:53 2019 +0200   Bug #23755664 DEADLOCK WITH 3 CONCURRENT DELETES BY UNIQUE KEY on the latest 10.4, as well as with the sequence of steps described in https://bugs.mysql.com/bug.php?id=82127 . Have not understood yet why.
            vlad.lesin Vladislav Lesin added a comment - - edited

            The cause of why I can't reproduce it on 10.4.28 is MDEV-30225 fix.

            vlad.lesin Vladislav Lesin added a comment - - edited The cause of why I can't reproduce it on 10.4.28 is MDEV-30225 fix.

            MDEV-30225 does not fix the bug, but just hides it. If we take a look the test above, the 'holder' does not "seal the gap" after 'deleter' was committed because it was initially sealed, as after MDEV-30225 fix the 'holder' initially requests next-key lock.

            The following test from bfba840dfa7794b988c59c94658920dbe556075d mysql commit shows the issue:

            # SCENARIO 2
            #
            # Here, we form a situation in which con1 has LOCK_REC_NOT_GAP on rows 1 and 2
            # con2 waits for lock on row 1, and then con1 wants to upgrade the lock on row 1,
            # which might cause a deadlock, unless con1 properly notices that even though the
            # lock on row 1 can not be upgraded, a separate LOCK_GAP can be obtaied easily.
             
            CREATE TABLE `t`(
              `id` INT NOT NULL PRIMARY KEY
            ) ENGINE=InnoDB;
             
            INSERT INTO t (`id`) VALUES (1), (2);
             
            --connect(holder,localhost,root,,)
            --connect(waiter,localhost,root,,)
             
            --connection holder
              BEGIN;
              SELECT id FROM t WHERE id=1 FOR UPDATE;
              SELECT id FROM t WHERE id=2 FOR UPDATE;
             
            --connection waiter
              SET DEBUG_SYNC=
                'lock_wait_suspend_thread_enter SIGNAL waiter_will_wait';
              --send SELECT id FROM t WHERE id = 1 FOR UPDATE
             
            --connection holder
              SET DEBUG_SYNC=
                'now WAIT_FOR waiter_will_wait';
              SELECT * FROM t FOR UPDATE;
              COMMIT;
             
            --connection waiter
              --reap
             
            --connection default
             
            --disconnect holder
            --disconnect waiter
             
            DROP TABLE `t`;
            

            vlad.lesin Vladislav Lesin added a comment - MDEV-30225 does not fix the bug, but just hides it. If we take a look the test above, the 'holder' does not "seal the gap" after 'deleter' was committed because it was initially sealed, as after MDEV-30225 fix the 'holder' initially requests next-key lock. The following test from bfba840dfa7794b988c59c94658920dbe556075d mysql commit shows the issue: # SCENARIO 2 # # Here, we form a situation in which con1 has LOCK_REC_NOT_GAP on rows 1 and 2 # con2 waits for lock on row 1 , and then con1 wants to upgrade the lock on row 1 , # which might cause a deadlock, unless con1 properly notices that even though the # lock on row 1 can not be upgraded, a separate LOCK_GAP can be obtaied easily.   CREATE TABLE `t`( `id` INT NOT NULL PRIMARY KEY ) ENGINE=InnoDB;   INSERT INTO t (`id`) VALUES ( 1 ), ( 2 );   --connect(holder,localhost,root,,) --connect(waiter,localhost,root,,)   --connection holder BEGIN; SELECT id FROM t WHERE id= 1 FOR UPDATE; SELECT id FROM t WHERE id= 2 FOR UPDATE;   --connection waiter SET DEBUG_SYNC= 'lock_wait_suspend_thread_enter SIGNAL waiter_will_wait' ; --send SELECT id FROM t WHERE id = 1 FOR UPDATE   --connection holder SET DEBUG_SYNC= 'now WAIT_FOR waiter_will_wait' ; SELECT * FROM t FOR UPDATE; COMMIT;   --connection waiter --reap   --connection default   --disconnect holder --disconnect waiter   DROP TABLE `t`;

            This commit suffers from MDEV-27992, see this comment for details.

            vlad.lesin Vladislav Lesin added a comment - This commit suffers from MDEV-27992 , see this comment for details.

            Thank you, this looks good to me.

            marko Marko Mäkelä added a comment - Thank you, this looks good to me.

            People

              vlad.lesin Vladislav Lesin
              valerii Valerii Kravchuk
              Votes:
              0 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.