[MDEV-27025] insert-intention lock conflicts with waiting ORDINARY lock - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL)
Fix Version/s: 10.5.14, 10.6.6, 10.7.2, 10.8.1, 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3
Component/s: Storage Engine - InnoDB
Labels:
None

Description

We have two transactions and one record. The first transaction holds an ORDINARY S-lock on the record. The second transaction has created a waiting ORDINARY X-lock and waits for the first transaction. Then the first transaction requests an insert-intention lock on the record, and this lock conflicts with the waiting ORDINARY X-lock of the second transaction, which causes a deadlock. Why should it conflict? The first transaction already holds a lock on the record, and the second transaction's lock has the "waiting" flag set.
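The deadlock can be replayed with a small model of a record lock queue. This is a sketch only: `RecLock`, `Mode`, `Type` and the helper functions are hypothetical and much simpler than InnoDB's `lock_t` and its conflict rules. The point it shows is that if the insert-intention request ignores the waiting flag, trx 1 ends up waiting for trx 2 while trx 2 already waits for trx 1, which is a cycle in the wait-for graph.

```cpp
// Hypothetical model of the MDEV-27025 scenario; not InnoDB code.
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

enum class Mode { S, X };
enum class Type { ORDINARY, GAP, INSERT_INTENTION };

struct RecLock {
  int trx;       // owning transaction id
  Mode mode;
  Type type;
  bool waiting;  // true if the lock request is not granted yet
};

// Simplified conflict search for an insert-intention request by 'trx':
// any other transaction's ORDINARY or GAP lock counts as a conflict.
// The waiting flag is deliberately not consulted, as in the report.
const RecLock* find_conflict_for_ii(const std::vector<RecLock>& queue,
                                    int trx) {
  for (const RecLock& l : queue)
    if (l.trx != trx && l.type != Type::INSERT_INTENTION) return &l;
  return nullptr;
}

// Follows wait-for edges (waiter -> holder) and reports whether they lead
// back to 'start'.
bool has_cycle(const std::map<int, int>& waits_for, int start) {
  int cur = start;
  for (std::size_t steps = 0; steps <= waits_for.size(); ++steps) {
    auto it = waits_for.find(cur);
    if (it == waits_for.end()) return false;
    cur = it->second;
    if (cur == start) return true;
  }
  return false;
}

// Replays the scenario and returns true if it ends in a deadlock.
bool scenario_deadlocks() {
  std::vector<RecLock> queue = {
      {1, Mode::S, Type::ORDINARY, false},  // trx 1: granted ORDINARY S-lock
      {2, Mode::X, Type::ORDINARY, true},   // trx 2: waiting ORDINARY X-lock
  };
  std::map<int, int> waits_for = {{2, 1}};  // trx 2 waits for trx 1

  // trx 1 now requests an insert-intention lock on the same record.
  const RecLock* c = find_conflict_for_ii(queue, 1);
  assert(c != nullptr && c->trx == 2 && c->waiting);
  waits_for[1] = c->trx;  // trx 1 has to wait for trx 2
  queue.push_back({1, Mode::X, Type::INSERT_INTENTION, true});
  return has_cycle(waits_for, 1);
}
```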

Let's take a look at the 10.6 code:

dberr_t
lock_rec_insert_check_and_lock(...)
...
      const unsigned type_mode= LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION;

      if (lock_t *c_lock= lock_rec_other_has_conflicting(type_mode,
                                                         g.cell(), id,
                                                         heap_no, trx))
      {
        trx->mutex_lock();
        err= lock_rec_enqueue_waiting(c_lock, type_mode, id, block->frame,
                                      heap_no, index, thr, nullptr);
        trx->mutex_unlock();
      }
...

Neither lock_rec_insert_check_and_lock() nor lock_rec_other_has_conflicting() checks whether the conflicting lock is in the waiting state and whether its transaction is already waiting for the transaction that requests the insert-intention lock.
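The missing check can be sketched as an optional filter in a conflict search. Everything here is hypothetical (`RecLock`, `waits_for()`, `find_conflict_for_ii()` and the `skip_waiting_for_us` flag are not InnoDB names). With the flag off, the sketch behaves like the code quoted above; with it on, a waiting lock whose owner already waits for the requester is ignored, which is the relaxation this report asks about. The later comments on this issue explain why skipping such locks can produce phantom records.

```cpp
// Hypothetical sketch of the check discussed in this report; not InnoDB code.
#include <cassert>
#include <map>
#include <vector>

enum class Type { ORDINARY, GAP, INSERT_INTENTION };

struct RecLock {
  int trx;
  Type type;
  bool waiting;
};

// Is 'waiter' currently waiting for 'holder'? (edges: waiter -> holder)
bool waits_for(const std::map<int, int>& wait_edges, int waiter, int holder) {
  auto it = wait_edges.find(waiter);
  return it != wait_edges.end() && it->second == holder;
}

// Conflict search for an insert-intention request by 'trx'.
// skip_waiting_for_us == false: any other transaction's ORDINARY or GAP
//   lock is a conflict, granted or waiting (the reported behavior).
// skip_waiting_for_us == true: a waiting lock whose owner already waits
//   for 'trx' is skipped.
const RecLock* find_conflict_for_ii(const std::vector<RecLock>& queue,
                                    const std::map<int, int>& wait_edges,
                                    int trx, bool skip_waiting_for_us) {
  for (const RecLock& l : queue) {
    if (l.trx == trx || l.type == Type::INSERT_INTENTION) continue;
    if (skip_waiting_for_us && l.waiting && waits_for(wait_edges, l.trx, trx))
      continue;
    return &l;
  }
  return nullptr;
}
```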

The test is attached: ii-conflicts-waiting.test

Attachments


ii-conflicts-waiting.test
2021-11-11 10:38
2 kB
Vladislav Lesin

Issue Links

blocks

MDEV-10962 Deadlock with 3 concurrent DELETEs by unique key

Closed

causes

MDEV-27992 DELETE fails to delete record after blocking is released

Closed

relates to

MDEV-20605 Awaken transaction can miss inserted by other transaction records due to wrong persistent cursor restoration

Closed

MDEV-24738 Improve the InnoDB deadlock checker

Closed

MDEV-27550 The test galera.MW-328D no longer reproduces a deadlock

Closed

MDEV-27922 INSERT fails to return an error after transaction abort

Closed

MDEV-34877 Port "Bug #11745929 Change lock priority so that the transaction holding S-lock gets X-lock first" fix from MySQL to MariaDB

Closed

SAMU-292


(3 relates to, 33 mentioned in)

Activity



Michael Widenius added a comment - 2024-08-30 11:14

Note that in MariaDB 10.6 we have now optimized bit operations in my_bitmap.cc.
There is no looping over bits anymore; all operations are done on 64 bits at a time.
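To illustrate the difference between looping over bits and working on 64 bits at a time, here is a hypothetical intersection test on a bitmap stored as 64-bit words. It is not the code in my_bitmap.cc; the function names are made up, and the word-wise version assumes that unused bits past the logical bitmap length are kept at zero.

```cpp
// Illustration only; not MariaDB's my_bitmap.cc.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit-at-a-time: one iteration (shift, two ANDs) per bit.
bool overlaps_bitwise(const std::vector<std::uint64_t>& a,
                      const std::vector<std::uint64_t>& b, std::size_t nbits) {
  for (std::size_t i = 0; i < nbits; ++i) {
    std::uint64_t mask = std::uint64_t{1} << (i % 64);
    if ((a[i / 64] & mask) && (b[i / 64] & mask)) return true;
  }
  return false;
}

// Word-at-a-time: one AND covers 64 bits. Assumes both bitmaps have the
// same number of words and that padding bits are zero.
bool overlaps_wordwise(const std::vector<std::uint64_t>& a,
                       const std::vector<std::uint64_t>& b) {
  assert(a.size() == b.size());
  for (std::size_t w = 0; w < a.size(); ++w)
    if (a[w] & b[w]) return true;
  return false;
}
```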


Vladislav Lesin added a comment - 2024-09-05 10:09

I filed MDEV-34877 for porting MySQL's "Bug #11745929".


Marko Mäkelä added a comment - 2025-03-31 05:23

In order to run the test ii-conflicts-waiting.test on MariaDB Server 10.6 or later, the following adjustment is needed.

@@ -18,7 +18,7 @@
 --connect(con_del,localhost,root,,)
 SET DEBUG_SYNC = 'now WAIT_FOR ins_set_locks';
-SET DEBUG_SYNC = 'lock_wait_suspend_thread_enter SIGNAL del_locked';
+SET DEBUG_SYNC = 'lock_wait_start SIGNAL del_locked';
 ###############################################################################
 # This DELETE creates waiting ORDINARY X-lock for heap_no 2 as the record is
 # delete-marked, this lock conflicts with ORDINARY S-lock set by the the last

vlad.lesin, it seems that MDEV-34877 does not fix the scenario of this test:

11.4 30140c066d50f7e4ac4f490a9e081d9d605aea07
mysqltest: At line 43: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

I tested this with both values of innodb_snapshot_isolation (MDEV-35124) and got the same result. Based on my reading of the analysis in MDEV-27992, this setting should make no difference in that scenario, but I did not check that by re-applying and retesting the original fix of MDEV-27025.

The regression MDEV-27992, which forced us to revert the original fix, involves a scenario where some PRIMARY KEY columns are being updated or, more generally, where the same transaction first deletes and then inserts rows in the same table. If the problems are limited to such scenarios, I wonder if we could treat those as a special case and enable the optimization in other cases. At the core of the MDEV-27992 fix is the added parameter bool insert_before_waiting, which is set in the calls of lock_rec_add_to_queue() in lock_rec_convert_impl_to_expl_for_trx() and when lock_rec_other_has_conflicting() sets was_ignored in lock_rec_lock().
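The comment names insert_before_waiting but this issue does not show its implementation. The sketch below is only a guess at the idea suggested by the name: when a granted lock is added to a record's lock queue after it was allowed to bypass a waiting request, it is placed ahead of the first waiting lock of another transaction instead of at the tail. `QLock` and `add_to_queue()` are hypothetical and are not the signature of lock_rec_add_to_queue().

```cpp
// Hypothetical illustration of "insert before waiting"; not MariaDB code.
#include <cassert>
#include <list>

struct QLock {
  int trx;
  bool waiting;
};

// insert_before_waiting == false: append at the tail (FIFO order).
// insert_before_waiting == true: insert just before the first waiting lock
// that belongs to another transaction, or at the tail if there is none.
void add_to_queue(std::list<QLock>& queue, QLock lock,
                  bool insert_before_waiting) {
  if (insert_before_waiting) {
    for (auto it = queue.begin(); it != queue.end(); ++it) {
      if (it->waiting && it->trx != lock.trx) {
        queue.insert(it, lock);
        return;
      }
    }
  }
  queue.push_back(lock);
}
```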


Vladislav Lesin added a comment - 2025-04-14 11:24

alessandro.vetere, monty
> and it was later declared that MDEV-27025 was not a bug to fix, but a feature request.

This part is wrong. MDEV-27025 is not a feature request. The code that causes the behavior reported in MDEV-27025 is not a bug but intended behavior, so MDEV-27025 is not a bug. I would change the status from "closed" to "not a bug".

MDEV-34877 is not supposed to fix MDEV-27025. Both the original Bug #34123159 commit and my version of the fix contain notes about ii-locks in their commit messages. My note repeats what was already said in this comment:

    MySQL's commit contains the following explanation of why insert-intention
    locks must not overtake waiting ordinary or gap locks:
    "It is important that this decission rule doesn't allow
    INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`,
    `X`, `X|GAP`), as inserting a record into a gap would split such WAITING
    lock, violating the invariant that each transaction can have at most
    single WAITING lock at any time."

    I would add the following to the explanation. Suppose trx 1 holds an
    ordinary X-lock on some record, and trx 2 executes "DELETE FROM t"
    or "SELECT * FOR UPDATE" in RR (see lock_delete_updated.test and
    MDEV-27992), i.e. it creates a waiting ordinary X-lock on the same record.
    Then trx 1 wants to insert some record just before the locked record.
    It requests an insert-intention lock, and if that lock overtakes trx 2's
    lock, trx 2 will see phantom records in RR. lock_delete_updated.test
    shows how "DELETE" allows some records to be inserted into an already
    scanned gap and misses some records to delete.
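The phantom described above can be replayed on an ordered set of keys. This is a simplified, hypothetical model, not lock_delete_updated.test: `delete_all_with_concurrent_insert()` treats trx 2's DELETE as a single forward scan that has already removed every key below `blocked_at` and is blocked on that key. It ignores cursor restoration details, and with overtaking forbidden it only models that the insert does not land in the scanned gap while trx 2 is active (in the MDEV-27025 scenario the insert waits and a deadlock is reported instead).

```cpp
// Hypothetical model of the phantom scenario; not the MariaDB test.
#include <cassert>
#include <set>

// Returns the keys left in the table after trx 2's "DELETE FROM t" finishes.
std::set<int> delete_all_with_concurrent_insert(std::set<int> table,
                                                int blocked_at, int new_key,
                                                bool ii_may_overtake) {
  assert(new_key < blocked_at);  // trx 1 inserts into an already scanned gap
  // Phase 1: trx 2 deletes every key it passes before it blocks on
  // 'blocked_at', which is X-locked by trx 1.
  for (auto it = table.begin(); it != table.end() && *it < blocked_at;)
    it = table.erase(it);
  // trx 1 inserts into the scanned gap only if its insert-intention lock may
  // overtake trx 2's waiting ORDINARY X-lock.
  if (ii_may_overtake) table.insert(new_key);
  // Phase 2: trx 2 resumes at 'blocked_at' and deletes the rest. Its scan
  // never returns to the gap it has already passed.
  for (auto it = table.lower_bound(blocked_at); it != table.end();)
    it = table.erase(it);
  return table;
}
```

With table {10, 20, 30}, trx 2 blocked on 20 and trx 1 inserting 15, allowing the overtake leaves 15 in the table after "DELETE FROM t", which is the missed record.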

marko,
> The regression MDEV-27992, which forced us to revert the original fix, involves a scenario where some PRIMARY KEY columns are being updated, or more generally, if the same transaction is first deleting and then inserting rows in the same table. If the problems are only limited to only such scenarios

I don't have other tests that show the issues caused by the original MDEV-27025 fix. But according to my code analysis, the problem is not related to deletion at all: in the test of MDEV-27992, the cursor is restored at the stage when the DELETE looks for a suitable record in the primary index, so the same could happen in other scenarios, the simplest one being "SELECT * FROM t FOR UPDATE".

I would generalize the cases as follows: to avoid phantom records, we must not allow records to be inserted into ranges that were already scanned by other active transactions, even if those transactions are blocked on some record lock. This is implemented by forbidding ii-locks from overtaking conflicting waiting locks.
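The rule can be written as a small predicate. This is a sketch under simplifying assumptions, with hypothetical names (`Lock`, `Kind`, `ii_must_wait_for()`): an insert-intention request by one transaction has to wait for another transaction's S, S|GAP, X or X|GAP lock (the locks listed in the MySQL commit message quoted earlier), whether that lock is granted or waiting, and does not wait for record-only or other insert-intention locks. Because an insert-intention lock is an X lock, the other lock's mode does not change the answer.

```cpp
// Hypothetical sketch of the "no overtaking" rule; not InnoDB's
// lock_rec_has_to_wait().
#include <cassert>

enum class Mode { S, X };
// ORDINARY = next-key lock, GAP = gap only, REC_NOT_GAP = record only.
enum class Kind { ORDINARY, GAP, REC_NOT_GAP, INSERT_INTENTION };

struct Lock {
  int trx;
  Mode mode;  // irrelevant here: X (the ii-lock) conflicts with both S and X
  Kind kind;
  bool waiting;
};

// Must an insert-intention request by 'trx' wait for 'other'?
bool ii_must_wait_for(int trx, const Lock& other) {
  if (other.trx == trx) return false;  // own locks never block
  switch (other.kind) {
    case Kind::ORDINARY:
    case Kind::GAP:
      return true;  // other.waiting is intentionally ignored: no overtaking
    case Kind::REC_NOT_GAP:
    case Kind::INSERT_INTENTION:
      return false;
  }
  return false;
}
```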


Marko Mäkelä added a comment - 2025-04-14 12:48

vlad.lesin, thank you for your clarification.


People

Assignee: Vladislav Lesin

Reporter: Vladislav Lesin

Votes: 1

Watchers: 12

Dates

Created: 2021-11-11 10:38

Updated: 1 week ago 07:42

Resolved: 2024-09-05 10:09
