[MDEV-29635] race on trx->lock.wait_lock in deadlock resolution - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1
Component/s: Storage Engine - InnoDB
Labels:
None

Description

1. trx 1 does semi-consistent read for UPDATE, checks its was_chosen_as_deadlock_victim in lock_trx_handle_wait(), it's not set, and the execution goes to the trx->lock.wait_lock check.

2. trx 2 executes DELETE and does deadlock resolution, it sets victim->lock.was_chosen_as_deadlock_victim= true for trx 1 and executes lock_cancel_waiting_and_release(), which, in turns, calls lock_reset_lock_and_trx_wait() where trx->lock.wait_lock is reset for trx 2.

3. trx 1 loads trx->lock.wait_lock value, as it was reset by trx 1, the value is nullptr, and lock_trx_handle_wait() returns success. As it returns success, row_search_mvcc() tries to lock the next record and lock_rec_lock() aborts execution as the condition ut_ad(!(trx->lock.was_chosen_as_deadlock_victim & 1)) is not true;

The logic which is supposed to be is the following. Trx 1 created waiting lock and either already suspended in lock_wait() or is moving to it. Trx 2 does deadlock resolution, chooses trx 1 as a victim and releases and cancels waiting lock. Trx 1 checks trx->lock.was_chosen_as_deadlock_victim before or after suspending in lock_wait(), and rolls back itself. And there must not be any new locks created by Trx 1. That's why ~~MDEV-29081~~ made sense.

This logic does not work in the case of semi-consistent read. As in this case trx 1 checks lock state in lock_trx_handle_wait() after creating waiting lock and trying to read committed version of the record. And if lock_trx_handle_wait() returns "success", lock_wait() will not be executed, and trx->lock.was_chosen_as_deadlock_victim will not be checked, what allows trx 1 to continue execution and create new locks even if trx->lock.was_chosen_as_deadlock_victim was set by some other transaction after lock_trx_handle_wait() call.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

MDEV-29635.test.diff
3 kB
2022-10-06 12:40

Issue Links

relates to

MDEV-29711 INSERT fails to convert implicit to explicit lock as UPDATE owns explicit lock

Open

MDEV-29081 trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()

Closed

MDEV-29622 Wrong assertions in lock_cancel_waiting_and_release() for deadlock resolving caller

Closed

Activity

Ascending order - Click to sort in descending order

Vladislav Lesin added a comment - 2022-10-06 12:48

Test case is added: MDEV-29635.test.diff .

Note the test crashes not in the lock_rec_lock(), but in lock_trx_has_expl_x_lock(), like MDEV-29711(I even assume it's a duplicate). But it isn't so important, because the test shows the root reason of the bug, and repeating the crash in the same location, as it was found during our testing, would require additional effort. And we can't create test case, which would fail before the fix and be successful after the fix, as after the fix the debug sync points will be blocked.

Vladislav Lesin added a comment - 2022-10-06 12:48 Test case is added: MDEV-29635.test.diff . Note the test crashes not in the lock_rec_lock(), but in lock_trx_has_expl_x_lock(), like MDEV-29711 (I even assume it's a duplicate). But it isn't so important, because the test shows the root reason of the bug, and repeating the crash in the same location, as it was found during our testing, would require additional effort. And we can't create test case, which would fail before the fix and be successful after the fix, as after the fix the debug sync points will be blocked.

Vladislav Lesin added a comment - 2022-10-06 13:29 - edited

I think that the following optimization:

dberr_t lock_trx_handle_wait(trx_t *trx)

  DEBUG_SYNC_C("lock_trx_handle_wait_enter");

  if (trx->lock.was_chosen_as_deadlock_victim)

    return DB_DEADLOCK;

  if (!trx->lock.wait_lock)

    return DB_SUCCESS;

  dberr_t err= DB_SUCCESS;

  mysql_mutex_lock(&lock_sys.wait_mutex);

  if (trx->lock.was_chosen_as_deadlock_victim)

    err= DB_DEADLOCK;

  else if (lock_t *wait_lock= trx->lock.wait_lock)

    err= lock_sys_t::cancel<true>(trx, wait_lock);

  lock_sys.deadlock_check();

  mysql_mutex_unlock(&lock_sys.wait_mutex);

  return err;

for trx->lock.wait_lock is wrong.

Because even if trx->lock.was_chosen_as_deadlock_victim was not set before the first check, it can be set after the check, and trx->lock.wait_lock can be reset by another thread from lock_reset_lock_and_trx_wait() if the transaction was chosen as deadlock victim. In this case lock_trx_handle_wait() will return DB_SUCCESS even the transaction was marked as deadlock victim, and continue execution instead of rolling back.

We can leave the first check for trx->lock.was_chosen_as_deadlock_victim and remove the first check for trx->lock.wait_lock. In this case after lock_trx_handle_wait() is executed, the lock will be either acquired or cancelled, there will not be waiting lock for the transaction, thus the transaction will not be selected as deadlock victim after lock_trx_handle_wait() call and before the next waiting record lock is created.

Note the above claim is true for 10.6+, but needs to be checked for 10.3, and, if it's wrong, it can be the reason of MDEV-29711.

Vladislav Lesin added a comment - 2022-10-06 13:29 - edited I think that the following optimization: dberr_t lock_trx_handle_wait(trx_t *trx) { DEBUG_SYNC_C( "lock_trx_handle_wait_enter" ); if (trx->lock.was_chosen_as_deadlock_victim) return DB_DEADLOCK; if (!trx->lock.wait_lock) return DB_SUCCESS; dberr_t err= DB_SUCCESS; mysql_mutex_lock(&lock_sys.wait_mutex); if (trx->lock.was_chosen_as_deadlock_victim) err= DB_DEADLOCK; else if (lock_t *wait_lock= trx->lock.wait_lock) err= lock_sys_t::cancel< true >(trx, wait_lock); lock_sys.deadlock_check(); mysql_mutex_unlock(&lock_sys.wait_mutex); return err; } for trx->lock.wait_lock is wrong. Because even if trx->lock.was_chosen_as_deadlock_victim was not set before the first check, it can be set after the check, and trx->lock.wait_lock can be reset by another thread from lock_reset_lock_and_trx_wait() if the transaction was chosen as deadlock victim. In this case lock_trx_handle_wait() will return DB_SUCCESS even the transaction was marked as deadlock victim, and continue execution instead of rolling back. We can leave the first check for trx->lock.was_chosen_as_deadlock_victim and remove the first check for trx->lock.wait_lock. In this case after lock_trx_handle_wait() is executed, the lock will be either acquired or cancelled, there will not be waiting lock for the transaction, thus the transaction will not be selected as deadlock victim after lock_trx_handle_wait() call and before the next waiting record lock is created. Note the above claim is true for 10.6+, but needs to be checked for 10.3, and, if it's wrong, it can be the reason of MDEV-29711 .

Vladislav Lesin added a comment - 2022-10-06 13:35

BTW, the above test will even work for the fix, proposed above.

Vladislav Lesin added a comment - 2022-10-06 13:35 BTW, the above test will even work for the fix, proposed above.

MariaDB Server

race on trx->lock.wait_lock in deadlock resolution

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration