Details
Bug
Status: Closed
Critical
Resolution: Fixed
10.6
None
Description
1. trx 1 does a semi-consistent read for UPDATE and checks its trx->lock.was_chosen_as_deadlock_victim flag in lock_trx_handle_wait(). The flag is not set, so execution proceeds to the trx->lock.wait_lock check.
2. trx 2 executes DELETE and does deadlock resolution. It sets victim->lock.was_chosen_as_deadlock_victim= true for trx 1 and executes lock_cancel_waiting_and_release(), which, in turn, calls lock_reset_lock_and_trx_wait(), where trx->lock.wait_lock is reset for trx 1.
3. trx 1 loads the trx->lock.wait_lock value. Because it was reset by trx 2, the value is nullptr, and lock_trx_handle_wait() returns success. As it returns success, row_search_mvcc() tries to lock the next record, and lock_rec_lock() aborts execution because the assertion ut_ad(!(trx->lock.was_chosen_as_deadlock_victim & 1)) fails.
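The interleaving in steps 1-3 can be replayed deterministically with a toy model. This is only a sketch: ToyTrx, ToyLock, ToyErr and the toy_* functions are hypothetical stand-ins invented for illustration, not the InnoDB types or functions, and the `between` callback merely simulates what trx 2 does between the two reads made by trx 1.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-ins for illustration only; these are NOT the InnoDB types.
struct ToyLock {};

struct ToyTrx {
  std::atomic<bool> was_chosen_as_deadlock_victim{false};
  std::atomic<ToyLock*> wait_lock{nullptr};
};

enum class ToyErr { SUCCESS, DEADLOCK, LOCK_WAIT };

// What trx 2 does in step 2: mark the victim, then reset its waiting lock.
void toy_resolve_deadlock(ToyTrx& victim) {
  victim.was_chosen_as_deadlock_victim.store(true);
  victim.wait_lock.store(nullptr);
}

// Models the two unsynchronized reads of steps 1 and 3. `between` runs in
// the gap between them, standing in for a concurrent transaction.
template <typename Between>
ToyErr toy_handle_wait_racy(ToyTrx& trx, Between between) {
  if (trx.was_chosen_as_deadlock_victim.load())
    return ToyErr::DEADLOCK;   // step 1: flag not yet set, so we fall through
  between();                   // step 2 happens here
  if (trx.wait_lock.load() == nullptr)
    return ToyErr::SUCCESS;    // step 3: nullptr is taken for "lock granted"
  return ToyErr::LOCK_WAIT;
}

// Replays steps 1-3 and reports whether trx 1 got "success" while already
// being marked as the deadlock victim.
bool toy_race_reproduced() {
  ToyLock lock;
  ToyTrx trx1;
  trx1.wait_lock.store(&lock);  // trx 1 has created a waiting lock
  ToyErr err =
      toy_handle_wait_racy(trx1, [&] { toy_resolve_deadlock(trx1); });
  return err == ToyErr::SUCCESS && trx1.was_chosen_as_deadlock_victim.load();
}
```

In this model toy_race_reproduced() returns true: the victim flag is set, yet the caller is told to continue, which is the state the ut_ad() assertion in lock_rec_lock() rejects.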
The intended logic is as follows. Trx 1 creates a waiting lock and is either already suspended in lock_wait() or about to enter it. Trx 2 does deadlock resolution, chooses trx 1 as the victim, and cancels and releases its waiting lock. Trx 1 checks trx->lock.was_chosen_as_deadlock_victim before or after suspending in lock_wait() and rolls itself back. Trx 1 must not create any new locks. That's why MDEV-29081 made sense.
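The intended protocol can be sketched in a single-threaded toy model. All names here (WaitTrx, WaitErr, wait_for_lock and so on) are hypothetical, and the `while_suspended` callback is an assumption that stands in for whatever other transactions do while trx 1 is suspended. It is not a model of the real lock_wait() implementation.

```cpp
#include <cassert>
#include <functional>

// Hypothetical toy transaction; not the InnoDB trx_t.
struct WaitTrx {
  bool was_chosen_as_deadlock_victim = false;
  bool has_wait_lock = false;
  int new_locks_after_wait = 0;
};

enum class WaitErr { SUCCESS, DEADLOCK };

// Resolver side: mark the victim and cancel its waiting lock.
void wait_choose_victim(WaitTrx& victim) {
  victim.was_chosen_as_deadlock_victim = true;
  victim.has_wait_lock = false;
}

// Simplified wait: the victim flag is checked before and after suspension.
WaitErr wait_for_lock(WaitTrx& trx,
                      const std::function<void()>& while_suspended) {
  if (trx.was_chosen_as_deadlock_victim) return WaitErr::DEADLOCK;
  while_suspended();  // stands in for the time spent suspended
  if (trx.was_chosen_as_deadlock_victim) return WaitErr::DEADLOCK;
  return WaitErr::SUCCESS;
}

// Caller: takes the next lock only if the wait did not end in a deadlock.
WaitErr wait_then_lock_next(WaitTrx& trx,
                            const std::function<void()>& while_suspended) {
  trx.has_wait_lock = true;  // the waiting lock is created
  WaitErr err = wait_for_lock(trx, while_suspended);
  if (err == WaitErr::DEADLOCK) return err;  // caller rolls back, no new locks
  trx.has_wait_lock = false;                 // the lock was granted
  ++trx.new_locks_after_wait;                // safe to lock the next record
  return err;
}
```

When the resolver runs during the suspension, wait_then_lock_next() returns WaitErr::DEADLOCK and new_locks_after_wait stays at zero, which is the invariant described above.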
This logic does not work in the case of a semi-consistent read. In this case trx 1 checks the lock state in lock_trx_handle_wait() after creating the waiting lock and trying to read the committed version of the record. If lock_trx_handle_wait() returns "success", lock_wait() is not executed and trx->lock.was_chosen_as_deadlock_victim is not checked, which allows trx 1 to continue execution and create new locks even if trx->lock.was_chosen_as_deadlock_victim was set by some other transaction after the lock_trx_handle_wait() call.
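One way to close this window in a toy model is to re-check the victim flag after observing that the waiting lock is gone. The sketch below is an assumption made for illustration and is not presented as the actual fix: FixTrx, FixLock, FixErr and the fix_* functions are hypothetical names. It relies on the resolver setting the flag before resetting the waiting lock (the order described in step 2), and on the further assumption that a transaction can only be chosen as a victim while it still has a waiting lock.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical toy types; not the InnoDB ones.
struct FixLock {};

struct FixTrx {
  std::atomic<bool> was_chosen_as_deadlock_victim{false};
  std::atomic<FixLock*> wait_lock{nullptr};
};

enum class FixErr { SUCCESS, DEADLOCK, LOCK_WAIT };

// Resolver: the flag is set BEFORE the waiting lock is reset.
void fix_resolve_deadlock(FixTrx& victim) {
  victim.was_chosen_as_deadlock_victim.store(true);
  victim.wait_lock.store(nullptr);
}

// `between` simulates a concurrent transaction acting between the reads.
FixErr fix_handle_wait(FixTrx& trx, const std::function<void()>& between) {
  if (trx.was_chosen_as_deadlock_victim.load())
    return FixErr::DEADLOCK;
  between();
  if (trx.wait_lock.load() == nullptr) {
    // The lock may have been granted or cancelled. Because the resolver
    // sets the flag first, a second look at the flag tells the two apart.
    return trx.was_chosen_as_deadlock_victim.load() ? FixErr::DEADLOCK
                                                    : FixErr::SUCCESS;
  }
  // The real function would do more work here; the sketch only reports
  // that the lock is still pending.
  return FixErr::LOCK_WAIT;
}
```

With the re-check in place, the interleaving from steps 1-3 yields FixErr::DEADLOCK instead of success, so the caller has no opportunity to create new locks.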
Attachments
Issue Links
- relates to
  - MDEV-29711 INSERT fails to convert implicit to explicit lock as UPDATE owns explicit lock (Open)
  - MDEV-29081 trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end() (Closed)
  - MDEV-29622 Wrong assertions in lock_cancel_waiting_and_release() for deadlock resolving caller (Closed)
A test case is added: MDEV-29635.test.diff.
Note that the test crashes not in lock_rec_lock() but in lock_trx_has_expl_x_lock(), like MDEV-29711 (I even assume it's a duplicate). This is not very important, because the test shows the root cause of the bug, and reproducing the crash in the same location where it was found during our testing would require additional effort. Also, we can't create a test case that fails before the fix and passes after it, because after the fix the debug sync points will be blocked.