Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.6
- None
Description
The scenario is as follows:
1. trx 1 does a semi-consistent read for UPDATE and creates a waiting lock.
2. trx 2 executes UPDATE, performs the deadlock check in lock_wait(), and in Deadlock::report() sets trx->lock.was_chosen_as_deadlock_victim for trx 1 just before the lock_cancel_waiting_and_release() call.
3. trx 1 checks trx->lock.was_chosen_as_deadlock_victim in lock_trx_handle_wait() and, since it is set, rolls back from row_mysql_handle_errors(). trx_commit_or_rollback_prepare() zeroes out trx->lock.wait_thr.
4. trx 2 executes lock_cancel_waiting_and_release(). lock_wait_end() aborts when it checks trx->lock.wait_thr.
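The interleaving above can be replayed deterministically in a small single-threaded model. This is only a sketch: the types and functions below (Trx, TrxLock, QueryThread, mark_victim() and so on) are hypothetical stand-ins that model nothing but the two fields involved, and the final check returns a bool where the real code asserts.

```cpp
#include <cassert>

// Hypothetical stand-ins for the InnoDB structures; only the two fields
// that take part in the race are modelled.
struct QueryThread {};

struct TrxLock {
  bool was_chosen_as_deadlock_victim = false;
  QueryThread *wait_thr = nullptr;
};

struct Trx {
  TrxLock lock;
};

// Step 1: trx 1 creates a waiting lock and records its query thread.
void begin_lock_wait(Trx &trx, QueryThread &thr) {
  assert(trx.lock.wait_thr == nullptr);
  trx.lock.wait_thr = &thr;
}

// Step 2: the resolver (trx 2) marks the victim before cancelling its wait.
void mark_victim(Trx &victim) {
  victim.lock.was_chosen_as_deadlock_victim = true;
}

// Step 3: the victim sees the flag and rolls back; rollback preparation
// clears wait_thr. Returns true if the rollback path was taken.
bool victim_handle_wait(Trx &victim) {
  if (!victim.lock.was_chosen_as_deadlock_victim)
    return false;
  victim.lock.wait_thr = nullptr;
  return true;
}

// Step 4: the resolver cancels the wait and expects wait_thr to be set.
// The model reports whether that expectation holds instead of aborting.
bool cancel_wait_expectation_holds(const Trx &victim) {
  return victim.lock.wait_thr != nullptr;
}

// Replays the scenario; victim_runs_in_window selects whether step 3
// happens between steps 2 and 4.
bool replay(bool victim_runs_in_window) {
  Trx trx1;
  QueryThread thr;
  begin_lock_wait(trx1, thr);
  mark_victim(trx1);
  if (victim_runs_in_window)
    victim_handle_wait(trx1);
  return cancel_wait_expectation_holds(trx1);
}
```

Calling replay(false) models the ordering in which the resolver finishes first, and the expectation holds. Calling replay(true) models the ordering described in steps 2 to 4, and the expectation fails.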
The comment of trx->lock.wait_thr says:
    que_thr_t* wait_thr; /*!< query thread belonging to this
                         trx that is in waiting
                         state. For threads suspended in a
                         lock wait, this is protected by
                         lock_sys.latch. Otherwise, this may
                         only be modified by the thread that is
                         serving the running transaction. */
And lock_wait() acquires lock_sys.wait_mutex before deadlock check:
    mysql_mutex_lock(&lock_sys.wait_mutex);
    if (trx->lock.wait_lock)
    {
      if (Deadlock::check_and_resolve(trx))
      {
        ut_ad(!trx->lock.wait_lock);
        trx->error_state= DB_DEADLOCK;
        goto end_wait;
      }
    }
But lock_sys.wait_mutex is not acquired in the following call stack:
    0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)
        at ./storage/innobase/trx/trx0trx.cc:1507
    1507        trx->lock.wait_thr = NULL;
    (rr) bt
    #0  0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)
        at ./storage/innobase/trx/trx0trx.cc:1507
    #1  0x00005562571e7c38 in trx_rollback_step (thr=0x6160085ce218) at ./storage/innobase/trx/trx0roll.cc:915
    #2  0x0000556256ffe0d9 in que_thr_step (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:659
    #3  0x0000556256ffe430 in que_run_threads_low (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:709
    #4  0x0000556256ffe5d2 in que_run_threads (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:729
    #5  0x00005562571e9102 in trx_t::rollback_low (this=0x4a6d5ba30040, savept=0x0)
        at ./storage/innobase/trx/trx0roll.cc:124
    #6  0x00005562571e3593 in trx_t::rollback (this=0x4a6d5ba30040, savept=0x0)
        at ./storage/innobase/trx/trx0roll.cc:176
    #7  0x00005562570c0f1b in row_mysql_handle_errors (new_err=0x640000aae430, trx=0x4a6d5ba30040, thr=0x620000241848, savept=0x0)
        at ./storage/innobase/row/row0mysql.cc:696
In theory, trx_t::mutex could protect trx->lock.wait_thr from the race: it is acquired in trx_rollback_step() before the trx_commit_or_rollback_prepare() call, and it is also acquired in lock_cancel_waiting_and_release(). However, in Deadlock::report() there is a window between the assignment victim->lock.was_chosen_as_deadlock_victim= true and the trx->mutex_lock() call inside lock_cancel_waiting_and_release(). During that window the victim transaction can be rolled back in a parallel thread before the transaction mutex is acquired in lock_cancel_waiting_and_release().
For the same reason, acquiring lock_sys.mutex in trx_commit_or_rollback_prepare() before zeroing out trx->lock.wait_thr cannot solve the issue.
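To illustrate why the window matters, here is a minimal sketch in which the window is closed by setting the flag and cancelling the wait inside one critical section on the victim's mutex. This is not claimed to be the fix applied for this issue. It assumes, unlike the code described above, that the victim also reads the flag under that same mutex. All names (Trx, mark_and_cancel(), victim_wait_and_rollback(), run_once()) are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct QueryThread {};

// Hypothetical model of a transaction; in this sketch both fields are
// guarded by `mutex`, which stands in for trx_t::mutex.
struct Trx {
  std::mutex mutex;
  bool was_chosen_as_deadlock_victim = false;
  QueryThread *wait_thr = nullptr;
};

// Resolver side: mark the victim and cancel its wait in ONE critical
// section, so the victim cannot act on the flag in between.
// Returns whether wait_thr was still set when the wait was cancelled.
bool mark_and_cancel(Trx &victim) {
  std::lock_guard<std::mutex> guard(victim.mutex);
  victim.was_chosen_as_deadlock_victim = true;
  const bool wait_thr_was_set = victim.wait_thr != nullptr;
  victim.wait_thr = nullptr;  // the resolver ends the wait itself
  return wait_thr_was_set;
}

// Victim side: poll the flag under the same mutex; once it is set,
// perform the rollback preparation that clears wait_thr.
void victim_wait_and_rollback(Trx &victim) {
  for (;;) {
    {
      std::lock_guard<std::mutex> guard(victim.mutex);
      if (victim.was_chosen_as_deadlock_victim) {
        victim.wait_thr = nullptr;
        return;
      }
    }
    std::this_thread::yield();
  }
}

// Runs one resolver/victim pair concurrently and reports whether the
// resolver found wait_thr set at cancel time.
bool run_once() {
  Trx trx1;
  QueryThread thr;
  trx1.wait_thr = &thr;
  std::thread victim([&trx1] { victim_wait_and_rollback(trx1); });
  const bool ok = mark_and_cancel(trx1);
  victim.join();
  return ok;
}
```

Because the victim can observe the flag only after the resolver has left its critical section, run_once() returns true regardless of scheduling. If the flag were set before the mutex is taken, as in the scenario described above, the victim could clear wait_thr first.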
Issue Links
- causes: MDEV-29869 mtr failure: innodb.deadlock_wait_thr_race (Closed)
- relates to: MDEV-29635 race on trx->lock.wait_lock in deadlock resolution (Closed)
- relates to: MDEV-29860 Simplify transaction rollback initiation on termination from the other thread (Closed)