The scenario is the following:
1. trx 1 does a semi-consistent read for UPDATE and creates a waiting lock.
2. trx 2 executes UPDATE, runs the deadlock check in lock_wait(), and sets trx->lock.was_chosen_as_deadlock_victim for trx 1 in Deadlock::report(), just before the call to lock_cancel_waiting_and_release().
3. trx 1 checks trx->lock.was_chosen_as_deadlock_victim in lock_trx_handle_wait() and, because it is set, rolls back from row_mysql_handle_errors(). trx_commit_or_rollback_prepare() zeroes out trx->lock.wait_thr.
4. trx 2 executes lock_cancel_waiting_and_release(). lock_wait_end() aborts when it checks trx->lock.wait_thr.
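The four steps above can be replayed in a small single-threaded model. This is only an illustrative sketch, not InnoDB code: the types trx_lock_model and que_thr_model and all function names below are hypothetical stand-ins, and only the two fields involved in the race are modelled.

```cpp
#include <cassert>

// Hypothetical stand-in for que_thr_t; its contents do not matter here.
struct que_thr_model {};

// Hypothetical stand-in for the relevant part of trx->lock.
struct trx_lock_model {
  bool was_chosen_as_deadlock_victim = false;
  que_thr_model *wait_thr = nullptr;
};

// Step 1: trx 1 creates a waiting lock and registers its waiting thread.
void start_wait(trx_lock_model &lock, que_thr_model *thr) {
  assert(thr != nullptr);
  lock.wait_thr = thr;
}

// Step 2: trx 2 marks trx 1 as the deadlock victim (the flag assignment only).
void mark_victim(trx_lock_model &lock) {
  lock.was_chosen_as_deadlock_victim = true;
}

// Step 3: trx 1 notices the flag and rolls back, which clears wait_thr.
bool rollback_if_victim(trx_lock_model &lock) {
  if (!lock.was_chosen_as_deadlock_victim) return false;
  lock.wait_thr = nullptr;
  return true;
}

// Step 4: the condition that the cancel path of trx 2 relies on.
bool wait_end_precondition_holds(const trx_lock_model &lock) {
  return lock.wait_thr != nullptr;
}

// Replays steps 1 to 4 in the problematic order and reports whether
// the precondition of step 4 still holds at the end.
bool replay_race() {
  que_thr_model thr;
  trx_lock_model trx1;
  start_wait(trx1, &thr);
  mark_victim(trx1);
  rollback_if_victim(trx1);
  return wait_end_precondition_holds(trx1);
}
```

Calling replay_race() returns false: by the time step 4 runs, wait_thr has already been cleared by the rollback in step 3.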
The comment on trx->lock.wait_thr says:
And lock_wait() acquires lock_sys.wait_mutex before deadlock check:
But lock_sys.wait_mutex is not acquired in the following call stack:
Theoretically, trx_t::mutex could protect trx->lock.wait_thr from the race. It is acquired in trx_rollback_step() before the call to trx_commit_or_rollback_prepare(), and it is also acquired in lock_cancel_waiting_and_release(). But in Deadlock::report() there is a window between the assignment victim->lock.was_chosen_as_deadlock_victim= true and the trx->mutex_lock() call in lock_cancel_waiting_and_release(). In that window, a parallel thread can roll back the transaction before lock_cancel_waiting_and_release() acquires the transaction mutex.
For the same reason, acquiring lock_sys.mutex in trx_commit_or_rollback_prepare() before zeroing out trx->lock.wait_thr cannot solve the issue.
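The reason a mutex taken on both sides does not help can be sketched as follows. This is a simplified model under stated assumptions, not the real code: victim_model, rollback_under_mutex(), report_and_cancel() and window_is_exploitable() are hypothetical names, std::mutex stands in for trx_t::mutex, and the callback in_window stands for whatever another thread manages to run between the flag assignment and the mutex acquisition.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model of the victim transaction, not the real trx_t.
struct victim_model {
  std::mutex mutex;                            // stands in for trx_t::mutex
  bool was_chosen_as_deadlock_victim = false;
  bool has_wait_thr = true;                    // models lock.wait_thr != nullptr
};

// Rollback path of trx 1: wait_thr is cleared while the mutex is held.
void rollback_under_mutex(victim_model &v) {
  std::lock_guard<std::mutex> guard(v.mutex);
  if (v.was_chosen_as_deadlock_victim) v.has_wait_thr = false;
}

// Deadlock-report path of trx 2. in_window runs at the point where
// another thread could be scheduled. Returns whether wait_thr is still
// set once the mutex has finally been acquired.
template <typename F>
bool report_and_cancel(victim_model &v, F in_window) {
  assert(!v.was_chosen_as_deadlock_victim);    // a victim is chosen once
  v.was_chosen_as_deadlock_victim = true;      // set before the mutex is taken
  in_window();                                 // the unprotected window
  std::lock_guard<std::mutex> guard(v.mutex);  // too late to prevent the rollback
  return v.has_wait_thr;
}

// True if a rollback that fits into the window invalidates wait_thr.
bool window_is_exploitable() {
  victim_model v;
  bool still_waiting =
      report_and_cancel(v, [&v] { rollback_under_mutex(v); });
  return !still_waiting;
}
```

In this model both paths take the mutex correctly, yet window_is_exploitable() returns true, because the flag that triggers the rollback becomes visible before the cancel path holds the mutex.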