[MDEV-29622] Wrong assertions in lock_cancel_waiting_and_release() for deadlock resolving caller

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1
Component/s: Storage Engine - InnoDB
Labels: None

Description

The scenario is the following:

1. trx 1 performs a semi-consistent read for an UPDATE and creates a waiting lock.

2. trx 2 executes an UPDATE and runs the deadlock check in lock_wait(). Deadlock::report() sets trx->lock.was_chosen_as_deadlock_victim for trx 1 just before it calls lock_cancel_waiting_and_release().

3. trx 1 checks trx->lock.was_chosen_as_deadlock_victim in lock_trx_handle_wait(). Because the flag is set, it rolls back from row_mysql_handle_errors(), and trx_commit_or_rollback_prepare() resets trx->lock.wait_thr to NULL.

4. trx 2 executes lock_cancel_waiting_and_release(), and lock_wait_end() aborts on its check of trx->lock.wait_thr.
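
The interleaving in steps 1 to 4 can be replayed on a toy model. The sketch below is only an illustration: ToyTrx, QueryThread and every function name in it are hypothetical stand-ins, not InnoDB code, and the model keeps only the two fields named in the scenario. It runs the steps sequentially and reports whether the condition that lock_wait_end() checks (a non-NULL wait_thr) still holds at step 4.

```cpp
#include <cassert>

// Hypothetical stand-ins for the InnoDB types; only the two fields
// mentioned in the scenario are modeled.
struct QueryThread {};

struct ToyTrx {
  bool was_chosen_as_deadlock_victim = false;
  QueryThread *wait_thr = nullptr;
};

// Step 1: trx 1 creates a waiting lock and records its suspended query thread.
void begin_lock_wait(ToyTrx &trx, QueryThread &thr) {
  assert(trx.wait_thr == nullptr);
  trx.wait_thr = &thr;
}

// Step 2: trx 2 marks trx 1 as the deadlock victim.
void mark_as_victim(ToyTrx &victim) {
  victim.was_chosen_as_deadlock_victim = true;
}

// Step 3: trx 1 notices the flag and rolls back, which clears wait_thr.
// Returns true if the rollback path was taken.
bool victim_handles_wait(ToyTrx &trx) {
  if (!trx.was_chosen_as_deadlock_victim) return false;
  trx.wait_thr = nullptr;
  return true;
}

// Step 4: the condition the resolver relies on when it ends the wait.
bool wait_end_precondition_holds(const ToyTrx &victim) {
  return victim.wait_thr != nullptr;
}

// Replays the scenario. victim_runs_in_window selects whether step 3
// happens between step 2 and step 4.
bool replay(bool victim_runs_in_window) {
  QueryThread thr;
  ToyTrx trx1;
  begin_lock_wait(trx1, thr);
  mark_as_victim(trx1);
  if (victim_runs_in_window) victim_handles_wait(trx1);
  return wait_end_precondition_holds(trx1);
}
```

In this model replay(false) returns true, while replay(true) returns false, which corresponds to the state in which the check in step 4 fails.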

The comment on trx->lock.wait_thr says:

        que_thr_t*      wait_thr;       /*!< query thread belonging to this
                                        trx that is in waiting
                                        state. For threads suspended in a
                                        lock wait, this is protected by
                                        lock_sys.latch. Otherwise, this may
                                        only be modified by the thread that is
                                        serving the running transaction. */

lock_wait() acquires lock_sys.wait_mutex before the deadlock check:

  mysql_mutex_lock(&lock_sys.wait_mutex);
  if (trx->lock.wait_lock)
  {
    if (Deadlock::check_and_resolve(trx))
    {
      ut_ad(!trx->lock.wait_lock);
      trx->error_state= DB_DEADLOCK;
      goto end_wait;
    }
  }

But lock_sys.wait_mutex is not acquired in the following call stack:

0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)
    at ./storage/innobase/trx/trx0trx.cc:1507
1507                    trx->lock.wait_thr = NULL;
(rr) bt
#0  0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)
    at ./storage/innobase/trx/trx0trx.cc:1507
#1  0x00005562571e7c38 in trx_rollback_step (thr=0x6160085ce218) at ./storage/innobase/trx/trx0roll.cc:915
#2  0x0000556256ffe0d9 in que_thr_step (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:659
#3  0x0000556256ffe430 in que_run_threads_low (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:709
#4  0x0000556256ffe5d2 in que_run_threads (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:729
#5  0x00005562571e9102 in trx_t::rollback_low (this=0x4a6d5ba30040, savept=0x0)
    at ./storage/innobase/trx/trx0roll.cc:124
#6  0x00005562571e3593 in trx_t::rollback (this=0x4a6d5ba30040, savept=0x0)
    at ./storage/innobase/trx/trx0roll.cc:176
#7  0x00005562570c0f1b in row_mysql_handle_errors (new_err=0x640000aae430, trx=0x4a6d5ba30040, thr=0x620000241848, savept=0x0)
    at ./storage/innobase/row/row0mysql.cc:696

Theoretically, trx_t::mutex could protect trx->lock.wait_thr from this race. It is acquired in trx_rollback_step() before the trx_commit_or_rollback_prepare() call, and it is also acquired in lock_cancel_waiting_and_release(). However, in Deadlock::report() there is a window between the assignment victim->lock.was_chosen_as_deadlock_victim= true and the trx->mutex_lock() call in lock_cancel_waiting_and_release(). During that window a parallel thread can roll back the transaction before lock_cancel_waiting_and_release() acquires the transaction mutex.

For the same reason, acquiring lock_sys.mutex in trx_commit_or_rollback_prepare() before resetting trx->lock.wait_thr cannot solve the issue.
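
Why a mutex around the wait_thr accesses does not help can be shown with a small two-thread model. This is a sketch under stated assumptions, not server code: ToyTrx and resolver_sees_wait_thr() are hypothetical names, std::mutex stands in for trx_t::mutex, and the two std::promise handshakes exist only to force the unlucky timing deterministically (in the real server it is a matter of scheduling). Both accesses to wait_thr happen under the mutex, yet the resolver still finds it cleared because the victim flag is published before the resolver takes the mutex.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model of the victim transaction.
struct ToyTrx {
  std::mutex mutex;  // stands in for trx_t::mutex
  std::atomic<bool> was_chosen_as_deadlock_victim{false};
  const void *wait_thr = nullptr;  // only accessed under mutex once threads run
};

// Returns true if the resolver still sees a non-null wait_thr after it
// acquires the mutex.
bool resolver_sees_wait_thr() {
  int dummy_thr = 0;  // placeholder object for the suspended query thread
  ToyTrx victim;
  victim.wait_thr = &dummy_thr;

  // Handshakes used only to force the interleaving from the report.
  std::promise<void> flag_published;
  std::promise<void> rollback_done;

  std::thread victim_thread([&] {
    flag_published.get_future().wait();
    if (victim.was_chosen_as_deadlock_victim.load()) {
      // Rollback path: wait_thr is cleared under the mutex.
      std::lock_guard<std::mutex> guard(victim.mutex);
      victim.wait_thr = nullptr;
    }
    rollback_done.set_value();
  });

  // Resolver: publish the flag first ...
  victim.was_chosen_as_deadlock_victim.store(true);
  flag_published.set_value();
  // ... and lose the race: the victim rolls back inside the window.
  rollback_done.get_future().wait();

  bool seen;
  {
    // Only now does the resolver take the mutex.
    std::lock_guard<std::mutex> guard(victim.mutex);
    seen = victim.wait_thr != nullptr;
  }
  victim_thread.join();
  return seen;
}
```

With the forced ordering, resolver_sees_wait_thr() returns false: the locking in each critical section is correct on its own, but it does not cover the gap between setting the flag and acquiring the mutex.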

Attachments

MDEV-29622.test.diff (4 kB, 2022-10-06 18:12)

Issue Links

causes: MDEV-29869 mtr failure: innodb.deadlock_wait_thr_race (Closed)

relates to: MDEV-29635 race on trx->lock.wait_lock in deadlock resolution (Closed)

relates to: MDEV-29860 Simplify transaction rollback initiation on termination from the other thread (Closed)

Activity

Vladislav Lesin made transition - 2022-10-06 14:31
Open -> In Progress (time in source status: 12d 21h 7m)

Vladislav Lesin made transition - 2022-10-24 07:51
In Progress -> Closed (time in source status: 17d 17h 20m)

People

Assignee: Vladislav Lesin

Reporter: Vladislav Lesin

Votes: 0

Watchers: 5

Dates

Created: 2022-09-23 17:23

Updated: 2024-07-07 19:52

Resolved: 2022-10-24 07:51
