[MDEV-32096] Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6.0, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3
Component/s: Locking, Replication, Storage Engine - InnoDB
Labels:
- performance

Description

MDEV-24671 introduced a race condition in innobase_kill_query(), the function responsible for interrupting a lock wait for the target of a KILL QUERY or KILL CONNECTION statement.

This can severely affect optimistic (and aggressive) parallel replication. If the race is triggered, conflicts are not resolved correctly and parallel replication is blocked until the lock wait times out, that is, for up to --innodb-lock-wait-timeout seconds. In SHOW PROCESSLIST, this shows up as one worker in the "killed" state while some other worker is stuck in a query.

A user reported a hang of parallel replication due to this, and knielsen spotted the data race: if the target transaction starts a lock wait at roughly the same time as innobase_kill_query() is invoked, then innobase_kill_query() can read trx->lock.wait_lock as nullptr, conclude that there is no wait to abort, and leave the lock wait uninterrupted. Therefore, we need to acquire lock_sys.wait_mutex before checking whether a lock wait needs to be aborted.
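
The check-under-mutex pattern described above can be illustrated with a small standalone model. This is only a sketch, not the InnoDB code or the actual patch: the `Trx` struct, `lock_wait()`, `kill_query()` and `kill_interrupts_wait()` are hypothetical names, and the model sets the kill flag inside `kill_query()` for simplicity. The point it shows is that when the waiter publishes `wait_lock` and the killer inspects it under the same mutex, the killer either sees the wait and wakes it, or the waiter sees the kill flag before going to sleep.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a transaction; NOT the InnoDB trx_t.
struct Trx {
  std::mutex wait_mutex;            // plays the role of lock_sys.wait_mutex
  std::condition_variable cond;     // wakes the waiting thread
  const void *wait_lock = nullptr;  // plays the role of trx->lock.wait_lock
  bool killed = false;              // simplified kill flag (assumption)
};

// Waiter: publish wait_lock and sleep, all under wait_mutex.
// Returns true if the wait was interrupted by a kill, false on timeout.
bool lock_wait(Trx &trx, const void *lock, std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> guard(trx.wait_mutex);
  assert(trx.wait_lock == nullptr);  // the model allows one wait at a time
  trx.wait_lock = lock;
  // The predicate is evaluated before sleeping, so a kill that arrived
  // before this point is noticed immediately.
  bool interrupted =
      trx.cond.wait_for(guard, timeout, [&] { return trx.killed; });
  trx.wait_lock = nullptr;
  return interrupted;
}

// Killer: acquire wait_mutex BEFORE checking whether a wait is in progress.
void kill_query(Trx &trx) {
  std::lock_guard<std::mutex> guard(trx.wait_mutex);
  trx.killed = true;
  if (trx.wait_lock != nullptr)
    trx.cond.notify_all();
}

// Start a waiter and a killer at roughly the same time and report
// whether the wait was interrupted rather than timing out.
bool kill_interrupts_wait() {
  Trx trx;
  int dummy_lock = 0;
  bool interrupted = false;
  std::thread waiter([&] {
    interrupted = lock_wait(trx, &dummy_lock, std::chrono::seconds(5));
  });
  std::thread killer([&] { kill_query(trx); });
  waiter.join();
  killer.join();
  return interrupted;
}
```

If `kill_query()` instead read `wait_lock` without holding `wait_mutex`, it could observe `nullptr` just before the waiter publishes its lock, skip the wake-up, and leave the waiter asleep until the timeout, which mirrors the symptom described in this report.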

The attached mdev32096_testcase.patch is an (ugly) ./mtr test case that triggers the problem.

Attachments

mdev32096_testcase.patch (4 kB, uploaded 2023-09-05 11:19 by Kristian Nielsen)

Issue Links

causes: MDEV-32530 Race condition in lock_wait_rpl_report() (Closed)
is caused by: MDEV-24671 Assertion failure in lock_wait_table_reserve_slot() (Closed)

People

Assignee: Marko Mäkelä
Reporter: Marko Mäkelä
Votes: 1
Watchers: 8

Dates

Created: 2023-09-05 05:57
Updated: 2024-07-07 19:20
Resolved: 2023-09-11 12:43
