[MDEV-25016] Race condition between lock_sys_t::cancel() and page split or merge

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.0
Component/s: Storage Engine - InnoDB
Labels:
- performance

Description

In MDEV-24789, we are minimizing the use of lock_sys.latch. It turns out that when innobase_kill_query() acquires a shared lock_sys.latch instead of an exclusive one, the following tests occasionally hang:

rpl.rpl_parallel_optimistic
rpl.rpl_parallel_optimistic_xa_lsu_off
rpl.rpl_parallel_optimistic_nobinlog

It seems that we can work around this bug by making innobase_kill_query() acquire an exclusive lock_sys.latch instead of a shared one. This work-around obviously hurts performance, and I think that it merely reduces the probability of such hangs instead of fixing them altogether. Until this bug is fixed, we can apply the work-around whenever thd_need_wait_reports() holds.

Note: thd_need_wait_reports() holds even when no replication is being used and only the option log_bin is enabled. Applying the work-around under that condition seems to be necessary, because without it, the test binlog.rpl_parallel_optimistic would hang (falling back to innodb_lock_wait_timeout).
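
The work-around described above can be sketched roughly as follows. This is not the InnoDB code: std::shared_mutex stands in for lock_sys.latch, the bool parameter stands in for the result of thd_need_wait_reports(), and LatchMode, kill_query_latch_mode() and kill_query_sketch() are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

// Assumption: a plain std::shared_mutex stands in for lock_sys.latch.
// The real InnoDB latch type and its API are different.
static std::shared_mutex lock_sys_latch;

enum class LatchMode { Shared, Exclusive };

// Hypothetical helper: pick the latch mode for the kill path.
// need_wait_reports models the result of thd_need_wait_reports().
constexpr LatchMode kill_query_latch_mode(bool need_wait_reports) {
  // Work-around: fall back to the exclusive latch when wait reports
  // are needed; otherwise keep the cheaper shared latch.
  return need_wait_reports ? LatchMode::Exclusive : LatchMode::Shared;
}

// Hypothetical model of the kill path: runs cancel_wait() while holding
// the latch in the chosen mode, and returns the mode that was used.
template <typename Fn>
LatchMode kill_query_sketch(bool need_wait_reports, Fn &&cancel_wait) {
  const LatchMode mode = kill_query_latch_mode(need_wait_reports);
  if (mode == LatchMode::Exclusive) {
    // Excludes all other latch holders for the duration of the call.
    std::unique_lock<std::shared_mutex> guard(lock_sys_latch);
    cancel_wait();
  } else {
    // Allows other shared holders to proceed concurrently.
    std::shared_lock<std::shared_mutex> guard(lock_sys_latch);
    cancel_wait();
  }
  return mode;
}
```

The sketch only shows the choice of latch mode. It does not model the race with page split or merge, and, as noted above, taking the exclusive latch is expected to reduce the probability of the hang rather than remove it.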

Issue Links

is caused by

MDEV-24789 Performance regression after MDEV-24671

Closed

relates to

MDEV-24789 Performance regression after MDEV-24671

Closed

MDEV-24948 thd_need_wait_reports() hurts performance

Open

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 2

Dates

Created: 2021-03-01 12:50

Updated: 2021-03-09 10:21

Resolved: 2021-03-04 12:49
