[MDEV-24789] Performance regression after MDEV-24671

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.0
Component/s: Storage Engine - InnoDB
Labels:
- performance
- regression

Description

The fix for MDEV-24671 introduced a serious performance regression that is observable at 32 concurrent connections.

My current hypothesis, based on some initial investigation, is that the changed sizeof(trx->lock) caused an increase in cache misses.
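To illustrate the kind of effect this hypothesis describes, here is a minimal sketch. The structs lock_before and lock_after, the 64-byte cache line size, and the helper lines_spanned() are all hypothetical and do not reflect the real layout of trx_lock_t. The sketch only shows how adding a member can push an object across a cache line boundary, so that a field that used to share a line with the hot data now lives on a second line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86-64 CPUs.
constexpr std::size_t kCacheLine = 64;

// Hypothetical layout before the change: exactly one cache line.
struct lock_before {
  char hot[56];          // stand-in for frequently accessed fields
  std::uint64_t counter; // shares the cache line with the hot fields
};

// Hypothetical layout after the change: one added member.
struct lock_after {
  char hot[56];
  char added[16];        // stand-in for a newly added member
  std::uint64_t counter; // now starts beyond the first 64 bytes
};

// Number of cache lines occupied by an object of the given size,
// assuming the object starts on a cache line boundary.
constexpr std::size_t lines_spanned(std::size_t size) {
  return (size + kCacheLine - 1) / kCacheLine;
}

static_assert(lines_spanned(sizeof(lock_before)) == 1,
              "the smaller struct fits in one line");
static_assert(lines_spanned(sizeof(lock_after)) == 2,
              "the larger struct needs a second line");
```

With these made-up layouts, touching both hot and counter costs one cache line before the change and two after it. Whether anything similar happened to trx->lock would have to be confirmed by measurement.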

Issue Links

causes

MDEV-25016 Race condition between lock_sys_t::cancel() and page split or merge

Closed

MDEV-25371 Potential hang in wsrep_is_BF_lock_timeout()

Closed

MDEV-26883 InnoDB hang due to table lock conflict

Closed

is caused by

MDEV-24671 Assertion failure in lock_wait_table_reserve_slot()

Closed

relates to

MDEV-25016 Race condition between lock_sys_t::cancel() and page split or merge

Closed

Activity

Marko Mäkelä added a comment - 2021-02-05 17:15

Reducing sizeof(trx_lock_t) did not help (I did it anyway), but some refactoring to reduce the hold time of lock_sys.mutex and lock_sys.wait_mutex seems to have done the trick. Thanks to axel for helping with the analysis and validation!
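For illustration only (this is not the actual patch): a minimal sketch of the general technique of reducing mutex hold time. The type wait_queue and the functions expensive(), drain_holding_mutex() and drain_short_hold() are hypothetical and unrelated to the real lock_sys code. The first variant does all of its work while holding the mutex. The second holds the mutex only long enough to take ownership of the data and does the work afterwards.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical shared state; not InnoDB's lock_sys.
struct wait_queue {
  std::mutex mutex;
  std::vector<int> pending;
};

// Stand-in for work that does not need the mutex.
inline long expensive(int v) { return static_cast<long>(v) * v; }

// Long hold time: the mutex is held during all of the processing,
// so other threads are blocked for the whole loop.
long drain_holding_mutex(wait_queue &q) {
  std::lock_guard<std::mutex> guard(q.mutex);
  long sum = 0;
  for (int v : q.pending) sum += expensive(v);
  q.pending.clear();
  return sum;
}

// Short hold time: the mutex protects only the swap; the processing
// runs on a private copy after the mutex has been released.
long drain_short_hold(wait_queue &q) {
  std::vector<int> local;
  {
    std::lock_guard<std::mutex> guard(q.mutex);
    local.swap(q.pending);
  }
  long sum = 0;
  for (int v : local) sum += expensive(v);
  return sum;
}
```

Both functions return the same result; they differ only in how long other threads may be kept waiting on the mutex.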

Marko Mäkelä added a comment - 2021-02-25 16:58

Some performance regression is still present. The function lock_wait() that was introduced in MDEV-24671 was holding lock_sys.wait_mutex for an unnecessarily long time.

Marko Mäkelä added a comment - 2021-02-26 13:22

Even after reducing the lock_sys.wait_mutex hold time in lock_wait() to the minimum, some regression is present, and more work needs to be done.

Marko Mäkelä added a comment - 2021-03-01 15:23

It looks like MDEV-24671 emphasized a pre-existing bottleneck on log_sys.mutex, which we hope to address in MDEV-14425. The remaining performance regression was only observed on RAM disk, not on real storage.

That said, the latest change that is being tested should reduce contention on lock_sys.latch and lock_sys.wait_mutex to the absolute minimum.

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 1

Dates

Created: 2021-02-05 10:20

Updated: 2022-12-20 14:56

Resolved: 2021-03-02 12:41

MariaDB Server