[MDEV-20612] Improve InnoDB lock_sys scalability - Jira

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 10.6.0
Component/s: Storage Engine - InnoDB
Labels:
- performance

Description

lock_sys is one of three major InnoDB scalability bottlenecks. Scalability issues are especially obvious under sysbench OLTP update index/non-index benchmarks.

It is not yet clear how exactly it should be optimised.

Issue Links

blocks

MDEV-21452 Use condition variables and normal mutexes instead of InnoDB os_event and mutex

Closed

causes

MDEV-24861 Assertion `trx->rsegs.m_redo.rseg' failed in innodb_prepare_commit_versioned

Closed

MDEV-35708 lock_rec_get_prev() returns only the first record lock

Closed

includes

MDEV-24731 Excessive mutex contention in DeadlockChecker::check_and_resolve()

Closed

is blocked by

MDEV-24671 Assertion failure in lock_wait_table_reserve_slot()

Closed

relates to

MDEV-11392 AliSQL: [perf] Issue#31 OPTIMIZE CHECK/GRANT OF INNODB TABLE LOCK

Closed

MDEV-16406 Refactor the InnoDB record locks

Open

MDEV-24813 Locking full table scan fails to use table-level locking

In Review

MDEV-24948 thd_need_wait_reports() hurts performance

Open

MDEV-24952 Simplify the locking and access of lock hash tables

Closed

MDEV-26476 InnoDB is missing futex support on some platforms

Closed

MDEV-18706 ER_LOCK_DEADLOCK on concurrent read and insert into already locked gap

In Review

MDEV-21175 Remove dict_table_t::n_foreign_key_checks_running from InnoDB

Closed

MDEV-24915 Galera conflict resolution is unnecessarily complex

Closed

MDEV-25010 Assertion `!lock_sys_t::get_first(receiver_cell, receiver_id, receiver_heap_no)' failed in lock_rec_move

Closed

links to

WL#10314: InnoDB: Lock-sys optimization: sharded lock_sys mutex

Activity

Marko Mäkelä added a comment - 2021-02-12 16:00

We replaced lock_sys.mutex with a lock_sys.latch (MDEV-24167) that is 4 or 8 bytes on Linux, Microsoft Windows or OpenBSD. On other systems, a native rw-lock or a mutex and two condition variables will be used.

The entire world of transactional locks can be stopped by acquiring lock_sys.latch in exclusive mode.

Scalability is achieved by making most users use a combination of a shared lock_sys.latch and a lock-specific latch: either dict_table_t::lock_mutex, or a lock_sys_t::hash_latch that is embedded in each cache line of lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash. The lock_sys_t::hash_latch is always 4 or 8 bytes. On systems other than Linux, OpenBSD, and Microsoft Windows, lock_sys_t::hash_latch::release() will always acquire a mutex and signal a condition variable. This is a known scalability bottleneck that could be reduced on such systems by splitting the mutex and condition variable. (If such systems supported a lightweight mutex that is at most sizeof(void*), then we could happily use that.)
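To illustrate the two-level scheme described above, here is a minimal C++17 sketch. It is not the InnoDB implementation: ShardedLockTable, Shard, add_lock() and count_all() are hypothetical names, and std::shared_mutex and std::mutex stand in for lock_sys.latch and the per-cache-line hash_latch. Ordinary operations hold the global latch in shared mode and serialize only on one shard, while an exclusive acquisition of the global latch stops all of them at once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a sharded lock table; not InnoDB code.
class ShardedLockTable {
public:
  static constexpr std::size_t kShards = 16;

  // Common path: shared global latch plus the mutex of a single shard.
  // Callers that hash to different shards do not block each other.
  void add_lock(std::size_t page_id, int trx_id) {
    std::shared_lock<std::shared_mutex> global(latch_);
    Shard &shard = shards_[page_id % kShards];
    std::lock_guard<std::mutex> guard(shard.mutex);
    shard.locks[page_id].push_back(trx_id);
  }

  // Rare path: the exclusive global latch excludes every add_lock() caller,
  // so all shards can be read without taking their mutexes.
  std::size_t count_all() {
    std::unique_lock<std::shared_mutex> global(latch_);
    std::size_t n = 0;
    for (const Shard &shard : shards_)
      for (const auto &entry : shard.locks)
        n += entry.second.size();
    return n;
  }

private:
  // Each shard is aligned to an assumed 64-byte cache line so that
  // neighbouring shard mutexes do not share a line.
  struct alignas(64) Shard {
    std::mutex mutex;
    std::unordered_map<std::size_t, std::vector<int>> locks;
  };

  std::shared_mutex latch_;            // plays the role of lock_sys.latch
  std::array<Shard, kShards> shards_;  // plays the role of the hash array
};
```

The sketch keeps the shard mutex next to the data it protects; the comment above describes a more compact layout in which a 4- or 8-byte latch is embedded in each cache line of the hash array itself.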

Until MDEV-24738 has been fixed, the deadlock detector will remain a significant bottleneck, because each lock_wait() would acquire lock_sys.latch in exclusive mode. This bottleneck can be avoided by setting innodb_deadlock_detect=OFF.

Marko Mäkelä added a comment - 2021-01-28 17:32

As a minimal change, I moved the DeadlockChecker::search() invocation to lock_wait(). A separate deadlock checker thread or task might still be useful. For that, I do not think that there is a need to introduce any blocking_trx data member. In our code, it should be safe to follow the chain of trx->lock.wait_lock->trx while holding lock_sys.wait_mutex and possibly also trx->mutex.
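The idea of following the wait chain can be sketched as follows. This is a simplified, single-threaded illustration and not the server code: the Trx and Lock structs are hypothetical, Lock::trx is assumed to point at the transaction that blocks the waiter, and the cycle check uses Floyd's tortoise-and-hare technique as a stand-in for whatever the deadlock checker actually does. In the server, such a walk would have to happen while holding lock_sys.wait_mutex, as noted above.

```cpp
#include <cassert>

struct Trx;

// Hypothetical, simplified lock: only records the transaction that is
// assumed to block whoever waits for this lock.
struct Lock {
  Trx *trx;
};

// Hypothetical, simplified transaction: at most one lock it waits for.
struct Trx {
  Lock *wait_lock = nullptr;
};

// One step along the chain trx -> wait_lock -> trx; nullptr ends the chain.
static const Trx *next_in_chain(const Trx *t) {
  return (t && t->wait_lock) ? t->wait_lock->trx : nullptr;
}

// Returns true if the wait chain starting at `start` runs into a cycle.
// Floyd's algorithm: `fast` moves two steps for every step of `slow`;
// they can only meet again if the chain loops.
bool reaches_cycle(const Trx *start) {
  const Trx *slow = start;
  const Trx *fast = start;
  for (;;) {
    fast = next_in_chain(next_in_chain(fast));
    slow = next_in_chain(slow);
    if (!fast) return false;        // the chain ended: nobody waits forever
    if (fast == slow) return true;  // the pointers met inside a cycle
  }
}
```

Because every transaction waits for at most one lock, the wait-for graph is a set of chains, so no extra bookkeeping such as a visited set is needed to find a cycle.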

Marko Mäkelä added a comment - 2021-01-19 18:29 - edited

The lock_wait() refactoring was causing some assertion failures in the start/stop que_thr_t bookkeeping. I think that it is simplest to remove that bookkeeping along with some unnecessary data members or enum values. Edit: This was done in MDEV-24671. As an added bonus, innodb_lock_wait_timeout is enforced more promptly (no extra 1-second delay).

It turns out that the partitioned lock_sys.mutex will not work efficiently with the old DeadlockChecker. It must be refactored, similar to what was done in Oracle Bug #29882690 in MySQL 8.0.18.

Marko Mäkelä added a comment - 2021-01-15 16:25 - edited

zhaiwx1987, I adapted the MDEV-11392 idea from MySQL Bug #72948, but I introduced a single counter dict_table_t::n_lock_x_or_s. There is actually quite a bit of room for improvement in lock_sys, in addition to what was done in MySQL 8.0.21 WL#10314.
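One way such a counter can help is sketched below. This is an assumption about the general technique, not a copy of the InnoDB code: Table, try_lock() and the scans instrumentation field are hypothetical, and the sketch ignores which transaction owns each lock. It relies on the standard multi-granularity compatibility rules, where the intention modes IS and IX never conflict with each other, so a request for IS or IX can skip the scan of granted table locks whenever no S or X lock exists on the table.

```cpp
#include <cassert>
#include <vector>

enum class Mode { IS, IX, S, X };

// Standard multi-granularity compatibility matrix for table locks.
bool compatible(Mode a, Mode b) {
  if (a == Mode::X || b == Mode::X) return false;
  if (a == Mode::S) return b == Mode::S || b == Mode::IS;
  if (b == Mode::S) return a == Mode::S || a == Mode::IS;
  return true;  // IS and IX are compatible with each other
}

// Hypothetical table with a list of granted locks and a single counter
// of granted S or X locks (named after dict_table_t::n_lock_x_or_s).
struct Table {
  std::vector<Mode> granted;
  unsigned n_lock_x_or_s = 0;
  unsigned scans = 0;  // instrumentation only: how often the list was scanned

  // Grants the lock and returns true if it conflicts with nothing.
  bool try_lock(Mode m) {
    const bool intention = (m == Mode::IS || m == Mode::IX);
    // Fast path: with no S or X lock present, an intention lock cannot
    // conflict, so the scan is skipped entirely.
    if (!(intention && n_lock_x_or_s == 0)) {
      ++scans;
      for (Mode g : granted)
        if (!compatible(m, g)) return false;
    }
    granted.push_back(m);
    if (!intention) ++n_lock_x_or_s;
    return true;
  }
};
```

In a workload dominated by row-level DML, almost all table locks are IS or IX, so the fast path would cover the common case.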

Marko Mäkelä added a comment - 2021-01-09 17:08

Also, srv_slot_t can be removed, and the locality of reference can be improved by storing trx->lock.wait_lock and trx->lock.cond at adjacent addresses.

People

Assignee: Marko Mäkelä

Reporter: Sergey Vojtovich

Votes: 2

Watchers: 12

Dates

Created: 2019-09-17 13:17

Updated: 2024-12-23 05:26

Resolved: 2021-02-12 16:00
