[MDEV-20612] Improve InnoDB lock_sys scalability Created: 2019-09-17 Updated: 2021-08-25 Resolved: 2021-02-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.6.0 |
| Type: | Task | Priority: | Major |
| Reporter: | Sergey Vojtovich | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | performance | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
lock_sys is one of three major InnoDB scalability bottlenecks. Scalability issues are especially obvious under sysbench OLTP update index/non-index benchmarks. There's no clarity on how exactly it should be optimised yet. |
| Comments |
| Comment by Marko Mäkelä [ 2019-09-17 ] |
|
As noted in Maybe we could store table_locks in a lock-free hash table, so that they can traversed by diagnostic printouts. Similarly, maybe we can extend |
| Comment by zhai weixiang [ 2019-10-09 ] |
|
we partitoned the record lock hash which works fine. the diffcult part is how to detect deadlock with multiple partition. Lock-free lock system is a good idea, AFAIK, Aws aurora did a similar job to make it lock free. There's a paper describing this: https://ts.data61.csiro.au/publications/nicta_full_text/6465.pdf |
| Comment by Marko Mäkelä [ 2020-10-05 ] |
|
MySQL 8.0.21 implemented a sharded lock_sys mutex. Edit: they seem to have replaced the mutex with a partitioned rw-latch, using page_id_t as the key. For deadlock detection, they appear to employ a global rw-latch. I think that it might be helpful to first implement MDEV-16232 to avoid the creation of explicit record locks during non-conflicting DELETE and UPDATE. Table IX lock creation would still cause contention on lock_sys. |
| Comment by Marko Mäkelä [ 2020-11-05 ] |
|
In The code around suspending and resuming threads is rather convoluted. In particular, que_thr_stop_for_mysql() and lock_wait_suspend_thread() are separated from each other, and the trx->mutex is being acquired and released multiple times while a lock wait is being registered. Also, there are multiple state fields related to lock waits, both in que_thr_t and trx_t. I think that our performance would improve a little if we cleaned this up a little. |
| Comment by Marko Mäkelä [ 2020-12-16 ] |
|
I reviewed the MySQL 8.0.21 WL#10314 again.
Because implementing MDEV-16232 would require much more effort than this, I think that we must consider this approach for MariaDB. One complication is that lock_wait_suspend_thread() in |
| Comment by Marko Mäkelä [ 2021-01-08 ] |
|
I replaced lock_wait_suspend_thread() with a simple lock_wait() that will use pthread_cond_timedwait() for waiting for the lock to the granted. If that call fails with an error, we know that a timeout has occurred. The wait may be interrupted by ha_kill_query() or deadlock detection, which will simply invoke pthread_cond_signal(). There is no need for a separate lock_wait_timeout_task, which would wake up once per second. Also, the relative latching order of lock_sys.wait_mutex and lock_sys.mutex (which will be replaced with lock_sys.latch) will be swapped. Hopefully this necessary refactoring will provide some additional performance benefit. |
| Comment by Marko Mäkelä [ 2021-01-09 ] |
|
Also srv_slot_t can be removed and the locality of reference improved by storing trx->lock.wait_lock and trx->lock.cond in adjacent addresses. |
| Comment by Marko Mäkelä [ 2021-01-15 ] |
|
zhaiwx1987, I adapted the |
| Comment by Marko Mäkelä [ 2021-01-19 ] |
|
The lock_wait() refactoring was causing some assertion failures in the start/stop que_thr_t bookkeeping. I think that it is simplest to remove that bookkeeping along with removing some unnecessary data members or enum values. Edit: This was done in It turns out that the partitioned lock_sys.mutex will not work efficiently with the old DeadlockChecker. It must be refactored, similar to what was done in Oracle Bug #29882690 in MySQL 8.0.18. |
| Comment by Marko Mäkelä [ 2021-01-28 ] |
|
As a minimal change, I moved the DeadlockChecker::search() invocation to lock_wait(). A separate deadlock checker thread or task might still be useful. For that, I do not think that there is a need to introduce any blocking_trx data member. In our code, it should be safe to follow the chain of trx->lock.wait_lock->trx while holding lock_sys.wait_mutex and possibly also trx->mutex. |
| Comment by Marko Mäkelä [ 2021-02-12 ] |
|
We replaced lock_sys.mutex with a lock_sys.latch ( The entire world of transactional locks can be stopped by acquiring lock_sys.latch in exclusive mode. Scalability is achieved by making most users use a combination of a shared lock_sys.latch and a lock-specific dict_table_t::lock_mutex or lock_sys_t::hash_latch that is embedded in each cache line of the lock_sys.rec_hash, lock_sys.prdt_hash, or lock_sys.prdt_page_hash. The lock_sys_t::hash_latch is always 4 or 8 bytes. On other systems than Linux, OpenBSD, and Microsoft Windows, the lock_sys_t::hash_latch::release() will always acquire a mutex and signal a condition variable. This is a known scalability bottleneck and could be improved further on such systems, by splitting the mutex and condition variable. (If such systems supported a lightweight mutex that is at most sizeof(void*), then we could happily use that.) Until |