[MDEV-22850] Reduce buf_pool.page_hash latch contention Created: 2020-06-10 Updated: 2020-07-15 Resolved: 2020-06-11
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5 |
| Fix Version/s: | 10.5.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | Scalability, performance |
| Issue Links: |
| Description |
The changes that were introduced in a linked revision increased contention on the buf_pool.page_hash latch.

Note: the LF_BACKOFF() call was added in an attempt to reduce the problem. It feels strange that we first load lock_copy and then insist on an exact match against that lock_copy value. This could cause unnecessary contention between non-conflicting S-latch requests. The following naïve patch should alleviate that:
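The naïve patch itself is not preserved in this export. For context, here is a hypothetical, much-simplified sketch of the kind of compare-exchange loop being discussed, loosely modeled on InnoDB's rw_lock_lock_word_decr() (the names, types, and memory orders are illustrative, not the actual server code):

```cpp
#include <atomic>
#include <cstdint>

// Simplified model (not the actual InnoDB code): an S-latch is taken by
// decrementing a positive lock word. compare_exchange_strong() insists
// that the word still equals the value we last observed, so two
// concurrent S-latch requests can invalidate each other's expected
// value even though they do not logically conflict.
inline bool s_latch_try_decr(std::atomic<int32_t> &lock_word)
{
  int32_t lock_copy= lock_word.load(std::memory_order_relaxed);
  while (lock_copy > 0)
  {
    // On failure, compare_exchange_strong() refreshes lock_copy with
    // the currently stored value, so the loop retries with fresh data.
    if (lock_word.compare_exchange_strong(lock_copy, lock_copy - 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
      return true;
  }
  return false;  // the word is 0 or negative: X-latched or unavailable
}
```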
For even better results, we might want to rewrite the S-latch acquisition, as in the following incomplete patch. At least rw_lock_x_unlock_func() would have to be adjusted as well:
Also, somewhat related to this, we might want to replace the rw_lock_s_lock(l) call in buf_pool_t::page_hash_lock() with a non-inlined function call, because that code path is extremely unlikely to be executed: only while the buffer pool is being resized.
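The shape of that suggestion can be sketched as follows. This is a hypothetical illustration, not the MariaDB implementation: page_hash_latch_demo, resizing, and lock_during_resize() are invented names, and the attributes assume GCC/Clang.

```cpp
#include <atomic>

// Hypothetical sketch: keep the resize-only slow path of a page_hash
// lock out of line, so the hot path stays small and inlinable.
struct page_hash_latch_demo
{
  std::atomic<bool> resizing{false};
  int slow_path_calls= 0;  // instrumentation for this demo only

  __attribute__((noinline, cold))
  void lock_during_resize() { ++slow_path_calls; /* wait out the resize */ }

  void lock()
  {
    // Extremely unlikely branch: taken only while the pool is resized.
    if (__builtin_expect(resizing.load(std::memory_order_relaxed), false))
      lock_during_resize();
    // fast path: acquire the hash cell latch (omitted in this sketch)
  }
};
```

Keeping the cold path non-inlined shrinks the inlined fast path at every call site, which tends to help instruction-cache behavior.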
| Comments |
| Comment by Marko Mäkelä [ 2020-06-10 ] |
Related to this, measurements on two Intel Cascade Lake based systems suggest that the PAUSE instruction latency was reverted from the Skylake microarchitecture’s roughly 120 clock cycles to pre-Skylake levels (about 12 clock cycles). To compensate for that, it seems that a larger value of my_cpu_relax_multiplier would be useful. We should also finally implement a startup message about it (MDEV-19929). |
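The idea of scaling the spin delay can be sketched as below. my_cpu_relax_multiplier is a real MariaDB system variable, but this loop is purely illustrative: the function name is invented, and the _mm_pause() path assumes an x86 target.

```cpp
#include <cstdint>
#if defined __x86_64__ || defined __i386__
# include <immintrin.h>  // _mm_pause()
#endif

// Illustrative sketch: the number of PAUSE instructions per delay round
// is scaled by a tunable multiplier, so a microarchitecture with a
// ~12-cycle PAUSE (pre-Skylake, Cascade Lake) can use a larger count
// than one with a ~120-cycle PAUSE (Skylake).
inline uint64_t spin_delay(unsigned rounds, unsigned multiplier)
{
  const uint64_t pauses= uint64_t{rounds} * multiplier;
  for (uint64_t i= 0; i < pauses; i++)
#if defined __x86_64__ || defined __i386__
    _mm_pause();        // cheap CPU hint inside a spin loop
#else
    ;                   // fallback: plain busy iteration
#endif
  return pauses;        // returned so the scaling can be verified
}
```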
| Comment by Marko Mäkelä [ 2020-06-10 ] |
I filed and rejected a follow-up ticket: my suggested patch cannot be helpful after all. The lock_copy is actually being reloaded in the loop, by the compare_exchange_strong() operation. Adding a redundant load of lock_copy to the loop hurt performance rather than improving it. The performance regression on Intel Cascade Lake remains unresolved, and we should still review how we could reduce the buf_pool.page_hash X-latch hold time, which should be the only factor contributing to this contention. |
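This matches the documented semantics of std::atomic::compare_exchange_strong(): on failure it writes the currently observed value back into the expected argument, so the lock word is effectively reloaded on every failed iteration. A small standalone demonstration (the function name is invented for illustration):

```cpp
#include <atomic>
#include <cstdint>

// Demonstrates that a failed compare_exchange_strong() overwrites
// 'expected' with the value actually found in the atomic, making an
// extra explicit load in a retry loop redundant.
inline int32_t observe_cas(std::atomic<int32_t> &word,
                           int32_t stale_expected)
{
  int32_t expected= stale_expected;
  // Fails if word != stale_expected; on failure, 'expected' is
  // updated to the current value of 'word'.
  word.compare_exchange_strong(expected, stale_expected - 1);
  return expected;
}
```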
| Comment by Marko Mäkelä [ 2020-06-10 ] |
I believe that we can rely solely on buf_pool.mutex on some code paths where we are currently acquiring the rw-lock on buf_pool.page_hash. Currently, my fix is causing hangs, especially in encryption tests. I now believe there is no need for an LF_BACKOFF() call in rw_lock_lock_word_decr(); after all, we terminate the spin loop as soon as a conflict is detected. |
| Comment by Marko Mäkelä [ 2020-06-11 ] |
I was able to reduce the buf_pool.page_hash X-latch hold time in buf_page_init_for_read() and buf_page_create(), which I hope addresses the root cause of this. That change includes the following:
It remains to be seen whether having the LF_BACKOFF() could still be beneficial. After all, we could still have many concurrent S-latch requests on buf_pool.page_hash. A proper fix for the spurious S-latch contention (if it still remains an issue) might be to implement a simpler variant of rw_lock_t that supports neither SX-latches (we only need those for dict_index_t::lock, buf_block_t::lock, and possibly fil_space_t::latch) nor recursive latches, and that implements S-latch acquisition with fetch_add(1). |
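Such a simplified latch could be sketched roughly as follows. This is a hypothetical illustration of the fetch_add(1) idea, not the rw-lock that ships in MariaDB: simple_rw_lock and its memory orders are invented for the example, and blocking/waiting is omitted.

```cpp
#include <atomic>
#include <cstdint>

// Minimal sketch: no SX mode, no recursion. An S-latch is taken by a
// single fetch_add(1); an X-latch holder sets a high bit, and S-latch
// requests that race with a writer undo their increment.
class simple_rw_lock
{
  static constexpr uint32_t X= 1U << 31;  // exclusive-latch flag
  std::atomic<uint32_t> word{0};
public:
  bool s_lock_try()
  {
    // One atomic increment in the uncontended case.
    if (!(word.fetch_add(1, std::memory_order_acquire) & X))
      return true;
    word.fetch_sub(1, std::memory_order_relaxed);  // writer active: back off
    return false;
  }
  void s_unlock() { word.fetch_sub(1, std::memory_order_release); }
  bool x_lock_try()
  {
    uint32_t expected= 0;  // only acquirable when no latches are held
    return word.compare_exchange_strong(expected, X,
                                        std::memory_order_acquire);
  }
  void x_unlock() { word.fetch_and(~X, std::memory_order_release); }
};
```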
| Comment by Marko Mäkelä [ 2020-06-11 ] |
To be on the safe side with regard to performance when the number of concurrently executing threads exceeds the number of available CPU cores, I decided to retain the LF_BACKOFF() call after all, even though krunalbauskar found it to reduce performance on one system. That call will be removed in a follow-up task.

In buf_page_init_for_read(), we must acquire the buf_pool.page_hash latch early to prevent a race condition with buf_pool_t::watch_remove(). So, basically this fix is only reducing contention in buf_page_create(), which should mostly benefit read-write use cases.