Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Fix Version/s: 10.5.4
Description
MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple rw-lock implementation that avoids a spinloop between non-contended read-lock requests, simply using std::atomic::fetch_add() for the lock acquisition.
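For context, here is a minimal sketch of such a fetch_add()-based rw-lock (my illustration, not the actual page_hash_latch code; the class name and the exact bit layout are assumptions). The point is that an uncontended shared-lock acquisition is a single atomic increment, with no spinloop; only the contended paths loop.

#include <atomic>
#include <cstdint>

class rw_lock_sketch
{
  /** Most significant bit = exclusive lock; lower bits count shared holders. */
  static constexpr uint32_t WRITER= 1U << 31;
  std::atomic<uint32_t> word{0};
public:
  void read_lock()
  {
    /* Fast path: a single fetch_add(); succeeds unless a writer is present. */
    if (word.fetch_add(1, std::memory_order_acquire) & WRITER)
    {
      /* Slow path (a read_lock_wait() analogue): back off and retry. */
      word.fetch_sub(1, std::memory_order_relaxed);
      for (;;)
      {
        uint32_t w= word.load(std::memory_order_relaxed);
        if (!(w & WRITER) &&
            word.compare_exchange_weak(w, w + 1, std::memory_order_acquire))
          return;
      }
    }
  }
  void read_unlock() { word.fetch_sub(1, std::memory_order_release); }
  void write_lock()
  {
    /* Simplified: spin until no readers or writer hold the lock. */
    for (uint32_t ex= 0;
         !word.compare_exchange_weak(ex, WRITER, std::memory_order_acquire);
         ex= 0) {}
  }
  void write_unlock() { word.fetch_and(~WRITER, std::memory_order_release); }
};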
Alas, wlad noticed that in a write-heavy stress test on a 56-core system with 1,000 concurrent client connections, the server would appear to halt every few seconds, delivering 0 transactions per second. It is not a permanent hang; performance resumes after some time.
I attached GDB to the server during one such apparent halt, and I saw 22 of the 1,033 threads trying to access the same object:
10.5 8ddebb33c28b0aeaa6550ac0e825beccd367bb2c

#1  0x00005628f70317d5 in page_hash_latch::read_lock_wait (
    this=this@entry=0x7f2ae590d040)
    at /home/marko/server/storage/innobase/buf/buf0buf.cc:298
In each of the calls, the page_id is distinct, and each invocation is for an undo log page:
10.5 8ddebb33c28b0aeaa6550ac0e825beccd367bb2c

Thread 5 (Thread 0x7f23a9d8f700 (LWP 296467)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193651}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 25 (Thread 0x7f23aa60e700 (LWP 296445)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194308}, …)
#7  0x00005628f6fea9b4 in trx_undo_page_get
Thread 45 (Thread 0x7f23aadf7700 (LWP 296406)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193982}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 46 (Thread 0x7f23aae42700 (LWP 296403)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193818}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 67 (Thread 0x7f23ab70c700 (LWP 296362)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194076}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 81 (Thread 0x7f23abef5700 (LWP 296332)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193656}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 85 (Thread 0x7f23ac198700 (LWP 296324)):
#5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194312}, …)
#6  0x00005628f6fea9b4 in trx_undo_page_get
Thread 88 (Thread 0x7f23ac2c4700 (LWP 296319)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193831}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 104 (Thread 0x7f23ac9cc700 (LWP 296286)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193952}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 106 (Thread 0x7f23acbd9700 (LWP 296283)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194298}, …)
#7  0x00005628f6fea9b4 in trx_undo_page_get
Thread 129 (Thread 0x7f23ad61a700 (LWP 296238)):
#5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194272}, …)
#6  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 147 (Thread 0x7f23ae010700 (LWP 296201)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193899}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 150 (Thread 0x7f23ae1d2700 (LWP 296195)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194106}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 158 (Thread 0x7f23ae5a1700 (LWP 296180)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193795}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 162 (Thread 0x7f23ae844700 (LWP 296171)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193947}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 176 (Thread 0x7f23aee6b700 (LWP 296144)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194302}, …)
#7  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 195 (Thread 0x7f23af816700 (LWP 296105)):
#5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194233}, …)
#6  0x00005628f6fea9b4 in trx_undo_page_get
Thread 200 (Thread 0x7f23afa6e700 (LWP 296095)):
#5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194014}, …)
#6  0x00005628f6fea9b4 in trx_undo_page_get
Thread 207 (Thread 0x7f23afe3d700 (LWP 296082)):
#5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193891}, …)
#6  0x00005628f6fefc91 in trx_undo_reuse_cached
Thread 212 (Thread 0x7f23b4102700 (LWP 296071)):
#6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194293}, …)
#7  0x00005628f6fea9b4 in trx_undo_page_get
The false sharing is completely eliminated by the following:
diff --git a/storage/innobase/include/buf0buf.h b/storage/innobase/include/buf0buf.h
index 2677d42..20bf8d5 100644
--- a/storage/innobase/include/buf0buf.h
+++ b/storage/innobase/include/buf0buf.h
@@ -1824,7 +1824,8 @@ class buf_pool_t
 {
   /** Number of array[] elements per page_hash_latch.
   Must be one less than a power of 2. */
-  static constexpr size_t ELEMENTS_PER_LATCH= 1023;
+  static constexpr size_t ELEMENTS_PER_LATCH= CPU_LEVEL1_DCACHE_LINESIZE /
+    sizeof(void*) - 1;
 
   /** number of payload elements in array[] */
   Atomic_relaxed<ulint> n_cells;
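To see why this value works, note the "one less than a power of 2" requirement: on x86_64 (64-byte cache lines, 8-byte pointers) the patch makes ELEMENTS_PER_LATCH = 64/8 - 1 = 7, so every 8th array slot is a latch, and each latch shares its cache line only with the 7 hash cells it protects. A sketch of the arithmetic follows; the latch-lookup mask is my assumption based on the comment in the code, not a quote of the implementation.

#include <cstddef>

static constexpr size_t CACHE_LINE= 64; /* CPU_LEVEL1_DCACHE_LINESIZE on x86_64 */
static constexpr size_t ELEMENTS_PER_LATCH= CACHE_LINE / sizeof(void*) - 1;

static_assert(ELEMENTS_PER_LATCH == 7, "one latch plus 7 cells per cache line");
/* "Must be one less than a power of 2" makes the group size a power of 2,
so the latch guarding array[i] can be found by masking off the low bits: */
static_assert(((ELEMENTS_PER_LATCH + 1) & ELEMENTS_PER_LATCH) == 0,
              "group size must be a power of 2");
constexpr size_t latch_index(size_t i) { return i & ~ELEMENTS_PER_LATCH; }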
The practical minimum value of CPU_LEVEL1_DCACHE_LINESIZE appears to be 64 bytes, and the practical maximum value of sizeof(void*) is 8 bytes. Those are the exact values on the AMD64 a.k.a. Intel EM64T a.k.a. x86_64 ISA.
With the above fix, at most 1/8 of buf_pool.page_hash.array would be used for page_hash_latch objects. The payload size of the array is the number of pages in the buffer pool (innodb_buffer_pool_size/innodb_page_size). This number is rounded up to a slightly larger prime, then, with the above patch, multiplied by 8/7 (an increase of about 14%) to make room for the latches, and finally multiplied by sizeof(void*).
For example, a 50GiB buffer pool would comprise at most 3,276,800 pages of 16KiB, and the raw payload size of buf_pool.page_hash.array would be 25MiB. The above fix would increase the memory usage to 28.6MiB, which seems acceptable to me.
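The arithmetic can be checked with a small stand-alone program (a sketch under the x86_64 assumptions above; the rounding of n_cells up to a prime is ignored, as it changes the result only slightly):

#include <cstdio>
#include <cstddef>

int main()
{
  const size_t buf_pool_size= 50ULL << 30;  /* 50GiB */
  const size_t page_size= 16ULL << 10;      /* 16KiB */
  const size_t elements_per_latch= 64 / sizeof(void*) - 1; /* 7 on x86_64 */

  const size_t n_pages= buf_pool_size / page_size;  /* 3276800 */
  const size_t payload= n_pages * sizeof(void*);    /* 25MiB of cell pointers */
  /* One extra latch slot per elements_per_latch payload slots: */
  const size_t total= payload * (elements_per_latch + 1) / elements_per_latch;

  std::printf("pages=%zu payload=%.1fMiB total=%.1fMiB\n",
              n_pages, payload / 1048576.0, total / 1048576.0);
  return 0;
}

This prints pages=3276800 payload=25.0MiB total=28.6MiB, matching the figures above.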
Issue Links
- is caused by MDEV-22871: Contention on the buf_pool.page_hash (Closed)
- relates to MDEV-23379: Deprecate and ignore options for InnoDB concurrency throttling (Closed)