MariaDB Server / MDEV-23369

False sharing in page_hash_latch::read_lock_wait()



      Description

      MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple rw-lock implementation that avoids a spinloop for non-contended read-lock requests: the lock is acquired with a single std::atomic::fetch_add().
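
      For illustration, here is a minimal C++ sketch of that acquisition scheme; the class and all of its details are invented for this sketch, not the actual MariaDB code. The most significant bit of an atomic word denotes the exclusive lock, so an uncontended shared-lock acquisition costs exactly one fetch_add():

      #include <atomic>
      #include <cstdint>

      class sketch_rw_latch
      {
        std::atomic<uint32_t> word{0};
        static constexpr uint32_t WRITER= 1U << 31;
      public:
        void read_lock()
        {
          /* fast path: one fetch_add(), no spinloop when uncontended */
          if (word.fetch_add(1, std::memory_order_acquire) & WRITER)
            read_lock_wait();
        }
        void read_unlock() { word.fetch_sub(1, std::memory_order_release); }
      private:
        void read_lock_wait()
        {
          /* slow path: undo the optimistic increment, then retry until
          the exclusive bit has been cleared */
          word.fetch_sub(1, std::memory_order_relaxed);
          for (;;)
          {
            uint32_t w= word.load(std::memory_order_relaxed);
            if (!(w & WRITER) &&
                word.compare_exchange_weak(w, w + 1, std::memory_order_acquire,
                                           std::memory_order_relaxed))
              return;
          }
        }
      };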

      Alas, Vladislav Vaintroub noticed that in a write-heavy stress test on a 56-core system with 1,000 concurrent client connections, the server would appear to halt every few seconds, delivering 0 transactions per second. It is not a permanent hang; throughput resumes after some time.

      I attached GDB to the server during one such apparent halt, and I saw 22 of the 1,033 threads trying to access the same object:

      10.5 8ddebb33c28b0aeaa6550ac0e825beccd367bb2c

      #1  0x00005628f70317d5 in page_hash_latch::read_lock_wait (
          this=this@entry=0x7f2ae590d040)
          at /home/marko/server/storage/innobase/buf/buf0buf.cc:298
      

      In each of the calls, the page_id is distinct, and each invocation is for an undo log page:

      10.5 8ddebb33c28b0aeaa6550ac0e825beccd367bb2c

      Thread 5 (Thread 0x7f23a9d8f700 (LWP 296467)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193651}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 25 (Thread 0x7f23aa60e700 (LWP 296445)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194308}, …)
      #7  0x00005628f6fea9b4 in trx_undo_page_get
      Thread 45 (Thread 0x7f23aadf7700 (LWP 296406)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193982}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 46 (Thread 0x7f23aae42700 (LWP 296403)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193818}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 67 (Thread 0x7f23ab70c700 (LWP 296362)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194076}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 81 (Thread 0x7f23abef5700 (LWP 296332)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193656}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 85 (Thread 0x7f23ac198700 (LWP 296324)):
      #5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194312}, …)
      #6  0x00005628f6fea9b4 in trx_undo_page_get
      Thread 88 (Thread 0x7f23ac2c4700 (LWP 296319)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193831}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 104 (Thread 0x7f23ac9cc700 (LWP 296286)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193952}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 106 (Thread 0x7f23acbd9700 (LWP 296283)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194298}, …)
      #7  0x00005628f6fea9b4 in trx_undo_page_get
      Thread 129 (Thread 0x7f23ad61a700 (LWP 296238)):
      #5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194272}, …)
      #6  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 147 (Thread 0x7f23ae010700 (LWP 296201)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193899}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 150 (Thread 0x7f23ae1d2700 (LWP 296195)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194106}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 158 (Thread 0x7f23ae5a1700 (LWP 296180)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193795}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 162 (Thread 0x7f23ae844700 (LWP 296171)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193947}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 176 (Thread 0x7f23aee6b700 (LWP 296144)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194302}, …)
      #7  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 195 (Thread 0x7f23af816700 (LWP 296105)):
      #5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194233}, …)
      #6  0x00005628f6fea9b4 in trx_undo_page_get
      Thread 200 (Thread 0x7f23afa6e700 (LWP 296095)):
      #5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194014}, …)
      #6  0x00005628f6fea9b4 in trx_undo_page_get
      Thread 207 (Thread 0x7f23afe3d700 (LWP 296082)):
      #5  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 193891}, …)
      #6  0x00005628f6fefc91 in trx_undo_reuse_cached
      Thread 212 (Thread 0x7f23b4102700 (LWP 296071)):
      #6  0x00005628f7037127 in buf_page_get_gen (page_id={m_id = 194293}, …)
      #7  0x00005628f6fea9b4 in trx_undo_page_get
      

      The false sharing is completely eliminated by the following:

      diff --git a/storage/innobase/include/buf0buf.h b/storage/innobase/include/buf0buf.h
      index 2677d42..20bf8d5 100644
      --- a/storage/innobase/include/buf0buf.h
      +++ b/storage/innobase/include/buf0buf.h
      @@ -1824,7 +1824,8 @@ class buf_pool_t
         {
           /** Number of array[] elements per page_hash_latch.
           Must be one less than a power of 2. */
      -    static constexpr size_t ELEMENTS_PER_LATCH= 1023;
      +    static constexpr size_t ELEMENTS_PER_LATCH= CPU_LEVEL1_DCACHE_LINESIZE /
      +      sizeof(void*) - 1;
       
           /** number of payload elements in array[] */
           Atomic_relaxed<ulint> n_cells;
      

      The practical minimum value of CPU_LEVEL1_DCACHE_LINESIZE appears to be 64 bytes, and the practical maximum value of sizeof(void*) is 8 bytes. Those are the exact values on the AMD64 a.k.a. Intel EM64T a.k.a. x86_64 ISA.
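
      On x86_64, the new value is therefore 64/8 - 1 = 7, so each page_hash_latch shares its cache line only with the 7 hash cells that it protects, and a modification of any unrelated cell or latch can no longer invalidate the cache line that the latch resides on. A hypothetical sketch of the resulting layout (the real definitions in buf0buf.h differ in detail):

      #include <cstddef>

      /* assumed x86_64 values; in the server these come from the build
      system macro CPU_LEVEL1_DCACHE_LINESIZE and sizeof(void*) */
      static constexpr size_t CACHE_LINE= 64;
      static constexpr size_t ELEMENTS_PER_LATCH= CACHE_LINE / sizeof(void*) - 1;
      static_assert(((ELEMENTS_PER_LATCH + 1) & ELEMENTS_PER_LATCH) == 0,
                    "must be one less than a power of 2");

      /* hypothetical helper: the array slot holding the latch that
      protects hash cell i is found by masking off the low bits */
      constexpr size_t latch_slot(size_t i) { return i & ~ELEMENTS_PER_LATCH; }

      static_assert(latch_slot(0) == 0 && latch_slot(7) == 0 &&
                    latch_slot(8) == 8 && latch_slot(15) == 8,
                    "slots 0..7 share one latch; slots 8..15 the next");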

      With the above fix, at most 1/8 of buf_pool.page_hash.array would be used for page_hash_latch objects. The payload size of the array is the number of pages in the buffer pool (innodb_buffer_pool_size/innodb_page_size). This number would be rounded up to a slightly larger prime number, then, with the above patch, multiplied by 8/7 (an increase of about 14%), and finally multiplied by sizeof(void*).

      For example, a 50GiB buffer pool would comprise at most 3,276,800 pages of 16KiB each, and the raw payload size of buf_pool.page_hash.array would be 25MiB. The above fix would increase the memory usage to about 28.6MiB, which seems acceptable to me.
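
      A standalone back-of-the-envelope check of that arithmetic (hypothetical code; it ignores the rounding of the cell count up to a prime, so the real numbers are marginally larger):

      #include <cstdio>

      int main()
      {
        const unsigned long long pool= 50ULL << 30;        /* 50GiB */
        const unsigned long long pages= pool / (16 << 10); /* 16KiB pages */
        const double raw_mib= double(pages) * 8 / (1 << 20);
        /* with the fix, every 8th array element is a page_hash_latch */
        const double fixed_mib= raw_mib * 8 / 7;
        std::printf("%llu pages, %.1f MiB payload, %.1f MiB with latches\n",
                    pages, raw_mib, fixed_mib);
        return 0;
      }

      It prints "3276800 pages, 25.0 MiB payload, 28.6 MiB with latches", matching the figures above.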

People

Assignee: Marko Mäkelä (marko)
Reporter: Marko Mäkelä (marko)
