MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple rw-lock implementation that avoids any spin loop for non-contended read-lock requests, simply using std::atomic::fetch_add() for the lock acquisition.
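As a rough sketch of that technique (illustrative only; this is not the actual InnoDB page_hash_latch code), the lock word can count readers in the low bits while a writer claims the most significant bit:

```cpp
#include <atomic>
#include <cstdint>

// Sketch of an fetch_add()-based rw-lock: readers are counted in the
// low bits of the lock word; a writer sets the most significant bit.
class rw_latch_sketch
{
  std::atomic<uint32_t> word{0};
  static constexpr uint32_t WRITER= 1U << 31;
public:
  void read_lock()
  {
    for (;;)
    {
      // Fast path: a single fetch_add(), no spin loop when uncontended.
      if (!(word.fetch_add(1, std::memory_order_acquire) & WRITER))
        return;
      // A writer holds the latch: undo our increment and wait.
      word.fetch_sub(1, std::memory_order_relaxed);
      while (word.load(std::memory_order_relaxed) & WRITER) {}
    }
  }
  void read_unlock() { word.fetch_sub(1, std::memory_order_release); }
  void write_lock()
  {
    uint32_t expected= 0;
    // Acquire exclusively only when there is no reader and no writer.
    while (!word.compare_exchange_weak(expected, WRITER,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed))
      expected= 0;
  }
  void write_unlock() { word.fetch_and(~WRITER, std::memory_order_release); }
  uint32_t readers() const
  { return word.load(std::memory_order_relaxed) & ~WRITER; }
};
```

Note that even the read-lock fast path writes to the lock word, so every acquisition dirties the cache line that holds the latch. That property is what makes the placement of these latches in memory performance-critical.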
Alas, Vladislav Vaintroub noticed that in a write-heavy stress test on a 56-core system with 1000 concurrent client connections, the server would every few seconds appear to halt, delivering 0 transactions per second. It is not a permanent hang; performance resumes after some time.
I attached GDB to the server during one such apparent halt, and I saw 22 of the 1,033 threads trying to access the same object:
In each of the calls, the page_id is distinct, and each invocation is for an undo log page:
Because the page_id values are distinct, these threads are not contending for a single latch; rather, they are contending for distinct page_hash_latch objects that happen to reside in the same cache line, and every fetch_add() on one latch invalidates that cache line for all its neighbours. This false sharing is completely eliminated by the following:
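To illustrate the layout idea behind such a fix (the names and index arithmetic below are my own sketch, not the actual patch): the hash array is a flat array of pointer-sized slots, and the first slot of every cache line is reserved for a latch, leaving the remaining slots of that line for the hash chains that the latch protects.

```cpp
#include <cstddef>

// Sketch of an interleaved page_hash layout (illustrative names):
// slot 0 of every cache line holds a latch; the other 7 slots hold
// hash chain heads protected by that latch.
static constexpr size_t CACHE_LINE= 64;        // CPU_LEVEL1_DCACHE_LINESIZE
static constexpr size_t SLOTS_PER_LINE= CACHE_LINE / sizeof(void*);   // 8
static constexpr size_t ELEMENTS_PER_LATCH= SLOTS_PER_LINE - 1;       // 7

// Map logical hash cell i to its slot in the array, skipping latch slots.
inline size_t cell_slot(size_t i)
{
  return i + i / ELEMENTS_PER_LATCH + 1;
}

// The latch protecting cell i sits at the start of its cache line.
inline size_t latch_slot(size_t i)
{
  return cell_slot(i) & ~(SLOTS_PER_LINE - 1);
}
```

With this layout, no two latches ever share a cache line: a fetch_add() on one latch can only invalidate the line holding its own seven hash chains, never the line of a neighbouring latch.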
The practical minimum value of CPU_LEVEL1_DCACHE_LINESIZE appears to be 64 bytes, and the practical maximum value of sizeof(void*) is 8 bytes. Those are the exact values on the AMD64 a.k.a. Intel EM64T a.k.a. x86_64 ISA.
With the above fix, at most 1/8 of buf_pool.page_hash.array would be used for page_hash_latch objects. The payload size of the array is the number of pages in the buffer pool (innodb_buffer_pool_size/innodb_page_size). This number would be rounded up to a slightly larger prime number, with the above patch multiplied by 8/7 (an increase of about 14%), and finally multiplied by sizeof(void*).
For example, a 50GiB buffer pool would comprise at most 3,276,800 pages of 16KiB, and the raw payload size of buf_pool.page_hash.array would be 25MiB. The above fix would increase the memory usage to 28.6MiB, which seems acceptable to me.