[MDEV-15053] Reduce buf_pool_t::mutex contention Created: 2018-01-24  Updated: 2024-01-30  Resolved: 2020-06-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.5.4

Type: Task Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: performance

Attachments: File bug75534-2.patch    
Issue Links:
Blocks
blocks MDEV-16526 Overhaul the InnoDB page flushing Closed
Duplicate
is duplicated by MDEV-18724 Replace buf_block_t::mutex with more ... Closed
Problem/Incident
causes MDEV-22816 Assertion `node->space == fil_system.... Closed
causes MDEV-23017 range query performance regression in... Closed
causes MDEV-23229 Read of Uninitialized memory during b... Closed
causes MDEV-23399 10.5 performance regression with IO-b... Closed
causes MDEV-23410 buf_LRU_scan_and_free_block() fails t... Closed
causes MDEV-23807 Assertion n_pending_flushes failed in... Closed
causes MDEV-23909 innodb_flush_neighbors=2 is treated l... Closed
causes MDEV-24054 Assertion `in_LRU_list' failed in buf... Closed
causes MDEV-25776 Assertion `bpage->in_page_hash' in bu... Closed
causes MDEV-29967 innodb_read_ahead_threshold (linear r... Closed
causes MDEV-33332 SIGSEGV in buf_read_ahead_linear() wh... Confirmed
Relates
relates to MDEV-15384 buf_flush_LRU_list_batch() always rep... Closed
relates to MDEV-22850 Reduce buf_pool.page_hash latch conte... Closed
relates to MDEV-22862 compilation failure on centos74-aarch64 Closed
relates to MDEV-22871 Contention on the buf_pool.page_hash Closed
relates to MDEV-23416 innodb.innodb_mutexes failed in build... Closed
relates to MDEV-23439 Assertion `size == space->size' faile... Closed
relates to MDEV-24090 Optimize buf_page_optimistic_get() Open
relates to MDEV-15016 multiple page cleaner threads use a l... Closed
relates to MDEV-15058 Remove multiple InnoDB buffer pool in... Closed
relates to MDEV-21330 Lock monitor doesn't print a semaphor... Closed
relates to MDEV-22031 Assertion `bpage->in_page_hash' faile... Closed
Sprint: 10.3.6-1, 10.4.0-1

 Description   

MySQL 8.0.0 split the InnoDB buf_pool_t::mutex. MariaDB should do something similar.

Instead of introducing more mutexes or radically changing the latching rules of various buf_pool_t and buf_block_t data members, I think that it is possible to reduce the contention on buf_pool.mutex by other means:

  • Move more code to inline functions of buf_pool_t or buf_page_t.
  • Reduce the amount of mutex release/reacquire dance in buf0flu.cc and buf0rea.cc.
  • Avoid repeated calls to page_id_t::fold() or page_id_t::page_id_t(); use page_id_t directly as loop iterator.
  • Move buf_page_t::flush_type to IORequest.
  • Split buf_page_io_complete() into separate ‘read completion’ and ‘write completion’ callbacks.
  • Avoid holding buf_pool.mutex during buf_pool.page_hash operations. Consider removing the debug field buf_page_t::in_page_hash.
  • Split operations on buf_pool.watch[] into two parts. The allocation of buf_pool.watch[] should be only protected by buf_pool.mutex, and the buf_pool.page_hash only by the hash bucket latch.


 Comments   
Comment by Marko Mäkelä [ 2018-02-08 ]

This was originally contributed by kastauyra as MySQL Bug #75534: Solve buffer pool mutex contention by splitting it.

Comment by Laurynas Biveinis [ 2018-02-09 ]

Make sure to review https://bugs.mysql.com/bug.php?id=85205, https://bugs.mysql.com/bug.php?id=85208. If your starting point is Oracle 8.0.3 commit, try to find follow-up fixes too, I only recall 946996a3767, e34c6b0dea2 from the top of my head.

Comment by Marko Mäkelä [ 2018-02-21 ]

kastauyra, thank you for the helpful pointers!
I noticed that the Oracle version of bug75534-2.patch omitted the contributed test changes. thiru, can you please try to add them?
I just pushed a test version of this. I think that more work is needed for the initial commit, because I believe that we really need to use the atomic memory access primitives in order to work correctly, especially on ARM and POWER.

Comment by Marko Mäkelä [ 2018-02-21 ]

thiru, a search of the MySQL 8.0 commit history reveals quite a few follow-up commits. Please cherry-pick those as well.
3b1718b8150ea92166111798c5dc6a11a0e4bfeb (with test case)
d887a49e84a618e150dfa90d3b977614dc0cb020 (looks reasonable)
bd914aebb3ec510784f364e5e71cd35bb8a0cad5 (with test case)
65649a16ce9412f7d9286e7e6ad6c2de5769be25 (looks questionable; let us skip it, and review the atomics instead)

Comment by Marko Mäkelä [ 2018-02-22 ]

Let us use the new branch bb-10.3-MDEV-15053 for this.
I cherry-picked the following:
3b1718b8150ea92166111798c5dc6a11a0e4bfeb (without test case or debug instrumentation)
d887a49e84a618e150dfa90d3b977614dc0cb020 (maybe we should test this while concurrently resizing the buffer pool)
I filed MDEV-15384 for a regression that MariaDB 10.2.2 got earlier, in an incorrect merge of InnoDB code from MySQL 5.7.9.

What remains to be done:

  1. Review and revise the use of atomics (use my_atomic_loadlint() or similar for reads)
  2. Check if the test cases of bug75534-2.patch can be imported.
  3. (for me): Squash your follow-up fixes the first commit, and review it.
Comment by Marko Mäkelä [ 2019-01-21 ]

In MySQL 8.0.14 there is a related bug fix:
Bug#28556539 LOCK_ORDER: CYCLE INVOLVING BUF_POOL_FREE_LIST_MUTEX AND BUF_POOL_ZIP_FREE_MUTEX
The buf_pool->free_list_mutex was introduced in MySQL 8.0.0.

Comment by Marko Mäkelä [ 2020-01-27 ]

The work is not only splitting buf_pool_t::mutex but also changing some data fields to rely on atomic memory operations instead of mutexes for controlling concurrent access.

The work so far was done on 10.3, where std::atomic is not available to InnoDB. Starting with 10.4, it is available and should be used. I think that we should try to use std::memory_order_relaxed where available.

Comment by Marko Mäkelä [ 2020-03-17 ]

This will require some more work and a careful review. At the moment, cmake -DWITH_ASAN is reporting heap-use-after-free in numerous tests.

Comment by Matthias Leich [ 2020-03-23 ]

Result of RQG testing on origin/bb-10.5-marko commit ab5fe285276ab2dc4d87fb59a31655db51706da5
The binaries build from this tree were neither better nor worse than actual 10.5.

Comment by Matthias Leich [ 2020-05-27 ]

Result of RQG testing on origin/bb-10.5-MDEV-15053-3 577ac7de0b26e595bc7bf678c25a65a0b4c92edc 2020-05-27T16:05:27+03:00
The binaries build from this tree were neither better nor worse than actual 10.5.

Comment by Marko Mäkelä [ 2020-05-29 ]

kevg, after your review I still fixed one thing that probably caused the occasional hangs that axel reproduced in buf_page_init_for_read(). I had wrongly released the buf_pool.page_hash latch before buf_page_t::set_io_fix(BUF_IO_READ) was called. This was a functional change from earlier.

Comment by Marko Mäkelä [ 2020-06-04 ]

The hang was very elusive, but finally I found the reason. Now that we removed buf_block_t::mutex, we must release buf_block_t::lock before invoking buf_block_t::unfix() or buf_block_t::io_unfix(). Otherwise, something bad will happen somewhere, causing a block in the buf_pool.free list to permanently remain in X-latched state, with block->lock.writer_thread==0. We only experienced the hang in a release build, and it never reproduced under rr record. In the end, I disabled buf_page_optimistic_get() and added debug assertions on buf_block_t::lock when the buf_pool.free list is modified. To be able to do that, I had to fix a glitch in dict_check_sys_tables().

Comment by Marko Mäkelä [ 2020-06-09 ]

To reduce contention especially in read-only workloads, we increased svr_n_page_hash_locks from 16 to 64 and added LF_BACKOFF() to the spin loop in rw_lock_lock_word_decr().

Generated at Thu Feb 08 08:18:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.