Details

    • 10.3.6-1, 10.4.0-1

    Description

      MySQL 8.0.0 split the InnoDB buf_pool_t::mutex. MariaDB should do something similar.

      Instead of introducing more mutexes or radically changing the latching rules of various buf_pool_t and buf_block_t data members, I think that it is possible to reduce the contention on buf_pool.mutex by other means:

      • Move more code to inline functions of buf_pool_t or buf_page_t.
      • Reduce the amount of mutex release/reacquire dance in buf0flu.cc and buf0rea.cc.
      • Avoid repeated calls to page_id_t::fold() or page_id_t::page_id_t(); use page_id_t directly as loop iterator.
      • Move buf_page_t::flush_type to IORequest.
      • Split buf_page_io_complete() into separate ‘read completion’ and ‘write completion’ callbacks.
      • Avoid holding buf_pool.mutex during buf_pool.page_hash operations. Consider removing the debug field buf_page_t::in_page_hash.
      • Split operations on buf_pool.watch[] into two parts. The allocation of buf_pool.watch[] should be only protected by buf_pool.mutex, and the buf_pool.page_hash only by the hash bucket latch.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment - This was originally contributed by kastauyra as MySQL Bug #75534: Solve buffer pool mutex contention by splitting it .

            Make sure to review https://bugs.mysql.com/bug.php?id=85205, https://bugs.mysql.com/bug.php?id=85208. If your starting point is Oracle 8.0.3 commit, try to find follow-up fixes too, I only recall 946996a3767, e34c6b0dea2 from the top of my head.

            kastauyra Laurynas Biveinis added a comment - Make sure to review https://bugs.mysql.com/bug.php?id=85205 , https://bugs.mysql.com/bug.php?id=85208 . If your starting point is Oracle 8.0.3 commit, try to find follow-up fixes too, I only recall 946996a3767, e34c6b0dea2 from the top of my head.

            kastauyra, thank you for the helpful pointers!
            I noticed that the Oracle version of bug75534-2.patch omitted the contributed test changes. thiru, can you please try to add them?
            I just pushed a test version of this. I think that more work is needed for the initial commit, because I believe that we really need to use the atomic memory access primitives in order to work correctly, especially on ARM and POWER.

            marko Marko Mäkelä added a comment - kastauyra , thank you for the helpful pointers! I noticed that the Oracle version of bug75534-2.patch omitted the contributed test changes. thiru , can you please try to add them? I just pushed a test version of this . I think that more work is needed for the initial commit, because I believe that we really need to use the atomic memory access primitives in order to work correctly, especially on ARM and POWER.

            thiru, a search of the MySQL 8.0 commit history reveals quite a few follow-up commits. Please cherry-pick those as well.
            3b1718b8150ea92166111798c5dc6a11a0e4bfeb (with test case)
            d887a49e84a618e150dfa90d3b977614dc0cb020 (looks reasonable)
            bd914aebb3ec510784f364e5e71cd35bb8a0cad5 (with test case)
            65649a16ce9412f7d9286e7e6ad6c2de5769be25 (looks questionable; let us skip it, and review the atomics instead)

            marko Marko Mäkelä added a comment - thiru , a search of the MySQL 8.0 commit history reveals quite a few follow-up commits. Please cherry-pick those as well. 3b1718b8150ea92166111798c5dc6a11a0e4bfeb (with test case) d887a49e84a618e150dfa90d3b977614dc0cb020 (looks reasonable) bd914aebb3ec510784f364e5e71cd35bb8a0cad5 (with test case) 65649a16ce9412f7d9286e7e6ad6c2de5769be25 (looks questionable; let us skip it, and review the atomics instead)

            Let us use the new branch bb-10.3-MDEV-15053 for this.
            I cherry-picked the following:
            3b1718b8150ea92166111798c5dc6a11a0e4bfeb (without test case or debug instrumentation)
            d887a49e84a618e150dfa90d3b977614dc0cb020 (maybe we should test this while concurrently resizing the buffer pool)
            I filed MDEV-15384 for a regression that MariaDB 10.2.2 got earlier, in an incorrect merge of InnoDB code from MySQL 5.7.9.

            What remains to be done:

            1. Review and revise the use of atomics (use my_atomic_loadlint() or similar for reads)
            2. Check if the test cases of bug75534-2.patch can be imported.
            3. (for me): Squash your follow-up fixes the first commit, and review it.
            marko Marko Mäkelä added a comment - Let us use the new branch bb-10.3-MDEV-15053 for this. I cherry-picked the following: 3b1718b8150ea92166111798c5dc6a11a0e4bfeb (without test case or debug instrumentation) d887a49e84a618e150dfa90d3b977614dc0cb020 (maybe we should test this while concurrently resizing the buffer pool) I filed MDEV-15384 for a regression that MariaDB 10.2.2 got earlier, in an incorrect merge of InnoDB code from MySQL 5.7.9. What remains to be done: Review and revise the use of atomics (use my_atomic_loadlint() or similar for reads) Check if the test cases of bug75534-2.patch can be imported. (for me): Squash your follow-up fixes the first commit, and review it.

            In MySQL 8.0.14 there is a related bug fix:
            Bug#28556539 LOCK_ORDER: CYCLE INVOLVING BUF_POOL_FREE_LIST_MUTEX AND BUF_POOL_ZIP_FREE_MUTEX
            The buf_pool->free_list_mutex was introduced in MySQL 8.0.0.

            marko Marko Mäkelä added a comment - In MySQL 8.0.14 there is a related bug fix: Bug#28556539 LOCK_ORDER: CYCLE INVOLVING BUF_POOL_FREE_LIST_MUTEX AND BUF_POOL_ZIP_FREE_MUTEX The buf_pool->free_list_mutex was introduced in MySQL 8.0.0 .

            The work is not only splitting buf_pool_t::mutex but also changing some data fields to rely on atomic memory operations instead of mutexes for controlling concurrent access.

            The work so far was done on 10.3, where std::atomic is not available to InnoDB. Starting with 10.4, it is available and should be used. I think that we should try to use std::memory_order_relaxed where available.

            marko Marko Mäkelä added a comment - The work is not only splitting buf_pool_t::mutex but also changing some data fields to rely on atomic memory operations instead of mutexes for controlling concurrent access. The work so far was done on 10.3, where std::atomic is not available to InnoDB. Starting with 10.4, it is available and should be used. I think that we should try to use std::memory_order_relaxed where available.

            This will require some more work and a careful review. At the moment, cmake -DWITH_ASAN is reporting heap-use-after-free in numerous tests.

            marko Marko Mäkelä added a comment - This will require some more work and a careful review. At the moment, cmake -DWITH_ASAN is reporting heap-use-after-free in numerous tests.

            Result of RQG testing on origin/bb-10.5-marko commit ab5fe285276ab2dc4d87fb59a31655db51706da5
            The binaries build from this tree were neither better nor worse than actual 10.5.

            mleich Matthias Leich added a comment - Result of RQG testing on origin/bb-10.5-marko commit ab5fe285276ab2dc4d87fb59a31655db51706da5 The binaries build from this tree were neither better nor worse than actual 10.5.

            Result of RQG testing on origin/bb-10.5-MDEV-15053-3 577ac7de0b26e595bc7bf678c25a65a0b4c92edc 2020-05-27T16:05:27+03:00
            The binaries build from this tree were neither better nor worse than actual 10.5.

            mleich Matthias Leich added a comment - Result of RQG testing on origin/bb-10.5- MDEV-15053 -3 577ac7de0b26e595bc7bf678c25a65a0b4c92edc 2020-05-27T16:05:27+03:00 The binaries build from this tree were neither better nor worse than actual 10.5.

            kevg, after your review I still fixed one thing that probably caused the occasional hangs that axel reproduced in buf_page_init_for_read(). I had wrongly released the buf_pool.page_hash latch before buf_page_t::set_io_fix(BUF_IO_READ) was called. This was a functional change from earlier.

            marko Marko Mäkelä added a comment - kevg , after your review I still fixed one thing that probably caused the occasional hangs that axel reproduced in buf_page_init_for_read() . I had wrongly released the buf_pool.page_hash latch before buf_page_t::set_io_fix(BUF_IO_READ) was called. This was a functional change from earlier.

            The hang was very elusive, but finally I found the reason. Now that we removed buf_block_t::mutex, we must release buf_block_t::lock before invoking buf_block_t::unfix() or buf_block_t::io_unfix(). Otherwise, something bad will happen somewhere, causing a block in the buf_pool.free list to permanently remain in X-latched state, with block->lock.writer_thread==0. We only experienced the hang in a release build, and it never reproduced under rr record. In the end, I disabled buf_page_optimistic_get() and added debug assertions on buf_block_t::lock when the buf_pool.free list is modified. To be able to do that, I had to fix a glitch in dict_check_sys_tables().

            marko Marko Mäkelä added a comment - The hang was very elusive, but finally I found the reason. Now that we removed buf_block_t::mutex , we must release buf_block_t::lock before invoking buf_block_t::unfix() or buf_block_t::io_unfix() . Otherwise, something bad will happen somewhere, causing a block in the buf_pool.free list to permanently remain in X-latched state, with block->lock.writer_thread==0 . We only experienced the hang in a release build, and it never reproduced under rr record . In the end, I disabled buf_page_optimistic_get() and added debug assertions on buf_block_t::lock when the buf_pool.free list is modified. To be able to do that, I had to fix a glitch in dict_check_sys_tables() .

            To reduce contention especially in read-only workloads, we increased svr_n_page_hash_locks from 16 to 64 and added LF_BACKOFF() to the spin loop in rw_lock_lock_word_decr().

            marko Marko Mäkelä added a comment - To reduce contention especially in read-only workloads, we increased svr_n_page_hash_locks from 16 to 64 and added LF_BACKOFF() to the spin loop in rw_lock_lock_word_decr() .

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.