[MDEV-26033] Race condition between buf_pool.page_hash and buffer pool resizing Created: 2021-06-28  Updated: 2022-04-07  Resolved: 2021-07-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.4, 10.6.0
Fix Version/s: 10.5.12, 10.6.3

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression, rr-profile-analyzed

Attachments: File grammar.yy    
Issue Links:
Blocks
blocks MDEV-26826 Duplicated computations of buf_pool.p... Closed
Duplicate
duplicates MDEV-24030 SUMMARY: AddressSanitizer: heap-use-a... Closed
Problem/Incident
is caused by MDEV-22871 Contention on the buf_pool.page_hash Closed

 Description   

The replacement of buf_pool.page_hash with a different type of hash table in MDEV-22871 introduced a race condition with buffer pool resizing.

We have an execution trace where buf_pool.page_hash.array is changed to point to something else while page_hash_latch::read_lock() is executing. The same problem should also affect page_hash_latch::write_lock().

The wait loop currently fails to notice that buffer pool resizing is in progress. Part of the problem is that the wait happens too deep in the call stack, inside latch->acquire(), below the retry loop that checks n_cells:

    template<bool exclusive> page_hash_latch *lock(ulint fold)
    {
      for (;;)
      {
        auto n= n_cells;
        page_hash_latch *latch= lock_get(fold, n);
        latch->acquire<exclusive>();
        /* Our latch prevents n_cells from changing. */
        if (UNIV_LIKELY(n == n_cells))
          return latch;
        /* Retry, because buf_pool_t::resize_hash() affected us. */
        latch->release<exclusive>();
      }
    }

We are actually waiting inside page_hash_latch::read_lock() or page_hash_latch::write_lock() on memory that may no longer belong to buf_pool.page_hash.array. The n == n_cells check only runs after the latch has been acquired, which is too late. We would need some notion of timeout or temporary failure when the buffer pool is being resized.
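The hazard can be illustrated with a small single-threaded toy model. This is only a sketch: ToyLatch, ToyPageHash, owns() and the retired member are hypothetical names that do not exist in InnoDB, and the modulo mapping in lock_get() is a simplification of the real cell lookup. The old array is deliberately kept alive so that the demonstration itself never touches freed memory.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for page_hash_latch (the real one is a rw-lock).
struct ToyLatch { bool locked = false; };

// Hypothetical stand-in for buf_pool.page_hash: an array of latches
// that resize() replaces wholesale.
struct ToyPageHash {
  std::size_t n_cells;
  std::unique_ptr<ToyLatch[]> array;
  // Assumption for the demo only: the old array is retained so that a
  // stale pointer stays valid. In the reported race it may be freed.
  std::unique_ptr<ToyLatch[]> retired;

  explicit ToyPageHash(std::size_t n)
    : n_cells(n), array(new ToyLatch[n]()) {}

  // Simplified mapping from a fold value to a latch in the current array.
  ToyLatch *lock_get(std::size_t fold) { return &array[fold % n_cells]; }

  // Swap in a new array, as a hash table resize would.
  void resize(std::size_t n) {
    retired = std::move(array);
    array.reset(new ToyLatch[n]());
    n_cells = n;
  }

  // Does the latch pointer lie inside the *current* array?
  bool owns(const ToyLatch *l) const {
    std::less<const ToyLatch*> before;
    const ToyLatch *first = array.get();
    return !before(l, first) && before(l, first + n_cells);
  }
};
```

A thread that obtained its latch pointer from lock_get() before resize() ran would keep waiting on a latch for which owns() is now false. That corresponds to the window between lock_get() and the n == n_cells check in the loop above.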



 Comments   
Comment by Marko Mäkelä [ 2021-07-03 ]

I considered a few options for a fix:

  1. Modify the code so that it polls buf_pool.resizing. As far as I understand, this might only reduce the probability of a race condition, but not completely prevent it. It would also have to be tested carefully for performance impact.
  2. Add a global rw-lock acquisition around the page_hash_latch acquisition, similar to lock_sys.latch in 10.6 (which mainly exists for purposes other than buffer pool resizing). That would become an obvious scalability bottleneck.
  3. When resizing the buffer pool, never resize the buf_pool.page_hash table. This was the chosen solution because of its low risk and possibly improved performance in the usual case where the buffer pool is never resized.
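
The idea behind option 3 can be sketched as follows. This is not the actual patch: FixedPageHash and ToyBufPool are hypothetical names, std::mutex stands in for page_hash_latch (so the shared/exclusive distinction is omitted), and the modulo mapping is a simplification. The point is only that once n_cells and the latch array are fixed at creation, lock() needs no retry loop, because a latch pointer can never become stale.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

// Hypothetical hash table whose size is fixed for its whole lifetime.
struct FixedPageHash {
  const std::size_t n_cells;
  const std::unique_ptr<std::mutex[]> latches;

  explicit FixedPageHash(std::size_t n)
    : n_cells(n), latches(new std::mutex[n]) {}

  // Simplified mapping from a fold value to its latch.
  std::mutex *lock_get(std::size_t fold) const {
    return &latches[fold % n_cells];
  }

  // No loop and no n_cells recheck: the array never moves.
  std::mutex *lock(std::size_t fold) const {
    std::mutex *latch = lock_get(fold);
    latch->lock();
    return latch;
  }
};

// Hypothetical buffer pool: resizing changes the page count only.
struct ToyBufPool {
  std::size_t n_pages;
  FixedPageHash page_hash;

  ToyBufPool(std::size_t pages, std::size_t cells)
    : n_pages(pages), page_hash(cells) {}

  // page_hash is intentionally left untouched here.
  void resize(std::size_t pages) { n_pages = pages; }
};
```

The trade-off in this sketch is that the table keeps whatever size it was given at startup, so the length of its hash chains can change when the pool grows or shrinks.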