MariaDB Server / MDEV-26033

Race condition between buf_pool.page_hash and buffer pool resizing


    Details

      Description

      The replacement of buf_pool.page_hash with a different type of hash table in MDEV-22871 introduced a race condition with buffer pool resizing.

      We have an execution trace in which buf_pool.page_hash.array is changed to point to a different array while page_hash_latch::read_lock() is executing. The same race should also affect page_hash_latch::write_lock().

      The wait loop currently fails to notice that buffer pool resizing is in progress. Part of the problem is that we wait too deep in the call stack: the n_cells check in the following loop runs only after latch->acquire() has returned.

          template<bool exclusive> page_hash_latch *lock(ulint fold)
          {
            for (;;)
            {
              auto n= n_cells;
              page_hash_latch *latch= lock_get(fold, n);
              latch->acquire<exclusive>();
              /* Our latch prevents n_cells from changing. */
              if (UNIV_LIKELY(n == n_cells))
                return latch;
              /* Retry, because buf_pool_t::resize_hash() affected us. */
              latch->release<exclusive>();
            }
          }
      

      We are actually waiting inside page_hash_latch::read_lock() or page_hash_latch::write_lock(), on memory that may no longer belong to buf_pool.page_hash.array. We would need some notion of a timeout or a temporary failure while the buffer pool is being resized.
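      One way to picture the "timeout or temporary failure" idea is the following minimal sketch. It is not the InnoDB code and not the eventual fix: toy_latch, toy_page_hash, try_lock(), begin_resize(), end_resize() and the max_spins parameter are all hypothetical names. The sketch assumes an exclusive-only latch, publishes the array and its size together behind one pointer, and never frees a replaced array, which sidesteps the memory reclamation question that real code would still have to answer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for page_hash_latch (exclusive mode only).
struct toy_latch
{
  std::atomic<bool> locked{false};
  bool try_acquire() noexcept { return !locked.exchange(true); }
  void release() noexcept { locked.store(false); }
};

// Hypothetical stand-in for buf_pool.page_hash.
class toy_page_hash
{
  // The array and its size are published together through one pointer.
  struct cells
  {
    size_t n;
    std::unique_ptr<toy_latch[]> latches;
  };
  std::atomic<cells*> current{nullptr};
  std::atomic<bool> resizing{false};
  // Assumption: replaced arrays are kept alive, never freed, so a late
  // waiter cannot touch memory that has been returned to the allocator.
  std::vector<std::unique_ptr<cells>> all_arrays;

  void install(size_t n)
  {
    assert(n > 0);
    std::unique_ptr<cells> c(new cells{
      n, std::unique_ptr<toy_latch[]>(new toy_latch[n])});
    current.store(c.get());
    all_arrays.push_back(std::move(c));
  }

public:
  explicit toy_page_hash(size_t n) { install(n); }

  // Returns the acquired latch, or nullptr as a temporary failure when
  // a resize is in progress or the bounded wait is exhausted.
  toy_latch *try_lock(size_t fold, unsigned max_spins= 100)
  {
    for (unsigned i= 0; i < max_spins; i++)
    {
      if (resizing.load())
        return nullptr;
      cells *c= current.load();
      toy_latch *latch= &c->latches[fold % c->n];
      if (!latch->try_acquire())
        continue; // bounded wait; the array is re-read on every attempt
      // Recheck while holding the latch, like the n == n_cells test.
      if (!resizing.load() && c == current.load())
        return latch;
      latch->release();
    }
    return nullptr;
  }

  // Single resizer assumed. Waits for current holders by taking every
  // latch of the old array; those latches are never released, so late
  // waiters on the old array keep failing until they re-read `current`.
  void begin_resize()
  {
    resizing.store(true);
    cells *c= current.load();
    for (size_t i= 0; i < c->n; i++)
      while (!c->latches[i].try_acquire())
        std::this_thread::yield();
  }

  void end_resize(size_t n)
  {
    install(n);
    resizing.store(false);
  }
};
```

      A caller that receives nullptr would back off and retry from the top, re-reading the table state, instead of sleeping inside the latch on an array that may have been replaced.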

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä