The replacement of buf_pool.page_hash with a different type of hash table in MDEV-22871 introduced a race condition with buffer pool resizing.
We have an execution trace in which buf_pool.page_hash.array is changed to point to something else while page_hash_latch::read_lock() is executing. The same problem should also affect page_hash_latch::write_lock().
The wait loop currently fails to notice that buffer pool resizing is in progress. Part of the problem is that we are waiting too deep in the code, inside the latch acquisition of the following function:
```cpp
template<bool exclusive> page_hash_latch *lock(ulint fold)
{
  for (;;)
  {
    auto n= n_cells;
    page_hash_latch *latch= lock_get(fold, n);
    latch->acquire<exclusive>();
    /* Our latch prevents n_cells from changing. */
    if (UNIV_LIKELY(n == n_cells))
      return latch;
    /* Retry, because buf_pool_t::resize_hash() affected us. */
    latch->release<exclusive>();
  }
}
```
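The re-check of n == n_cells is needed because the latch that lock_get(fold, n) returns depends on the cell count n. A minimal sketch, assuming a hypothetical cell_of() that maps a fold value to a cell with a plain modulo (the real InnoDB hash function and latch layout differ):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for lock_get(fold, n). The real code hashes `fold`
// and maps the result to a latch slot; any mapping that depends on n has
// the property illustrated here.
inline std::size_t cell_of(std::size_t fold, std::size_t n)
{
  assert(n > 0);  // a hash table always has at least one cell
  return fold % n;
}
```

For example, cell_of(1000, 64) is 40 while cell_of(1000, 128) is 104: a thread that sampled 64 cells just before a resize to 128 would hold the latch for cell 40 while the page it is looking for now hashes to cell 104. Hence lock() releases the latch and retries whenever n changed.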
We are actually waiting inside page_hash_latch::read_lock() or page_hash_latch::write_lock(), on a latch in memory that may no longer belong to buf_pool.page_hash.array. The n == n_cells check only runs after acquire() has returned, so it cannot protect the wait itself. We would need some notion of a timeout or temporary failure while the buffer pool is being resized.
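One way to get "temporary failure" semantics is to never block on a cell latch while the latch array could be replaced. The following is a hypothetical, simplified model (C++17, standard library only), not the InnoDB implementation: latch_table, try_acquire(), release(), resize() and lock_with_retry() are invented names, std::shared_mutex stands in for page_hash_latch, and fold % n stands in for lock_get(). A short-lived "pin" keeps the array alive only while a cell latch is being tried; resize() takes the pin exclusively and then every cell latch before freeing the old array. It assumes that a caller holds at most one cell latch at a time; otherwise lock_with_retry() could spin forever against a resize() that is waiting for that latch.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical model, not InnoDB code: cell latches that resize() may replace.
class latch_table
{
public:
  explicit latch_table(std::size_t n)
    : n_cells_(n), cells_(std::make_unique<std::shared_mutex[]>(n)) {}

  // Never blocks. Returns nullptr as a temporary failure: resize() holds the
  // pin, the cell latch is busy, or a try_lock call failed spuriously.
  template<bool exclusive>
  std::shared_mutex *try_acquire(std::size_t fold)
  {
    if (!pin_.try_lock_shared())
      return nullptr;
    std::shared_mutex *latch= &cells_[fold % n_cells_];
    const bool ok= exclusive ? latch->try_lock() : latch->try_lock_shared();
    // Unpinning is safe: resize() acquires every cell latch before freeing
    // the old array, so a latch we hold stays valid until we release it.
    pin_.unlock_shared();
    return ok ? latch : nullptr;
  }

  template<bool exclusive>
  static void release(std::shared_mutex *latch)
  {
    if (exclusive)
      latch->unlock();
    else
      latch->unlock_shared();
  }

  void resize(std::size_t n)
  {
    std::unique_lock<std::shared_mutex> pin(pin_);  // new lookups fail fast
    for (std::size_t i= 0; i < n_cells_; i++)
      cells_[i].lock();  // wait for current latch holders to finish
    auto old= std::move(cells_);
    const std::size_t old_n= n_cells_;
    cells_= std::make_unique<std::shared_mutex[]>(n);
    n_cells_= n;
    for (std::size_t i= 0; i < old_n; i++)
      old[i].unlock();  // a mutex must be unlocked before it is destroyed
  }  // `old` is freed here; no thread can reach it any more

  std::size_t n_cells()
  {
    std::shared_lock<std::shared_mutex> pin(pin_);
    return n_cells_;
  }

private:
  std::shared_mutex pin_;  // protects n_cells_ and cells_
  std::size_t n_cells_;
  std::unique_ptr<std::shared_mutex[]> cells_;
};

// Hypothetical caller: backs off at the top level, holding no latch, instead
// of waiting deep inside a latch whose memory might be replaced.
template<bool exclusive>
std::shared_mutex *lock_with_retry(latch_table &t, std::size_t fold)
{
  for (;;)
  {
    if (std::shared_mutex *latch= t.try_acquire<exclusive>(fold))
      return latch;
    std::this_thread::yield();
  }
}
```

The busy loop around std::this_thread::yield() is only a placeholder; a real implementation would more likely wait for a signal that resizing has finished before retrying.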