Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Versions: 10.5.4, 10.6.0
Description
The replacement of buf_pool.page_hash with a different type of hash table in MDEV-22871 introduced a race condition with buffer pool resizing.
We have an execution trace where buf_pool.page_hash.array is changed to point to something else while page_hash_latch::read_lock() is executing. The same should also affect page_hash_latch::write_lock().
The wait loop currently fails to notice that buffer pool resizing is in progress. Part of the problem is that we are waiting too deep in the code:
  template<bool exclusive> page_hash_latch *lock(ulint fold)
  {
    for (;;)
    {
      auto n= n_cells;
      page_hash_latch *latch= lock_get(fold, n);
      latch->acquire<exclusive>();
      /* Our latch prevents n_cells from changing. */
      if (UNIV_LIKELY(n == n_cells))
        return latch;
      /* Retry, because buf_pool_t::resize_hash() affected us. */
      latch->release<exclusive>();
    }
  }
We are actually waiting inside page_hash_latch::read_lock() or page_hash_latch::write_lock() on memory that may no longer belong to buf_pool.page_hash.array, so the n == n_cells check is only reached after the stale latch has already been accessed. We would need some notion of timeout or temporary failure when the buffer pool is being resized.
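To illustrate the "temporary failure" idea, here is a minimal, self-contained sketch. It is not the actual InnoDB code or the fix that was applied. The types toy_latch, toy_table and toy_page_hash and the function replace() are hypothetical stand-ins for page_hash_latch, buf_pool.page_hash and the resize step. The sketch assumes exclusive mode only, a single resizing thread, and that the old table stays allocated until no thread can still hold a pointer to it (memory reclamation is not addressed here). The point is that a single attempt never blocks inside the latch, so the waiting happens in the outer loop, where the table pointer is re-read on every retry.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Hypothetical stand-in for page_hash_latch (exclusive mode only).
struct toy_latch
{
  std::atomic<bool> locked{false};
  // Never blocks: returns false if the latch is already held.
  bool try_acquire() noexcept
  { return !locked.exchange(true, std::memory_order_acquire); }
  void release() noexcept
  { locked.store(false, std::memory_order_release); }
};

// The array and its size are published together through one pointer,
// so a reader always sees a consistent pair.
struct toy_table
{
  toy_latch *array;
  std::size_t n_cells;
};

// Hypothetical stand-in for buf_pool.page_hash.
struct toy_page_hash
{
  std::atomic<toy_table*> table{nullptr};

  // One attempt. nullptr means "temporary failure: look the latch up again".
  toy_latch *try_lock(std::size_t fold) noexcept
  {
    toy_table *t= table.load(std::memory_order_acquire);
    toy_latch *latch= &t->array[fold % t->n_cells];
    if (!latch->try_acquire())
      return nullptr;
    // Holding the latch, confirm that the table was not replaced meanwhile.
    if (t == table.load(std::memory_order_acquire))
      return latch;
    latch->release();
    return nullptr;
  }

  // The wait loop lives here, not inside the latch.
  toy_latch *lock(std::size_t fold) noexcept
  {
    for (;;)
    {
      if (toy_latch *latch= try_lock(fold))
        return latch;
      std::this_thread::yield();
    }
  }

  // Hypothetical resize step: take every latch of the old table, publish
  // the new table, then release the old latches. The caller must keep the
  // old table allocated for as long as another thread may still reference it.
  void replace(toy_table *new_table) noexcept
  {
    toy_table *old= table.load(std::memory_order_acquire);
    for (std::size_t i= 0; i < old->n_cells; i++)
      while (!old->array[i].try_acquire())
        std::this_thread::yield();
    table.store(new_table, std::memory_order_release);
    for (std::size_t i= 0; i < old->n_cells; i++)
      old->array[i].release();
  }
};
```

With this shape, a thread that loses the race against replace() fails the pointer comparison, releases the old latch and retries against the new table instead of sleeping on a latch in the old array. A real implementation would additionally need shared mode, some back-off policy, and a safe way to free the old array.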
Issue Links
- blocks: MDEV-26826 Duplicated computations of buf_pool.page_hash addresses (Closed)
- duplicates: MDEV-24030 SUMMARY: AddressSanitizer: heap-use-after-free failure in hash_get_lock (Closed)
- is caused by: MDEV-22871 Contention on the buf_pool.page_hash (Closed)