Details
Type: Bug
Status: Stalled
Priority: Major
Resolution: Unresolved
10.6, 10.11, 11.4
Description
It seems that MDEV-33053 may have caused one more regression. We have seen occasional failures of the test innodb_gis.types in which InnoDB hangs during crash recovery while running low on buffer pool space.
mleich produced a core dump where this happens during recovery, with the following stack trace in the thread that is waiting to allocate a block:
buf_LRU_get_free_block
recv_sys_t::recover_low
recv_sys_t::recover
buf_page_get_gen
trx_undo_mem_create_at_db_start
trx_undo_lists_init
trx_rseg_mem_restore
trx_rseg_array_init
trx_lists_init_at_db_start
srv_start
innodb_init
Both buf_pool.free and buf_pool.flush_list were empty. In buf_pool.LRU there were 248 blocks; the innodb_buffer_pool_size could correspond to 512 pages. I could see at least one block that was read-latched and buffer-fixed, but many of the blocks were actually in a replaceable state.
It seems to me that the buf_pool_page_cleaner thread was being woken up about once per second, but buf_pool_t::need_LRU_eviction() would likely fail to hold. I believe that the following should prevent this:
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 4c270d2bdef..df85feb603a 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2564,6 +2564,7 @@ static void buf_flush_page_cleaner()
 ATTRIBUTE_COLD void buf_pool_t::LRU_warn()
 {
   mysql_mutex_assert_owner(&mutex);
+  try_LRU_scan= false;
   if (!LRU_warned.test_and_set(std::memory_order_acquire))
     sql_print_warning("InnoDB: Could not free any blocks in the buffer pool!"
                       " %zu blocks are in use and %zu free."
The loop in buf_pool_t::need_LRU_eviction() invokes this function. Setting the flag would ensure that buf_flush_page_cleaner will do something to alleviate the situation.
As far as I understand, this hang is only possible with a small buffer pool when a large part of the buffer pool is being allocated for something else (such as crash recovery, the adaptive hash index, or explicit locks) so that buf_pool.LRU.count is below 256 (BUF_LRU_MIN_LEN). In that regard, this would be a follow-up fix to MDEV-34166.
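To make the reasoning above easier to follow, here is a toy model of the condition. It is not InnoDB source: the struct ToyBufPool, the member names lru_len, free_len and lru_scan_depth, the function LRU_warn_patched(), and the exact shape of the predicate are all assumptions made for illustration. Only the value 256 for BUF_LRU_MIN_LEN and the 248-block LRU list are taken from this report.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only, NOT the InnoDB source. Names mirror the report, but the
// exact form of the predicate is an assumption made for illustration.
constexpr std::size_t BUF_LRU_MIN_LEN = 256;  // value quoted in the report

struct ToyBufPool {
  std::size_t lru_len;         // stands in for buf_pool.LRU.count
  std::size_t free_len;        // stands in for the length of buf_pool.free
  std::size_t lru_scan_depth;  // hypothetical tuning value
  bool try_LRU_scan;           // flag that the proposed patch clears

  ToyBufPool(std::size_t lru, std::size_t free_blocks, std::size_t depth)
      : lru_len(lru), free_len(free_blocks), lru_scan_depth(depth),
        try_LRU_scan(true) {}

  // Assumed shape: eviction is requested if a waiter cleared the flag, or if
  // the LRU list is long enough and the free list is running short.
  bool need_LRU_eviction() const {
    return !try_LRU_scan ||
           (lru_len > BUF_LRU_MIN_LEN && free_len < lru_scan_depth / 2);
  }

  // Models the one-line change proposed for buf_pool_t::LRU_warn().
  void LRU_warn_patched() { try_LRU_scan = false; }
};

// Replays the state seen in the core dump: 248 blocks in LRU, none free.
// The scan depth of 1024 is an arbitrary illustrative value.
inline void demo_report_scenario() {
  ToyBufPool pool(248, 0, 1024);
  // 248 is not above BUF_LRU_MIN_LEN, so the predicate does not hold and
  // the page cleaner in this model would stay idle.
  assert(!pool.need_LRU_eviction());
  // After the patched warning path clears the flag, the predicate holds.
  pool.LRU_warn_patched();
  assert(pool.need_LRU_eviction());
}
```

In this model the waiting thread cannot rely on the length-based part of the condition when fewer than 256 blocks are in the LRU list, which is why clearing try_LRU_scan is the only way for it to get the page cleaner's attention.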
Issue Links
relates to
- MDEV-33053 InnoDB LRU flushing does not run before running out of buffer pool (Closed)
- MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool (Closed)
- MDEV-34166 Server could hang with BP < 80M under stress (Closed)
- MDEV-34265 Possible hang during IO burst with innodb_flush_sync enabled (Closed)