[MDEV-35409] InnoDB can still hang while running out of buffer pool

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.6, 10.11, 11.4
Fix Version/s: 10.6.21, 10.11.11, 11.4.5
Component/s: Storage Engine - InnoDB
Labels:
- hang
- recovery

Description

It seems that one more regression may have been caused by ~~MDEV-33053~~. We have seen occasional failures of the test innodb_gis.types in which InnoDB hangs during crash recovery while running low on buffer pool pages.

mleich produced a core dump where this happens during recovery, with the following stack trace in the thread that is waiting to allocate a block:

```
buf_LRU_get_free_block
recv_sys_t::recover_low
recv_sys_t::recover
buf_page_get_gen
trx_undo_mem_create_at_db_start
trx_undo_lists_init
trx_rseg_mem_restore
trx_rseg_array_init
trx_lists_init_at_db_start
srv_start
innodb_init
```

Both buf_pool.free and buf_pool.flush_list were empty. buf_pool.LRU contained 248 blocks, while the innodb_buffer_pool_size could correspond to 512 blocks. I could see at least one block that was read-latched and buffer-fixed, but many of the blocks were actually in a replaceable state.

It seems to me that the page cleaner thread (buf_flush_page_cleaner()) was being woken up about once per second, but the condition buf_pool_t::need_LRU_eviction() would probably not hold, so no blocks would be evicted. I believe that the following change should prevent this:

```diff
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 4c270d2bdef..df85feb603a 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2564,6 +2564,7 @@ static void buf_flush_page_cleaner()
 ATTRIBUTE_COLD void buf_pool_t::LRU_warn()
 {
   mysql_mutex_assert_owner(&mutex);
+  try_LRU_scan= false;
   if (!LRU_warned.test_and_set(std::memory_order_acquire))
     sql_print_warning("InnoDB: Could not free any blocks in the buffer pool!"
                       " %zu blocks are in use and %zu free."
```

The loop in buf_pool_t::need_LRU_eviction() invokes this function. Resetting try_LRU_scan to false would ensure that buf_flush_page_cleaner() does something to alleviate the situation.

As far as I understand, this hang is only possible with a small buffer pool when a large part of the buffer pool is being allocated for something else (such as crash recovery, the adaptive hash index, or explicit locks) so that buf_pool.LRU.count is below 256 (BUF_LRU_MIN_LEN). In that regard, this would be a follow-up fix to ~~MDEV-34166~~.

Issue Links

relates to

- MDEV-33053 InnoDB LRU flushing does not run before running out of buffer pool (Closed)
- MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool (Closed)
- MDEV-34166 Server could hang with BP < 80M under stress (Closed)
- MDEV-34265 Possible hang during IO burst with innodb_flush_sync enabled (Closed)
- MDEV-36226 Stall and crash when page cleaner fails to generate free pages during Async flush (Closed)

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 4

Dates

Created: 2024-11-13 08:50

Updated: 2025-03-05 14:02

Resolved: 2024-11-18 07:32
