[MDEV-32134] InnoDB hang in buf_flush_wait_LRU_batch_end()

Created: 2023-09-08
Updated: 2023-10-03
Resolved: 2023-09-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug
Priority: Blocker
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: hang, regression, rr-profile-analyzed

Issue Links:
Relates
relates to MDEV-26827 Make page flushing even faster Closed
relates to MDEV-32029 Assertion failures in log_sort_flush_... Closed

Description

mleich provided rr replay traces of a 10.6-based testing branch in which InnoDB shutdown would hang after shrinking of the InnoDB buffer pool had been initiated.

This might have been caused by MDEV-26827 in 10.6.13. The fix of MDEV-32029 refactored some related code, but I don’t think that the hang is related to it. The hang occurs because buf_flush_wait_LRU_batch_end() (invoked by buf_pool_t::withdraw_blocks() when shrinking the buffer pool) waits indefinitely for buf_pool.done_flush_LRU, while buf_flush_page_cleaner() waits indefinitely for buf_pool.do_flush_list. I think that the fix (to be tested by mleich) is to also broadcast buf_pool.done_flush_LRU when the page cleaner becomes idle:

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 9826320a358..06670e317a4 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2295,6 +2295,7 @@ static void buf_flush_page_cleaner()
     set_idle:
       buf_pool.page_cleaner_set_idle(true);
     set_almost_idle:
+      pthread_cond_broadcast(&buf_pool.done_flush_LRU);
       pthread_cond_broadcast(&buf_pool.done_flush_list);
       if (UNIV_UNLIKELY(srv_shutdown_state > SRV_SHUTDOWN_INITIATED))
         break;
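
The missed wakeup that the diff above addresses can be reproduced in a small stand-alone model. This is only a sketch and not the InnoDB code: ToyPool, lru_batch_running, waiter_parked and all three functions are hypothetical names, and std::condition_variable stands in for the pthread condition variables. One thread waits on done_flush_LRU with a deadline, and the other thread goes idle and broadcasts either both condition variables (as in the patch) or only done_flush_list (as before the patch).

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for the relevant parts of buf_pool. All members are hypothetical.
struct ToyPool {
  std::mutex flush_list_mutex;
  std::condition_variable done_flush_LRU;   // awaited by the shrinking thread
  std::condition_variable done_flush_list;  // awaited by other flush waiters
  bool lru_batch_running = true;  // assumed "LRU batch in progress" flag
  bool waiter_parked = false;     // model-only: the waiter is about to block
};

// Models the shrinking thread. Returns true if it was released before the
// deadline, false if nothing woke it up in time.
bool wait_lru_batch_end(ToyPool &pool, std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock(pool.flush_list_mutex);
  pool.waiter_parked = true;
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  pool.done_flush_LRU.wait_until(lock, deadline,
                                 [&] { return !pool.lru_batch_running; });
  return std::chrono::steady_clock::now() < deadline;
}

// Models the page cleaner becoming idle. With broadcast_lru == false it only
// signals done_flush_list, which is the pre-patch behaviour in this model.
void cleaner_goes_idle(ToyPool &pool, bool broadcast_lru) {
  bool parked = false;
  while (!parked) {
    {
      std::lock_guard<std::mutex> lock(pool.flush_list_mutex);
      parked = pool.waiter_parked;
      // Seeing waiter_parked under the mutex means the waiter has already
      // released the mutex inside wait_until(), so it is blocked.
      if (parked) pool.lru_batch_running = false;
    }
    if (!parked) std::this_thread::yield();
  }
  if (broadcast_lru) pool.done_flush_LRU.notify_all();
  pool.done_flush_list.notify_all();
}

// Runs one waiter and one cleaner; returns whether the waiter was released.
bool run_scenario(bool broadcast_lru, std::chrono::milliseconds timeout) {
  ToyPool pool;
  bool released = false;
  std::thread waiter([&] { released = wait_lru_batch_end(pool, timeout); });
  std::thread cleaner([&] { cleaner_goes_idle(pool, broadcast_lru); });
  waiter.join();
  cleaner.join();
  return released;
}
```

Calling run_scenario(true, std::chrono::milliseconds(5000)) releases the waiter as soon as the cleaner goes idle. With broadcast_lru set to false, the waiter in this model is expected to sleep until its deadline (barring a spurious wakeup), which corresponds to the indefinite wait in the untimed case.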

