Details
- Type: Bug
- Status: Closed
- Priority: Blocker
- Resolution: Fixed
- Versions: 10.6, 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Description
While running performance tests with a small buffer pool, I encountered an anomaly where InnoDB would hang because of running out of buffer pool. There would be some actually clean blocks at buf_pool.flush_list.start, but these would be skipped by buf_flush_LRU_list_batch(). The buf_pool.flush_list.end is being trimmed by periodic calls to buf_pool.get_oldest_modification(). I suspect that the entire buf_pool.LRU would be skipped because of being buffer-fixed, latched, or registered in buf_pool.flush_list due to the MDEV-26010 optimization.
This seems to be a 10.6 regression due to MDEV-26827.
MDEV-32050 would make this problem worse by allowing the purge_coordinator_task to buffer-fix a large number of pages.
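The skipping described above can be illustrated with a toy model. This is only a sketch, not InnoDB code: `ToyPage`, `Policy`, `evictable()` and `example_lru()` are hypothetical names, and the model assumes that `oldest_modification() == 1` marks a page that has already been written out but is still registered in `buf_pool.flush_list`, while `0` marks a page that is clean and not in the list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a buffer pool page; NOT the real buf_page_t.
struct ToyPage {
  // Assumed meaning in this sketch:
  // 0 = clean, 1 = clean but still in the flush list, >1 = dirty.
  uint64_t oldest_modification;
  bool buffer_fixed;  // pinned by some thread, so it cannot be evicted
};

// Hypothetical scan policies, for this illustration only.
enum class Policy {
  SkipMarkedClean,   // pages with marker 1 are passed over
  EvictMarkedClean   // pages with marker 1 are unlisted and then evicted
};

// Count how many pages one pass over the toy LRU list could evict.
std::size_t evictable(const std::vector<ToyPage> &lru, Policy policy) {
  std::size_t n = 0;
  for (const ToyPage &p : lru) {
    if (p.buffer_fixed)
      continue;  // cannot evict a pinned page
    if (p.oldest_modification == 0)
      ++n;       // plainly clean: evict
    else if (p.oldest_modification == 1 &&
             policy == Policy::EvictMarkedClean)
      ++n;       // clean but still listed: unlist first, then evict
    // Pages with oldest_modification > 1 would have to be written first.
  }
  return n;
}

// A small pool in the state the report describes: every page is either
// buffer-fixed or clean-but-still-listed.
std::vector<ToyPage> example_lru() {
  return {{1, false}, {1, false}, {0, true}, {1, false}};
}
```

In this model, `evictable(example_lru(), Policy::SkipMarkedClean)` finds nothing to evict although three of the four pages are actually clean, which is the kind of state in which a small buffer pool could run dry; with `Policy::EvictMarkedClean` the same three pages become available.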
Issue Links
- is caused by
  - MDEV-26827 Make page flushing even faster (Closed)
- relates to
  - MDEV-33613 InnoDB may still hang when temporarily running out of buffer pool (Closed)
  - MDEV-26010 Assertion `lsn > 2' failed in buf_pool_t::get_oldest_modification (Closed)
  - MDEV-32050 UNDO logs still growing for write-intensive workloads (Closed)
  - MDEV-33508 Performance regression due to frequent scan of full buf_pool.flush_list (Closed)
I think that the following patch fixes this.
diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -1246,16 +1246,14 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
     ut_ad(state >= buf_page_t::FREED);
     ut_ad(bpage->in_LRU_list);
 
-    switch (bpage->oldest_modification()) {
-    case 0:
+    if (!bpage->oldest_modification())
+    {
     evict:
       if (state != buf_page_t::FREED &&
           (state >= buf_page_t::READ_FIX || (~buf_page_t::LRU_MASK & state)))
         continue;
       buf_LRU_free_page(bpage, true);
       ++n->evicted;
-      /* fall through */
-    case 1:
       if (UNIV_LIKELY(scanned & 31))
         continue;
       mysql_mutex_unlock(&buf_pool.mutex);
@@ -1271,7 +1269,11 @@ static void buf_flush_LRU_list_batch(ulint max, bool evict,
       switch (bpage->oldest_modification()) {
       case 1:
         mysql_mutex_lock(&buf_pool.flush_list_mutex);
-        buf_pool.delete_from_flush_list(bpage);
+        if (ut_d(lsn_t lsn=) bpage->oldest_modification())
+        {
+          ut_ad(lsn == 1); /* It must be clean while we hold bpage->lock */
+          buf_pool.delete_from_flush_list(bpage);
+        }
         mysql_mutex_unlock(&buf_pool.flush_list_mutex);
         /* fall through */
Before MDEV-26827, we were acting upon oldest_modification==1 while already holding buf_pool.flush_list_mutex. Both regressions were introduced by me in MDEV-26827.
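The second hunk follows a common pattern: a value observed without the mutex is read again once the mutex is held, and only then acted upon. A minimal sketch of that pattern follows, using hypothetical stand-ins (`ListedPage`, `ToyFlushList`, `remove_if_still_listed()`) rather than the real InnoDB types. It assumes, as in the patch comment, that the caller holds whatever lock prevents the page from becoming dirty again in the meantime.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>

// Toy page; NOT the real buf_page_t. In this sketch a marker of 1 means
// "clean but still in the flush list" and 0 means "not in the list".
struct ListedPage {
  std::atomic<uint64_t> oldest_modification{0};
};

// Toy flush list protected by its own mutex.
struct ToyFlushList {
  std::mutex mutex;
  std::list<ListedPage*> pages;

  // Called after the marker was observed as 1 WITHOUT holding the mutex.
  // Another thread may have unlisted the page since then, so the marker
  // is read again under the mutex before the list is touched.
  // Returns true if this call removed the page from the list.
  bool remove_if_still_listed(ListedPage *page) {
    std::lock_guard<std::mutex> guard(mutex);
    const uint64_t lsn = page->oldest_modification.load();
    if (lsn == 0)
      return false;    // already removed by someone else
    assert(lsn == 1);  // assumption: the caller keeps the page clean
    pages.remove(page);
    page->oldest_modification.store(0);
    return true;
  }
};
```

Calling `remove_if_still_listed()` twice for the same page removes it only once; the second call sees the marker at 0 and leaves the list alone, which is the situation the unconditional `delete_from_flush_list()` call did not account for.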