[MDEV-24917] Flushing starts only when 90% (srv_max_buf_pool_modified_pct) pages are modified Created: 2021-02-18  Updated: 2021-02-19  Resolved: 2021-02-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.5.9

Type: Bug Priority: Critical
Reporter: Krunal Bauskar Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression-10.5

Issue Links:
Problem/Incident
is caused by MDEV-24537 innodb_max_dirty_pages_pct_lwm=0 lost... Closed
Relates
relates to MDEV-24278 InnoDB page cleaner keeps waking up o... Closed
relates to MDEV-24369 Page cleaner sleeps indefinitely desp... Closed

 Description   

Recent changes introduced by MDEV-24537 accidentally changed the logic so that flushing will not start until the number of dirty pages exceeds srv_max_buf_pool_modified_pct (default 90%), when ideally flushing should kick in once the dirty pages exceed 10%.

    if (lsn_limit);
    else if (dirty_pct < srv_max_buf_pool_modified_pct)  /* <-- wrong condition */
      goto unemployed;
    else if (srv_max_dirty_pages_pct_lwm == 0.0 ||
             dirty_pct < srv_max_dirty_pages_pct_lwm)
      goto unemployed;

The marked condition is wrong and should be removed.



 Comments   
Comment by Marko Mäkelä [ 2021-02-18 ]

Thank you for the report. This regression was caused by MDEV-24278, which aims to avoid unnecessary periodic wake-ups of the page cleaner thread.
The following change should fix this:

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 21a01dbd2fe..0acf9334c2c 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2136,11 +2136,13 @@ static os_thread_ret_t DECLARE_THREAD(buf_flush_page_cleaner)(void*)
       double(UT_LIST_GET_LEN(buf_pool.LRU) + UT_LIST_GET_LEN(buf_pool.free));
 
     if (lsn_limit);
+    else if (srv_max_dirty_pages_pct_lwm != 0.0)
+    {
+      if (dirty_pct < srv_max_dirty_pages_pct_lwm)
+        goto unemployed;
+    }
     else if (dirty_pct < srv_max_buf_pool_modified_pct)
       goto unemployed;
-    else if (srv_max_dirty_pages_pct_lwm == 0.0 ||
-             dirty_pct < srv_max_dirty_pages_pct_lwm)
-      goto unemployed;
 
     const lsn_t oldest_lsn= buf_pool.get_oldest_modified()
       ->oldest_modification();

Now, how can the unwanted goto unemployed be avoided if the above fix is not present? It seems the only way is to set the two parameters to the same value. But, because the value innodb_max_dirty_pages_pct_lwm=0 is special (MDEV-24537), we should use a value close to it if the aim is to have the buffer pool flushed:

SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.001;
SET GLOBAL innodb_max_dirty_pages_pct=0.001;

As far as I can tell, this bug makes the parameter innodb_max_dirty_pages_pct_lwm basically useless and prevents adaptive flushing from working.

Comment by Marko Mäkelä [ 2021-02-18 ]

I can verify that the cause of this is indeed MDEV-24537, which was applied after MDEV-24278. One more change will be necessary to ensure that SET GLOBAL innodb_max_dirty_pages_pct will wake up the sleeping page cleaner thread:

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index d306311365d..d54987889fd 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -17206,10 +17206,10 @@ innodb_max_dirty_pages_pct_update(
 				    in_val);
 
 		srv_max_dirty_pages_pct_lwm = in_val;
-		pthread_cond_signal(&buf_pool.do_flush_list);
 	}
 
 	srv_max_buf_pool_modified_pct = in_val;
+	pthread_cond_signal(&buf_pool.do_flush_list);
 }
 
 /****************************************************************//**

Comment by Axel Schwenke [ 2021-02-18 ]

Verified to work with MariaDB 10.5.9 (9d7dc1f6d04).

Generated at Thu Feb 08 09:33:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.