[MDEV-24537] innodb_max_dirty_pages_pct_lwm=0 lost its special meaning Created: 2021-01-06  Updated: 2023-03-16  Resolved: 2021-01-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.7, 10.5.8
Fix Version/s: 10.5.9

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: performance, regression

Issue Links:
Problem/Incident
causes MDEV-24917 Flushing starts only when 90% (srv_ma... Closed
is caused by MDEV-23855 InnoDB log checkpointing causes regre... Closed
Relates
relates to MDEV-24272 Performance regression for sysbench o... Stalled
relates to MDEV-24949 Enabling idle flushing (possible regr... Closed
relates to MDEV-27295 MariaDB 10.5 does not do idle checkpo... Closed
relates to MDEV-30000 make mariadb-backup to force an innod... Open

 Description   

In MDEV-23855, I overlooked the fact that the default value 0 of the parameter innodb_max_dirty_pages_pct_lwm has a special meaning: "ignore this parameter, and consult innodb_max_dirty_pages_pct instead". This special value used to partially cancel the effect of innodb_adaptive_flushing=ON (the default), because it made the function af_get_pct_for_dirty() always return either 0 or 100.
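
For context, here is a condensed paraphrase of that pre-MDEV-23855 logic (an illustration only, not a verbatim copy of the 10.4 source; the dirty_pct parameter stands in for the dirty-page ratio that the real function computes from the buffer pool lists):

static ulint af_get_pct_for_dirty(double dirty_pct)
{
  if (srv_max_dirty_pages_pct_lwm == 0.0)
    /* Special meaning of 0: ignore the low-water mark, and request
    flushing at 100% of innodb_io_capacity only once
    innodb_max_dirty_pages_pct has been exceeded. */
    return dirty_pct >= srv_max_buf_pool_modified_pct ? 100 : 0;
  if (dirty_pct >= srv_max_dirty_pages_pct_lwm)
    /* Above the low-water mark: increase the flushing rate gradually. */
    return static_cast<ulint>(dirty_pct * 100
                              / (srv_max_buf_pool_modified_pct + 1));
  return 0;
}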

This regression was originally reported in MDEV-24272, mixed up with another performance regression that affects only the 10.2, 10.3, and 10.4 release series, not 10.5. On a hard disk, running a 5-minute sysbench oltp_read_write with 16 threads and 8 tables of 100,000 rows each, I verified valerii's finding, using the following settings on MariaDB 10.5.6 and 10.5.8:

innodb_log_file_size=4G
innodb_buffer_pool_size=1G
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
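
For reference, the benchmark can be invoked roughly as follows (a hypothetical sysbench 1.x command line matching the description above; connection options are omitted and must be supplied):

sysbench oltp_read_write --tables=8 --table-size=100000 --threads=16 --time=300 prepare
sysbench oltp_read_write --tables=8 --table-size=100000 --threads=16 --time=300 run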

On my 2 TB Western Digital SATA 3.0 hard disk (WDC WD20EZRZ-00Z5HB0), which has a write performance of 51.9 MB/s (as reported by GNOME Disks with a 1 MiB block size), I got the following results:

server    avg throughput (tps)  avg latency (ms)  max latency (ms)
10.5.6    4672.70               3.42              1244.87
10.5.8    4147.77               3.86               851.98
10.5.8p   7106.93               2.25               139.15

The last line (10.5.8p) was produced with the following fix applied to 10.5.8:

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -2086,6 +2086,12 @@ static os_thread_ret_t DECLARE_THREAD(buf_flush_page_cleaner)(void*)
     const double dirty_pct= double(dirty_blocks) * 100.0 /
       double(UT_LIST_GET_LEN(buf_pool.LRU) + UT_LIST_GET_LEN(buf_pool.free));
 
+    if (dirty_pct < srv_max_buf_pool_modified_pct)
+      continue;
+
+    if (srv_max_dirty_pages_pct_lwm == 0.0)
+      continue;
+
     if (dirty_pct < srv_max_dirty_pages_pct_lwm)
       continue;
 

The above patch is applicable only to 10.5.7 and 10.5.8; the code has since been slightly refactored in MDEV-24278.

I believe that a workaround for this regression is to set innodb_max_dirty_pages_pct_lwm to the same value as innodb_max_dirty_pages_pct (default: 90).
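
For example, assuming the default innodb_max_dirty_pages_pct=90, the workaround can be applied at runtime as follows (or equivalently set in the configuration file):

SET GLOBAL innodb_max_dirty_pages_pct_lwm=@@GLOBAL.innodb_max_dirty_pages_pct;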

Side note: The parameter innodb_idle_flush_pct has no effect (MDEV-24536).



 Comments   
Comment by Marko Mäkelä [ 2021-01-11 ]

I think that the default behavior of MariaDB 10.5.7 and 10.5.8 can be emulated by the following:

SET GLOBAL innodb_max_dirty_pages_pct_lwm=0.0001;

Comment by Marko Mäkelä [ 2021-02-18 ]

After the MDEV-24917 fix, the performance on the same hardware is similar to that of the patched build:

server              avg throughput (tps)  avg latency (ms)  max latency (ms)
10.5.6              4672.70               3.42              1244.87
10.5.8              4147.77               3.86               851.98
10.5.8p             7106.93               2.25               139.15
10.5.9-MDEV-24917   7135.80               2.24                76.26