MariaDB Server / MDEV-24537

innodb_max_dirty_pages_pct_lwm=0 lost its special meaning

Details

    Description

      In MDEV-23855, I overlooked the fact that the default value 0 of the parameter innodb_max_dirty_pages_pct_lwm has a special meaning: "ignore this parameter, and look at innodb_max_dirty_pages_pct instead". This special value used to partially cancel the effect of the parameter innodb_adaptive_flushing=ON (which is set by default). The special value 0 would cause the function af_get_pct_for_dirty() to always return either 0 or 100.
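
      For reference, the old behaviour (before MDEV-23855) was roughly the following; this is a simplified sketch of the af_get_pct_for_dirty() logic in storage/innobase/buf/buf0flu.cc, paraphrased rather than copied verbatim, with buf_get_modified_ratio_pct() standing for the dirty-ratio helper of those versions:

      /* Simplified sketch of the pre-MDEV-23855 af_get_pct_for_dirty()
         logic (paraphrased, not verbatim). */
      static ulint af_get_pct_for_dirty()
      {
        const double dirty_pct= buf_get_modified_ratio_pct();

        if (dirty_pct == 0.0)
          return 0;                       /* nothing is dirty */

        if (srv_max_dirty_pages_pct_lwm == 0)
          /* Special value 0: ignore the low water mark; flush at 100%
             of innodb_io_capacity only once innodb_max_dirty_pages_pct
             has been exceeded. */
          return dirty_pct >= srv_max_buf_pool_modified_pct ? 100 : 0;

        if (dirty_pct >= srv_max_dirty_pages_pct_lwm)
          /* Above the low water mark: ramp up flushing gradually. */
          return static_cast<ulint>(dirty_pct * 100 /
                                    (srv_max_buf_pool_modified_pct + 1));

        return 0;
      }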

      This regression was originally reported in MDEV-24272, mixed up with another performance regression that only affects the 10.2, 10.3, and 10.4 release series but not 10.5. On a hard disk, running a 5-minute oltp_read_write in sysbench with 16 threads and 8 tables with 100000 rows each, I verified valerii's finding, using the following settings on MariaDB 10.5.6 and 10.5.8:

      innodb_log_file_size=4G
      innodb_buffer_pool_size=1G
      innodb_flush_log_at_trx_commit=2
      innodb-flush-method=O_DIRECT
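
      The workload corresponds roughly to the following sysbench 1.0 invocation; connection options (--mysql-user, --mysql-socket, ...) are omitted here, and the exact command line is not part of the original report:

      sysbench oltp_read_write --tables=8 --table-size=100000 --threads=16 prepare
      sysbench oltp_read_write --tables=8 --table-size=100000 --threads=16 --time=300 run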
      

      On my 2TB Western Digital SATA 3.0 hard disk (WDC WD20EZRZ-00Z5HB0) that has a write performance of 51.9 MB/s (reported by GNOME Disks when using 1MiB block size), I got the following results:

      server    average throughput (tps)    average latency (ms)    maximum latency (ms)
      10.5.6    4672.70                     3.42                    1244.87
      10.5.8    4147.77                     3.86                    851.98
      10.5.8p   7106.93                     2.25                    139.15

      The last line was produced with the following fix:

      diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
      --- a/storage/innobase/buf/buf0flu.cc
      +++ b/storage/innobase/buf/buf0flu.cc
      @@ -2086,6 +2086,12 @@ static os_thread_ret_t DECLARE_THREAD(buf_flush_page_cleaner)(void*)
           const double dirty_pct= double(dirty_blocks) * 100.0 /
             double(UT_LIST_GET_LEN(buf_pool.LRU) + UT_LIST_GET_LEN(buf_pool.free));
       
      +    if (dirty_pct < srv_max_buf_pool_modified_pct)
      +      continue;
      +
      +    if (srv_max_dirty_pages_pct_lwm == 0.0)
      +      continue;
      +
           if (dirty_pct < srv_max_dirty_pages_pct_lwm)
             continue;
       
      

      The above patch is applicable only to 10.5.7 and 10.5.8; the code was slightly refactored in MDEV-24278 since then.

      I believe that a work-around for this regression is to set innodb_max_dirty_pages_pct_lwm to the same value as innodb_max_dirty_pages_pct (default value: 90).
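
      That is, with the default values in mind, something like the following in the configuration file (90 being the default of innodb_max_dirty_pages_pct):

      innodb_max_dirty_pages_pct=90
      innodb_max_dirty_pages_pct_lwm=90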

      Side note: The parameter innodb_idle_flush_pct has no effect (MDEV-24536).

      Activity

        Fix Version/s: 10.5.9
        Resolution: Fixed
        Status: Closed

      People

        Assignee: Marko Mäkelä
        Reporter: Marko Mäkelä
        Votes: 1
        Watchers: 2
