[MDEV-26055] Adaptive flushing is still not getting invoked in 10.5.11 Created: 2021-06-30  Updated: 2023-12-18  Resolved: 2023-03-16

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.11
Fix Version/s: 10.11.3, 11.0.2, 10.6.13, 10.8.8, 10.9.6, 10.10.4, 10.5.24

Type: Bug Priority: Major
Reporter: Gaurav Tomar Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: regression
Environment:

Ubuntu 20.04 LTS


Attachments: PNG File BP data.png     PNG File Checkpoint.png     PNG File Screenshot 2022-02-22 at 11-08-30 MySQL InnoDB Details - Percona Monitoring and Management.png     PNG File checkpoint_10_5_12.png     HTML File pt-mysql-summary    
Issue Links:
Blocks
blocks MDEV-26827 Make page flushing even faster Closed
blocks MDEV-33053 InnoDB LRU flushing does not run befo... Closed
Problem/Incident
causes MDEV-31939 Adaptive flush recommendation ignores... Closed
causes MDEV-32861 InnoDB hangs when running out of I/O ... Closed
is caused by MDEV-24369 Page cleaner sleeps indefinitely desp... Closed
Relates
relates to MDEV-6932 Enable Lazy Flushing Closed
relates to MDEV-31956 SSD based InnoDB buffer pool extension Stalled
relates to MDEV-25093 Adaptive flushing fails to kick in ev... Closed
relates to MDEV-26004 Excessive wait times in buf_LRU_get_f... Closed
relates to MDEV-29967 innodb_read_ahead_threshold (linear r... Closed
relates to MDEV-32681 Test case innodb.undo_truncate_recove... Closed

 Description   

Hi Team,

This issue is related to MDEV-25093. Adaptive flushing is not invoked even after the checkpoint age crosses the configured value of innodb_adaptive_flushing_lwm. I have attached a graph of the checkpoint age.

My setup is a 3-node Galera cluster running MariaDB 10.5.11 with the 26.4.8-focal Galera package, installed on Ubuntu 20.04 (focal).

MDEV-25093 is supposed to be fixed in 10.5.10, but I see similar behaviour in all MariaDB versions from 10.5.7 to 10.5.11. The attached graph is from 10.5.11.

Below are the configs I used.

 
MariaDB [(none)]> select version();
+-------------------------------------------+
| version()                                 |
+-------------------------------------------+
| 10.5.11-MariaDB-1:10.5.11+maria~focal-log |
+-------------------------------------------+
 
 
MariaDB [(none)]> show global variables like '%innodb_max_dirty_pages_pct%';
+--------------------------------+-----------+
| Variable_name                  | Value     |
+--------------------------------+-----------+
| innodb_max_dirty_pages_pct     | 90.000000 |
| innodb_max_dirty_pages_pct_lwm | 10.000000 |
+--------------------------------+-----------+
 
MariaDB [(none)]> show global variables like '%innodb_adaptive_flushing%';
+------------------------------+-----------+
| Variable_name                | Value     |
+------------------------------+-----------+
| innodb_adaptive_flushing     | ON        |
| innodb_adaptive_flushing_lwm | 10.000000 |
+------------------------------+-----------+
 
MariaDB [(none)]> show variables like '%innodb_flush%';
+--------------------------------+----------+
| Variable_name                  | Value    |
+--------------------------------+----------+
| innodb_flush_log_at_timeout    | 1        |
| innodb_flush_log_at_trx_commit | 0        |
| innodb_flush_method            | O_DIRECT |
| innodb_flush_neighbors         | 1        |
| innodb_flush_sync              | ON       |
| innodb_flushing_avg_loops      | 30       |
+--------------------------------+----------+
 
MariaDB [(none)]> show global variables like '%io_capa%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| innodb_io_capacity     | 3000  |
| innodb_io_capacity_max | 5000  |
+------------------------+-------+

I tried increasing and decreasing innodb_adaptive_flushing_lwm and innodb_max_dirty_pages_pct_lwm, but I am still facing the same issue.

I also tweaked innodb_io_capacity and innodb_io_capacity_max, but with no luck. By the way, we are using NVMe disks for this setup.

Below is the status when Innodb_checkpoint_age reaches Innodb_checkpoint_max_age:

MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
 
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_pages_data           | 10654068  |
| Innodb_buffer_pool_pages_dirty          | 9425873   |
| Innodb_buffer_pool_pages_flushed        | 100225151 |
| Innodb_buffer_pool_pages_free           | 16726607  |
| Innodb_buffer_pool_pages_made_not_young | 12247924  |
| Innodb_buffer_pool_pages_made_young     | 565734067 |
| Innodb_buffer_pool_pages_misc           | 0         |
| Innodb_buffer_pool_pages_old            | 3932829   |
| Innodb_buffer_pool_pages_total          | 27380675  |
| Innodb_buffer_pool_pages_lru_flushed    | 0         |
+-----------------------------------------+-----------+
10 rows in set (0.000 sec)
 
+---------------------------+-------------+
| Variable_name             | Value       |
+---------------------------+-------------+
| Innodb_checkpoint_age     | 26090991889 |
| Innodb_checkpoint_max_age | 26091216876 |
+---------------------------+-------------+

Iostat
 
Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
dm-0            28.80    460.80     0.00   0.00    0.22    16.00 30756.60 992005.60     0.00   0.00    0.04    32.25    0.00      0.00     0.00   0.00    0.00     0.00    1.27  99.68
dm-1             0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
dm-2             0.00      0.00     0.00   0.00    0.00     0.00    1.80     21.60     0.00   0.00    0.00    12.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.48
dm-3             0.00      0.00     0.00   0.00    0.00     0.00   13.60  31140.80     0.00   0.00    1.29  2289.76    0.00      0.00     0.00   0.00    0.00     0.00    0.02   6.32
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.80     27.20     6.00  88.24    0.25    34.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.24
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sfd0n1          28.80    460.80     0.00   0.00    0.29    16.00 32618.60 1023168.00     0.00   0.00    0.05    31.37    0.00      0.00     0.00   0.00    0.00     0.00    0.16  99.68

QPS around the issue

[ 1252s ] thds: 256 tps: 69863.98 qps: 69863.98 (r/w/o: 0.00/49843.99/20019.99) lat (ms,99%): 5.88 err/s: 0.00 reconn/s: 0.00
[ 1253s ] thds: 256 tps: 20750.41 qps: 20750.41 (r/w/o: 0.00/14817.86/5932.55) lat (ms,99%): 6.09 err/s: 0.00 reconn/s: 0.00
[ 1254s ] thds: 256 tps: 0.00 qps: 0.00 (r/w/o: 0.00/0.00/0.00) lat (ms,99%): 0.00 err/s: 0.00 reconn/s: 0.00
[ 1255s ] thds: 256 tps: 58797.66 qps: 58797.66 (r/w/o: 0.00/41615.64/17182.02) lat (ms,99%): 7.98 err/s: 0.00 reconn/s: 0.00
[ 1256s ] thds: 256 tps: 67527.27 qps: 67527.27 (r/w/o: 0.00/48173.05/19354.22) lat (ms,99%): 7.43 err/s: 0.00 reconn/s: 0.00
[ 1257s ] thds: 256 tps: 58384.64 qps: 58384.64 (r/w/o: 0.00/41503.59/16881.05) lat (ms,99%): 7.43 err/s: 0.00 reconn/s: 0.00
 
[ 1273s ] thds: 256 tps: 69908.70 qps: 69908.70 (r/w/o: 0.00/49996.07/19912.63) lat (ms,99%): 6.09 err/s: 0.00 reconn/s: 0.00
[ 1274s ] thds: 256 tps: 43327.15 qps: 43327.15 (r/w/o: 0.00/30641.10/12686.04) lat (ms,99%): 167.44 err/s: 0.00 reconn/s: 0.00
[ 1275s ] thds: 256 tps: 32849.45 qps: 32849.45 (r/w/o: 0.00/23436.04/9413.42) lat (ms,99%): 118.92 err/s: 0.00 reconn/s: 0.00
[ 1276s ] thds: 256 tps: 38600.51 qps: 38600.51 (r/w/o: 0.00/27316.24/11284.27) lat (ms,99%): 223.34 err/s: 0.00 reconn/s: 0.00
[ 1277s ] thds: 256 tps: 41558.22 qps: 41558.22 (r/w/o: 0.00/29635.58/11922.64) lat (ms,99%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 1278s ] thds: 256 tps: 49127.92 qps: 49127.92 (r/w/o: 0.00/35045.52/14082.40) lat (ms,99%): 183.21 err/s: 0.00 reconn/s: 0.00
[ 1279s ] thds: 256 tps: 47763.57 qps: 47763.57 (r/w/o: 0.00/33943.82/13819.74) lat (ms,99%): 15.27 err/s: 0.00 reconn/s: 0.00
[ 1280s ] thds: 256 tps: 41276.96 qps: 41276.96 (r/w/o: 0.00/29521.97/11754.99) lat (ms,99%): 164.45 err/s: 0.00 reconn/s: 0.00
[ 1281s ] thds: 256 tps: 40896.95 qps: 40896.95 (r/w/o: 0.00/29010.55/11886.41) lat (ms,99%): 164.45 err/s: 0.00 reconn/s: 0.00
[ 1282s ] thds: 256 tps: 51294.76 qps: 51294.76 (r/w/o: 0.00/36633.83/14660.93) lat (ms,99%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 1283s ] thds: 256 tps: 36653.38 qps: 36653.38 (r/w/o: 0.00/26246.55/10406.82) lat (ms,99%): 227.40 err/s: 0.00 reconn/s: 0.00
[ 1284s ] thds: 256 tps: 47463.39 qps: 47463.39 (r/w/o: 0.00/33757.41/13705.98) lat (ms,99%): 8.43 err/s: 0.00 reconn/s: 0.00
[ 1285s ] thds: 256 tps: 44322.09 qps: 44322.09 (r/w/o: 0.00/31595.92/12726.16) lat (ms,99%): 176.73 err/s: 0.00 reconn/s: 0.00
[ 1286s ] thds: 256 tps: 45233.40 qps: 45233.40 (r/w/o: 0.00/32176.58/13056.83) lat (ms,99%): 139.85 err/s: 0.00 reconn/s: 0.00
[ 1287s ] thds: 256 tps: 43981.65 qps: 43981.65 (r/w/o: 0.00/31424.61/12557.04) lat (ms,99%): 137.35 err/s: 0.00 reconn/s: 0.00
[ 1288s ] thds: 256 tps: 45362.25 qps: 45362.25 (r/w/o: 0.00/32337.89/13024.36) lat (ms,99%): 164.45 err/s: 0.00 reconn/s: 0.00
[ 1289s ] thds: 256 tps: 47941.21 qps: 47941.21 (r/w/o: 0.00/34073.42/13867.80) lat (ms,99%): 15.55 err/s: 0.00 reconn/s: 0.00
[ 1290s ] thds: 256 tps: 48869.94 qps: 48869.94 (r/w/o: 0.00/34709.96/14159.98) lat (ms,99%): 10.09 err/s: 0.00 reconn/s: 0.00
[ 1291s ] thds: 256 tps: 46091.53 qps: 46091.53 (r/w/o: 0.00/32806.82/13284.71) lat (ms,99%): 10.09 err/s: 0.00 reconn/s: 0.00
[ 1292s ] thds: 256 tps: 46741.92 qps: 46741.92 (r/w/o: 0.00/33258.23/13483.69) lat (ms,99%): 73.13 err/s: 0.00 reconn/s: 0.00
[ 1293s ] thds: 256 tps: 49509.12 qps: 49509.12 (r/w/o: 0.00/35481.93/14027.18) lat (ms,99%): 66.84 err/s: 0.00 reconn/s: 0.00
[ 1294s ] thds: 256 tps: 43450.71 qps: 43450.71 (r/w/o: 0.00/30950.49/12500.22) lat (ms,99%): 15.83 err/s: 0.00 reconn/s: 0.00
[ 1295s ] thds: 256 tps: 48791.38 qps: 48791.38 (r/w/o: 0.00/34645.01/14146.37) lat (ms,99%): 183.21 err/s: 0.00 reconn/s: 0.00
[ 1296s ] thds: 256 tps: 42562.93 qps: 42562.93 (r/w/o: 0.00/30257.95/12304.98) lat (ms,99%): 16.41 err/s: 0.00 reconn/s: 0.00
[ 1297s ] thds: 256 tps: 65641.78 qps: 65641.78 (r/w/o: 0.00/46758.54/18883.24) lat (ms,99%): 7.56 err/s: 0.00 reconn/s: 0.00
[ 1298s ] thds: 256 tps: 38736.94 qps: 38736.94 (r/w/o: 0.00/27624.96/11111.98) lat (ms,99%): 15.55 err/s: 0.00 reconn/s: 0.00
[ 1299s ] thds: 256 tps: 51147.80 qps: 51147.80 (r/w/o: 0.00/36503.14/14644.66) lat (ms,99%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 1300s ] thds: 256 tps: 34908.85 qps: 34908.85 (r/w/o: 0.00/24826.60/10082.24) lat (ms,99%): 16.12 err/s: 0.00 reconn/s: 0.00
[ 1301s ] thds: 256 tps: 60257.94 qps: 60257.94 (r/w/o: 0.00/42931.96/17325.98) lat (ms,99%): 13.46 err/s: 0.00 reconn/s: 0.00
[ 1302s ] thds: 256 tps: 44607.33 qps: 44607.33 (r/w/o: 0.00/31644.10/12963.22) lat (ms,99%): 179.94 err/s: 0.00 reconn/s: 0.00
[ 1303s ] thds: 256 tps: 52189.03 qps: 52189.03 (r/w/o: 0.00/37171.75/15017.28) lat (ms,99%): 7.43 err/s: 0.00 reconn/s: 0.00
 
[ 1319s ] thds: 256 tps: 67309.85 qps: 67309.85 (r/w/o: 0.00/47982.61/19327.24) lat (ms,99%): 9.39 err/s: 0.00 reconn/s: 0.00
[ 1320s ] thds: 256 tps: 6857.52 qps: 6857.52 (r/w/o: 0.00/4928.37/1929.15) lat (ms,99%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 1321s ] thds: 256 tps: 6017.51 qps: 6017.51 (r/w/o: 0.00/4291.65/1725.86) lat (ms,99%): 1618.78 err/s: 0.00 reconn/s: 0.00
[ 1322s ] thds: 256 tps: 64763.63 qps: 64763.63 (r/w/o: 0.00/46201.32/18562.32) lat (ms,99%): 7.98 err/s: 0.00 reconn/s: 0.00

QPS dips and disk utilization increases when MariaDB starts furious flushing.



 Comments   
Comment by Marko Mäkelä [ 2021-06-30 ]

gktomar, can you please try to collect Innodb_buffer_pool_pages_flushed and Innodb_checkpoint_age during the workload?

Also, can you share a minimal complete configuration for repeating this? I do not think that Galera should play a role here, so the minimal configuration should be without Galera.

Comment by Marko Mäkelä [ 2021-06-30 ]

I now see that Checkpoint.png is a graph of the Innodb_checkpoint_age, but it is not clear to me what BP data.png represents.

Comment by Gaurav Tomar [ 2021-06-30 ]

The estimated dirty limit is derived from innodb_max_dirty_pages_pct, which is set to 90. The total data in the buffer pool is 163 GB, hence the estimated dirty limit becomes 146.7 GB. "Data dirty" is the modified DB pages. What the BP data image shows is that dirty pages are flushed only after innodb_max_dirty_pages_pct is crossed; according to the documentation, flushing should have started after crossing innodb_max_dirty_pages_pct_lwm.
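The arithmetic above can be sketched as follows (a hedged illustration; the values are taken from this report):

```python
# Sketch of the dirty-limit arithmetic described above, using the values
# from this report (163 GB of data in the buffer pool).
buffer_pool_data_gb = 163.0    # total data in the buffer pool
max_dirty_pct = 90.0           # innodb_max_dirty_pages_pct
max_dirty_pct_lwm = 10.0       # innodb_max_dirty_pages_pct_lwm

# Estimated dirty limit: flushing becomes unavoidable around here.
estimated_dirty_limit_gb = buffer_pool_data_gb * max_dirty_pct / 100

# Per the documentation, background flushing should already start here.
flush_start_gb = buffer_pool_data_gb * max_dirty_pct_lwm / 100

print(estimated_dirty_limit_gb)  # 146.7
print(flush_start_gb)            # 16.3
```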

Another impact of this issue is that stopping MariaDB takes longer, as it has to flush the dirty pages to disk during shutdown.

This issue can be reproduced on a standalone MariaDB instance with the configuration below:

innodb_log_file_size = 2G
innodb_buffer_pool_size = 32G
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 3000
innodb_io_capacity_max = 5000
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit=0
innodb_adaptive_flushing_lwm=10
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=10
innodb_flush_sync=1

Innodb_buffer_pool_pages_flushed and Innodb_checkpoint_age around the issue are provided in the description of the ticket.

Comment by Krunal Bauskar [ 2021-07-01 ]

I tried to re-validate the issue with 10.5.11 and found that the adaptive flushing does kick in as expected.
------------------------------------------------------------------------

a. Started server with
innodb_max_dirty_pages_pct=90
innodb_max_dirty_pages_pct_lwm=90 (***** note the high value)
(innodb_adaptive_flushing_lwm=10 (default))

b. This means adaptive flushing will kick in when:

(A): dirty pages > 90% of the buffer pool pages
---- OR ----
(B): the redo log is > 10% full.

[I purposely used this configuration so that we see the effect of (B) before (A) kicks in].
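As a hypothetical sketch (not InnoDB's actual code), the two trigger conditions can be expressed like this:

```python
# Hypothetical sketch of the two adaptive-flushing triggers (A) and (B)
# described above; not the actual InnoDB implementation.
def adaptive_flushing_active(dirty_pages, total_pages,
                             checkpoint_age, checkpoint_max_age,
                             max_dirty_pct_lwm=90.0,
                             adaptive_flushing_lwm=10.0):
    # (A): dirty pages exceed innodb_max_dirty_pages_pct_lwm percent of the pool
    cond_a = 100.0 * dirty_pages / total_pages > max_dirty_pct_lwm
    # (B): redo log is more than innodb_adaptive_flushing_lwm percent full
    cond_b = 100.0 * checkpoint_age / checkpoint_max_age > adaptive_flushing_lwm
    return cond_a or cond_b

# With the "adaptive flushing kicked in" numbers from this comment:
# dirty is about 5.5% (A not met), checkpoint age is about 11.5% (B met).
print(adaptive_flushing_active(142583, 2580800, 1003881850, 8696599327))  # True
```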

------------------------------

at-start
=========

MariaDB [(none)]> show status like 'Innodb_buffer_pool%pages%'; show status like '%check%age%';

Variable_name Value
Innodb_buffer_pool_pages_data 2055511
Innodb_buffer_pool_pages_dirty 0
Innodb_buffer_pool_pages_flushed 0
Innodb_buffer_pool_pages_free 525289
Innodb_buffer_pool_pages_made_not_young 0
Innodb_buffer_pool_pages_made_young 0
Innodb_buffer_pool_pages_misc 0
Innodb_buffer_pool_pages_old 758792
Innodb_buffer_pool_pages_total 2580800
Innodb_buffer_pool_pages_lru_flushed 0
Variable_name Value
Innodb_checkpoint_age 12
Innodb_checkpoint_max_age 8696599327

2 rows in set (0.000 sec)

-----------------

adaptive flushing kicked in
======================

MariaDB [(none)]> show status like 'Innodb_buffer_pool%pages%'; show status like '%check%age%';

Variable_name Value
Innodb_buffer_pool_pages_data 2198223
Innodb_buffer_pool_pages_dirty 142583
Innodb_buffer_pool_pages_flushed 4158
Innodb_buffer_pool_pages_free 382577
Innodb_buffer_pool_pages_made_not_young 542
Innodb_buffer_pool_pages_made_young 23750
Innodb_buffer_pool_pages_misc 0
Innodb_buffer_pool_pages_old 811433
Innodb_buffer_pool_pages_total 2580800
Innodb_buffer_pool_pages_lru_flushed 0
Variable_name Value
Innodb_checkpoint_age 1003881850
Innodb_checkpoint_max_age 8696599327

------------------------------

Observations:
1. In output (2) above, the dirty percentage is approximately 5% (142583 of 2580800 pages), so our 90% threshold (A) is not satisfied.

2. checkpoint_age is > 10% of the maximum, so (B) is satisfied and adaptive flushing kicks in.
The effect of adaptive flushing can be seen in the increasing Innodb_buffer_pool_pages_flushed.

Comment by Krunal Bauskar [ 2021-07-01 ]

Gaurav Tomar,

1. As Marko pointed out, you should try to monitor the pages flushed.

2. I see that your standalone configuration uses a pretty small redo log (2G).
That means once you cross the 200 MB mark (10% of 2G), you should see adaptive flushing become active.

[There could be a thin line between meeting condition (A) or condition (B).
To make sure, you can increase innodb_max_dirty_pages_pct_lwm so that ideally
(A) is far away and (B) is met first.]

3. Despite the said fix, there is a good chance that you may still hit a flush storm.
Enabling adaptive flushing does not guarantee that the flushing rate matches the rate
of dirty-page generation.

This is another known issue: the adaptive flushing algorithm needs to be either more aggressive or better tuned to handle increasing pressure. I recall discussing it with Marko a while back; some experiments were done in https://jira.mariadb.org/browse/MDEV-25113, but I have not tested them yet.

Comment by Gaurav Tomar [ 2021-07-01 ]

@Krunal: Yes, it seems adaptive flushing does actually kick in, but the flushing rate is not enough to keep Innodb_checkpoint_age below Innodb_checkpoint_max_age.

Will wait for the 10.5.12 release to test this scenario again.

Comment by Marko Mäkelä [ 2021-07-01 ]

I do not think that anything has been recently changed about the old adaptive flushing code. But, we may have made that code obsolete.

Slightly after the 10.5.11 release, we reduced the latency related to page flushing and LRU eviction, to ultimately address MDEV-25451: MDEV-25954, MDEV-25948, MDEV-25113, MDEV-26004.

The reported MDEV-25113 scenario (read-only workload being slow after a read-write workload) was addressed by 3 fixes:

  1. removing buf_pool.flush_list_mutex acquisition from the page write completion callback
  2. (maybe this should have been called the MDEV-25451 fix) implementing a ‘soft’ aggressive flushing trigger
  3. (the main MDEV-25113 fix, postponed to MDEV-26004) reduce the latency of buf_LRU_get_free_block()

It may be that triggering slightly more aggressive flushing earlier would reduce or eliminate the need for the old adaptive flushing mechanism. In control theory, PID controllers are well established and understood. The pre-flushing trigger is an example of a proportional (P) controller with no I or D component. The "position" it acts on would be the checkpoint age.

If we still need the old adaptive flushing mechanism, I think that we should spend time to fully understand it, and conduct some experiments where we would collect more data and try to tune the P,I,D coefficients (if the old code really forms a PID controller).
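As a hedged illustration of the controller framing above (a hypothetical sketch, not InnoDB code), a purely proportional controller acting on the checkpoint age might look like this; the parameter names mirror the InnoDB settings but the scaling rule is invented for illustration:

```python
# Hypothetical sketch of a purely proportional ("P") flushing controller;
# the "position" being controlled is the checkpoint age.
def flush_rate(checkpoint_age, checkpoint_max_age,
               io_capacity=3000, io_capacity_max=5000,
               adaptive_flushing_lwm_pct=10.0):
    """Return a target page-write rate in pages/s (illustrative only)."""
    fill_pct = 100.0 * checkpoint_age / checkpoint_max_age
    if fill_pct <= adaptive_flushing_lwm_pct:
        return 0  # below the low-water mark: no adaptive flushing
    # Proportional term only: scale linearly from io_capacity at the
    # low-water mark up to io_capacity_max when the log is full.
    k = (fill_pct - adaptive_flushing_lwm_pct) / (100.0 - adaptive_flushing_lwm_pct)
    return int(io_capacity + k * (io_capacity_max - io_capacity))
```

A full PID controller would add an integral term (accumulated error) and a derivative term (rate of change of the checkpoint age) on top of this proportional term.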

Comment by Marko Mäkelä [ 2021-08-11 ]

gktomar, did you try MariaDB Server 10.5.12 yet?

Comment by Gaurav Tomar [ 2021-08-16 ]

@marko I'm currently testing this on 10.5.12, will get back with the results.

Comment by Marko Mäkelä [ 2021-09-16 ]

gktomar, do you have any results yet?

Comment by Gaurav Tomar [ 2021-09-16 ]

@Marko I tested the dirty pages flushing behavior in MariaDB 10.5.12, and below are my observations.

  1. Adaptive flushing is enabled only when innodb_max_dirty_pages_pct_lwm<>0, even if innodb_adaptive_flushing is set to ON.
  2. Adaptive flushing is enabled both when Innodb_checkpoint_age crosses the configured value of innodb_adaptive_flushing_lwm and when dirty pages cross the configured value of innodb_max_dirty_pages_pct_lwm.
  3. Innodb_checkpoint_age does not reach Innodb_checkpoint_max_age in this version as it used to in 10.5.11.
  4. Instead, when Innodb_checkpoint_age reaches ~71% of the total redo log size, async flushing starts, keeping Innodb_checkpoint_age below Innodb_checkpoint_max_age. Since innodb_flush_sync is enabled in our setup, we see a burst of I/O at this point (this is expected with furious flushing, a.k.a. sync flush, but I am not sure whether it should also happen with async flushing) and a small dip in QPS.

[ 466s ] thds: 36 tps: 8597.20 qps: 171976.99 (r/w/o: 120391.79/34389.80/17195.40) lat (ms,99%): 5.99 err/s: 0.00 reconn/s: 0.00
[ 467s ] thds: 36 tps: 6853.07 qps: 137122.31 (r/w/o: 95985.92/27431.26/13705.13) lat (ms,99%): 6.79 err/s: 0.00 reconn/s: 0.00
[ 468s ] thds: 36 tps: 8804.10 qps: 176017.02 (r/w/o: 123205.41/35202.40/17609.20) lat (ms,99%): 5.99 err/s: 0.00 reconn/s: 0.00

[ 696s ] thds: 36 tps: 8634.82 qps: 172760.46 (r/w/o: 120947.52/34541.29/17271.65) lat (ms,99%): 6.21 err/s: 0.00 reconn/s: 0.00
[ 697s ] thds: 36 tps: 6767.29 qps: 135356.78 (r/w/o: 94745.04/27077.16/13534.58) lat (ms,99%): 7.04 err/s: 0.00 reconn/s: 0.00
[ 698s ] thds: 36 tps: 8811.16 qps: 176146.18 (r/w/o: 123289.22/35235.64/17621.32) lat (ms,99%): 6.55 err/s: 0.00 reconn/s: 0.00

Comment by Gaurav Tomar [ 2021-09-16 ]

Here is the graph for checkpoint age in 10.5.12.

Comment by Marko Mäkelä [ 2021-09-16 ]

gktomar, thank you. It sounds like the flush-ahead patch is working as designed and preventing a deeper stall.

We may still want to do something about adaptive flushing, to have some background flushing prevent even that small drop of performance. On faster storage (such as NVMe), the write bursts should be less of a problem.

I have been running some benchmarks on 10.6 today. On a SATA SSD, when having about 40GiB of data in a 30GiB buffer pool (so that LRU eviction will be constantly happening during the oltp_update_non_index workload), I am seeing rather stable throughput. If I increase the buffer pool size to 40GiB so that no LRU eviction will take place, then I am seeing 5-second intervals with 0tps every now and then. We might dismiss that as a user error, because the innodb_io_capacity was configured way too high (for my NVMe, not the much slower SATA SSD).

Comment by Marko Mäkelä [ 2021-09-17 ]

It remains to be seen how much MDEV-26626 would improve this.

Comment by [ 2022-02-22 ]

I have the same issue on 10.6.7. Regardless of `innodb_io_capacity` or `innodb_adaptive_flushing_lwm`, the checkpoint age keeps growing until it reaches 75%, and then it is flushed furiously.

I agree with @Gaurav Tomar that:

Adaptive flushing is getting enabled only when `innodb_max_dirty_pages_pct_lwm<>0` even if `innodb_adaptive_flushing` is set to `on`.

When I set `innodb_max_dirty_pages_pct_lwm=0.000001`, adaptive flushing seems to kick in properly. With `innodb_max_dirty_pages_pct_lwm=0`, adaptive flushing never kicks in, even when `innodb_adaptive_flushing_lwm` is reached.

Comment by Marko Mäkelä [ 2022-10-21 ]

shalogo, yes, the value innodb_max_dirty_pages_pct_lwm=0 is special ("use innodb_max_dirty_pages_pct"). For a few releases between MDEV-23855 and MDEV-24537, that special meaning had been lost. See also MDEV-27295.
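The special meaning of 0 can be sketched as a hypothetical helper (not the actual server code):

```python
# Hypothetical sketch of the special value described above: 0 means
# "fall back to innodb_max_dirty_pages_pct", not "0 percent dirty pages".
def effective_dirty_pct_lwm(pct_lwm, max_dirty_pct):
    return max_dirty_pct if pct_lwm == 0 else pct_lwm

print(effective_dirty_pct_lwm(0, 90.0))         # 90.0  (special value)
print(effective_dirty_pct_lwm(0.000001, 90.0))  # 1e-06 (tiny nonzero workaround)
```

This is why the `innodb_max_dirty_pages_pct_lwm=0.000001` workaround behaves so differently from 0: a tiny nonzero value is taken literally, while 0 defers to the much higher innodb_max_dirty_pages_pct.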

axel mentioned some time ago that the adaptive flushing does not work as well as he would like to. I think that some test parameters and results are needed before we can improve this area.

Comment by Marko Mäkelä [ 2023-02-01 ]

MySQL Bug #74637 has been filed for an interesting idea. Rephrased in the MariaDB 10.5+ terminology: If the innodb_max_dirty_pages_pct or innodb_max_dirty_pages_pct_lwm condition is satisfied but we are not yet near the log_free_check() limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn has been set), then buf_flush_page_cleaner() could write out pages from buf_pool.LRU instead of buf_pool.flush_list.
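The decision described above could be sketched as follows (hypothetical pseudologic in Python; the names mirror the InnoDB structures mentioned, but this is not the actual buf_flush_page_cleaner() code):

```python
# Hypothetical sketch of the MySQL Bug #74637 idea rephrased above;
# not the actual buf_flush_page_cleaner() logic.
def choose_flush_source(dirty_pct, max_dirty_pct_lwm,
                        near_log_free_check_limit,
                        async_lsn_set, sync_lsn_set):
    """Return which list the page cleaner should write from."""
    if async_lsn_set or sync_lsn_set or near_log_free_check_limit:
        # Checkpoint pressure: advance the checkpoint by writing the
        # oldest-modified pages from buf_pool.flush_list.
        return "flush_list"
    if dirty_pct >= max_dirty_pct_lwm:
        # Only the dirty-page condition holds: writing from the tail of
        # buf_pool.LRU also prepares pages for eviction.
        return "LRU"
    return "idle"
```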

Comment by Marko Mäkelä [ 2023-02-07 ]

There are two parameters related to background flushing: innodb_max_dirty_pages_pct_lwm>0 (default 0) and innodb_adaptive_flushing=ON (default ON).

Comment by Marko Mäkelä [ 2023-02-20 ]

I noticed that there is one more parameter related to adaptive flushing: innodb_adaptive_flushing_lwm (default: 10 percent of the log capacity). That could enable adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0. If the innodb_log_file_size is large and the workload consists of update-in-place of non-indexed columns and involves a lot of thrashing (the working set is larger than the buffer pool), such checkpoint-oriented adaptive flushing may not help much. You’d additionally need to set innodb_max_dirty_pages_pct_lwm to the maximum allowed percentage of dirty pages in the buffer pool before page writes would be initiated.

Comment by Marko Mäkelä [ 2023-02-20 ]

jgagne suggested to me that 1 second is a long time to wait for buf_flush_page_cleaner() to free up some pages. But in an oltp_update_index test with innodb_adaptive_flushing_lwm=0, I observed a significant regression of throughput when reducing the scheduling interval to 0.1 seconds:

revision                                       throughput/tps  99th-percentile latency/ms
10.6 67a6ad0a4a36bd59fabbdd6b1cdd38de54e82c79  123737.93       0.30
e6b4c770dba6f04c22dc2f66e7133f202b877f61       122920.34       0.30
52d65c66fdb5fd1554e6aefcb81587490424d8ee       111422.78       0.39

This was a quick 120-second run with 16 concurrent connections, 4GiB buffer pool and log, 8×2M rows.

The first revision (making buf_flush_page_cleaner() stock the buf_pool.free list) could be worth broader testing.

Comment by Marko Mäkelä [ 2023-03-01 ]

I revised the main commit, still based on the same 10.6 parent commit 67a6ad0a4a36bd59fabbdd6b1cdd38de54e82c79. I did not run any performance tests yet, but I think that this one should work more accurately and with fewer operations on buf_pool.mutex. I think that the original commit may have been wrongly evicting pages that had been freed.

Comment by Marko Mäkelä [ 2023-03-16 ]

Many performance fixes on top of this were made in MDEV-26827.

Comment by Marko Mäkelä [ 2023-04-26 ]

axel managed to reproduce, a couple of times, a strange anomaly where the system throughput would soar for a while and the buf_flush_page_cleaner thread would remain idle.

It looks like in MDEV-24369, a multiplication by 100 for calculating pct_for_dirty was inadvertently deleted. This follow-up fix should address that and make the adaptive flushing proportionally reduce the page write rate below innodb_io_capacity/s. By MDEV-26827, any ‘unused’ page writes in each one-second slot would be used for writing out up to innodb_lru_scan_depth dirty pages from the end of buf_pool.LRU list.
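The lost multiplication can be illustrated like this (a simplified sketch; the real InnoDB code computes this from buf_pool counters):

```python
# Simplified illustration of the "* 100" inadvertently deleted in MDEV-24369,
# as described above; not the actual InnoDB code.
def pct_for_dirty(dirty_pages, total_pages):
    # Correct: express dirtiness as a percentage before comparing it
    # against percentage-valued thresholds.
    return 100.0 * dirty_pages / total_pages

def pct_for_dirty_buggy(dirty_pages, total_pages):
    # Buggy: the raw ratio is always < 1 here, so it can never exceed a
    # percentage threshold such as 10, leaving the page cleaner idle.
    return dirty_pages / total_pages

# With the buffer pool numbers from the description of this ticket:
print(pct_for_dirty(9425873, 27380675))        # ~34.4 (% dirty)
print(pct_for_dirty_buggy(9425873, 27380675))  # ~0.344, never trips a threshold
```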

Comment by Marko Mäkelä [ 2023-06-26 ]

Something similar to jeanfrancois.gagne’s idea had been implemented in MDEV-6932 but probably lost when the InnoDB changes from MySQL 5.7.9 were applied to MariaDB Server 10.2.2.

Comment by Marko Mäkelä [ 2023-11-17 ]

In order to fix a frequent regression test failure caused by a timeout (MDEV-32681), I ported this change to 10.5 as well.

I observed a small improvement when running a 120-second, 16-client Sysbench oltp_update_index test on Intel Optane 960 NVMe, with 4 GiB buffer pool and log size, 8×2M rows, and innodb_flush_log_at_trx_commit=0. This is an I/O bound workload; the data directory would grow to 12 or 13 GiB (that is, 8 or 9 GiB of data files in addition to the 4 GiB ib_logfile0).

revision  throughput/tps  99th-percentile latency/ms  max latency/ms  sum of latency/ms
baseline  113246.16       0.59                        35.80           1913510.12
patch     116098.43       0.46                        60.58           1914245.80

The maximum latency as well as the sum of latency are slightly worse, but those are a little random in nature. We can observe some improvement in the throughput as well as in the typical latency.

Generated at Thu Feb 08 09:42:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.