Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.5.7, 10.5.8, 10.5.9
Description
- InnoDB flushing should happen if either of the following conditions is true:
  a. dirty_pct (the percentage of dirty pages in the buffer pool) > innodb_max_dirty_pages_pct_lwm
  b. the redo-log checkpoint age crosses the innodb_adaptive_flushing_lwm limit (defaults to 10% of the redo-log capacity)
- Condition (b) represents pressure on the redo log: even if (a) is not reached, (b) should cause flushing to start in order to reduce the pressure on the redo log.
- Based on the investigation so far, it has been found that condition (b) is not causing adaptive flushing to kick in. (Both thresholds can be inspected from SQL; see the sketch after this list.)
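Both thresholds and the current dirty-page ratio can be read at runtime; a minimal sketch, assuming information_schema.GLOBAL_STATUS is available (it is in MariaDB) and approximating dirty_pct as dirty pages over total pages:

SHOW GLOBAL VARIABLES LIKE 'innodb_max_dirty_pages_pct_lwm';
SHOW GLOBAL VARIABLES LIKE 'innodb_adaptive_flushing_lwm';

-- approximate condition (a): percentage of dirty pages in the buffer pool
SELECT (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
        WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_dirty')
     / (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
        WHERE VARIABLE_NAME = 'Innodb_buffer_pool_pages_total')
     * 100 AS dirty_pct;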
---------------------------------------------------------------------------
Let's understand this with a quick experiment.
Let's say we have a very large buffer pool (10M pages of 16 KiB = 160 GB).
Also, let's set innodb_max_dirty_pages_pct_lwm = 70%, which means flushing will not happen until that limit is reached, unless adaptive flushing kicks in (with only 69 GB of data the limit is never hit). The assumed settings are sketched below.
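For reproducibility, the setup would look roughly like this (a sketch: innodb_log_file_size is not dynamic in 10.5 and must be set before startup; the buffer pool size is shown the same way for simplicity, while the low-water mark can be set at runtime):

-- static settings (server configuration, shown as comments):
--   innodb_buffer_pool_size = 160G
--   innodb_log_file_size    = 20G
-- dynamic setting, can be applied at runtime:
SET GLOBAL innodb_max_dirty_pages_pct_lwm = 70;
-- innodb_adaptive_flushing_lwm is left at its default of 10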
Adaptive flushing should kick in if pressure is building on the redo log, and it is controlled by innodb_adaptive_flushing_lwm (left at its default of 10% for this experiment).
I am running an update-index workload in parallel, and as we can see below, flushing fails to kick in despite the redo log crossing the 10% (innodb_adaptive_flushing_lwm) limit, i.e. despite condition (b) being true.
Ideally, on crossing 10% of the redo-log size (20 GB * 10% = 2 GB), flushing should start.
Max-checkpoint age is correctly set to about 81% of the redo-log size (I recall it should be 80-85%).
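The expected trigger point can be computed directly; a quick sketch, assuming a single redo log file as in 10.5, so @@global.innodb_log_file_size is the full redo capacity:

-- adaptive flushing should engage once the checkpoint age exceeds this
SELECT @@global.innodb_log_file_size
       * @@global.innodb_adaptive_flushing_lwm / 100
       AS adaptive_flush_trigger_bytes;  -- 20 GB * 10% = ~2 GB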
MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
+----------------------------------+-------------+
| Variable_name                    | Value       |
+----------------------------------+-------------+
| Innodb_buffer_pool_pages_data    | 4496537     |
| Innodb_buffer_pool_pages_dirty   | 3100258     |
| Innodb_buffer_pool_pages_flushed | 0           |
| Innodb_buffer_pool_pages_free    | 5826663     |
.....
| Innodb_checkpoint_age            | 4260770018  |
| Innodb_checkpoint_max_age        | 17393908102 |
+----------------------------------+-------------+
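Plugging the first snapshot into the two conditions (a back-of-the-envelope check; total pages approximated as data + free, redo capacity 20 GiB = 21474836480 bytes):

SELECT 3100258 / (4496537 + 5826663) * 100 AS dirty_pct,     -- ~30%: below the 70% lwm, (a) false
       4260770018 / 21474836480 * 100      AS redo_fill_pct; -- ~19.8%: above the 10% lwm, (b) true

So condition (b) already holds, yet Innodb_buffer_pool_pages_flushed is still 0.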
MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
+----------------------------------+-------------+
| Variable_name                    | Value       |
+----------------------------------+-------------+
| Innodb_buffer_pool_pages_data    | 4523411     |
| Innodb_buffer_pool_pages_dirty   | 4483055     |
| Innodb_buffer_pool_pages_flushed | 0           |
| Innodb_buffer_pool_pages_free    | 5799789     |
.....
| Innodb_checkpoint_age            | 15647589898 |
| Innodb_checkpoint_max_age        | 17393908102 |
+----------------------------------+-------------+
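The second snapshot is starker still: the checkpoint age has climbed to roughly 90% of the max-checkpoint age while the flushed counter stays at 0. As arithmetic:

SELECT 15647589898 / 17393908102 * 100 AS checkpoint_age_pct_of_max;  -- ~90%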
Version tested on: 10.5 (#4498714)
And of course, a sudden drop in TPS is seen once the redo log hits the max-checkpoint age (84K -> 34K):
[ 255s ] thds: 1024 tps: 84861.92 qps: 84862.12 (r/w/o: 0.00/84862.12/0.00) lat (ms,95%): 12.75 err/s: 0.00 reconn/s: 0.00
[ 260s ] thds: 1024 tps: 78755.87 qps: 78755.87 (r/w/o: 0.00/78755.87/0.00) lat (ms,95%): 12.30 err/s: 0.00 reconn/s: 0.00
[ 265s ] thds: 1024 tps: 34419.32 qps: 34419.32 (r/w/o: 0.00/34419.32/0.00) lat (ms,95%): 27.17 err/s: 0.00 reconn/s: 0.00
[ 270s ] thds: 1024 tps: 53913.70 qps: 53913.70 (r/w/o: 0.00/53913.70/0.00) lat (ms,95%): 13.70 err/s: 0.00 reconn/s: 0.00
[ 275s ] thds: 1024 tps: 59043.41 qps: 59043.41 (r/w/o: 0.00/59043.41/0.00) lat (ms,95%): 14.73 err/s: 0.00 reconn/s: 0.00
[ 280s ] thds: 1024 tps: 73390.11 qps: 73390.11 (r/w/o: 0.00/73390.11/0.00) lat (ms,95%): 13.70 err/s: 0.00 reconn/s: 0.00
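The drop coincides with InnoDB forcing synchronous flushing once the checkpoint age reaches its maximum. While the workload runs, this can be watched live; a minimal monitoring sketch, again assuming information_schema.GLOBAL_STATUS:

-- how close the redo log is to the hard max-checkpoint-age limit
SELECT (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
        WHERE VARIABLE_NAME = 'Innodb_checkpoint_age')
     / (SELECT VARIABLE_VALUE FROM information_schema.GLOBAL_STATUS
        WHERE VARIABLE_NAME = 'Innodb_checkpoint_max_age')
     * 100 AS checkpoint_age_pct_of_max;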
---------------
This issue looks to be a regression; older versions should be studied to find out when it was introduced, but it is most likely present from 10.5 onwards only.
Issue Links
- is caused by:
  - MDEV-23855 InnoDB log checkpointing causes regression for write-heavy OLTP (Closed)
- relates to:
  - MDEV-24949 Enabling idle flushing (possible regression from MDEV-23855) (Closed)
  - MDEV-25113 Reduce effect of parallel background flush on select workload (Closed)
  - MDEV-25557 Document that innodb_adaptive_flushing=OFF != "no" (Closed)
  - MDEV-26055 Adaptive flushing is still not getting invoked in 10.5.11 (Closed)