[MDEV-26055] Adaptive flushing is still not getting invoked in 10.5.11 Created: 2021-06-30 Updated: 2023-12-18 Resolved: 2023-03-16
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5.11 |
| Fix Version/s: | 10.11.3, 11.0.2, 10.6.13, 10.8.8, 10.9.6, 10.10.4, 10.5.24 |
| Type: | Bug | Priority: | Major |
| Reporter: | Gaurav Tomar | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | regression | ||
| Environment: | Ubuntu 20.04 LTS |
| Description |
|
Hi Team, this issue is related to MDEV-25093. Adaptive flushing is not getting invoked after crossing the configured value of innodb_adaptive_flushing_lwm. I have attached the graph for the checkpoint age. My setup is a 3-node Galera cluster running version 10.5.11, with the 26.4.8-focal Galera package, installed on Ubuntu 20.04 (focal). MDEV-25093 is supposed to be fixed in 10.5.10, but I see similar behaviour in all MariaDB versions from 10.5.7 to 10.5.11; I have shared the graph for 10.5.11. Below are the configs I used.

I tried increasing and decreasing innodb_adaptive_flushing_lwm and innodb_max_dirty_pages_pct_lwm, but I am still facing the same issue. I also tweaked innodb_io_capacity and innodb_io_capacity_max, but no luck; by the way, we are using NVMe disks for this setup. Below are the status values when Innodb_checkpoint_age reaches Innodb_checkpoint_max_age.

QPS around the issue

QPS dips and disk utilization increases when MariaDB starts the furious flushing. |
| Comments |
| Comment by Marko Mäkelä [ 2021-06-30 ] |
|
gktomar, can you please try to collect Innodb_buffer_pool_pages_flushed and Innodb_checkpoint_age during the workload? Also, can you share a minimal complete configuration for repeating this? I do not think that Galera should play a role here, so the minimal configuration should be without Galera. |
| Comment by Marko Mäkelä [ 2021-06-30 ] |
|
I now see that Checkpoint.png |
| Comment by Gaurav Tomar [ 2021-06-30 ] |
|
The estimated dirty limit is set by the innodb_max_dirty_pages_pct variable, which is set to 90; the total data in the buffer pool is 163GB, hence the estimated dirty limit becomes 146.7GB. "Data dirty" are the modified db pages. What the BP data image shows is that dirty pages are getting flushed only when innodb_max_dirty_pages_pct is crossed; per the documentation, flushing should have started after crossing innodb_max_dirty_pages_pct_lwm.

One other impact of this issue is that stopping MariaDB takes a longer time, as it has to flush the dirty pages to disk during the shutdown.

This issue can be reproduced on a standalone MariaDB instance with the below configuration: innodb_log_file_size = 2G

Innodb_buffer_pool_pages_flushed and Innodb_checkpoint_age around the issue are provided in the description of the ticket. |
| Comment by Krunal Bauskar [ 2021-07-01 ] |
|
I tried to re-validate the issue with 10.5.11 and found that the adaptive flushing does kick in as expected.

a. Started server with

b. This means adaptive flushing will kick in when either (A): dirty_pages > 90% of the buffer pool pages, or (B): the checkpoint age exceeds innodb_adaptive_flushing_lwm (10% of the redo log capacity). [I purposely used this configuration so that we see the effect of (B) before (A) kicks in.]

------------------------------ at start

MariaDB [(none)]> show status like 'Innodb_buffer_pool%pages%'; show status like '%check%age%';

2 rows in set (0.000 sec)

----------------- adaptive flushing kicked in

MariaDB [(none)]> show status like 'Innodb_buffer_pool%pages%'; show status like '%check%age%';

------------------------------

Observations:

2. checkpoint_age > 10%, which causes adaptive flushing to kick in, as (B) is satisfied. |
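The two trigger conditions (A) and (B) discussed above can be sketched as a tiny predicate. This is a simplified illustrative model; the function name and default thresholds are mine, not actual InnoDB code:

```python
def adaptive_flushing_active(dirty_pages, total_pages,
                             checkpoint_age, log_capacity,
                             max_dirty_pct=90.0,
                             adaptive_lwm_pct=10.0):
    """Simplified model of the two conditions above:
      (A) dirty pages exceed innodb_max_dirty_pages_pct
          of the buffer pool, or
      (B) checkpoint age exceeds innodb_adaptive_flushing_lwm
          percent of the redo log capacity."""
    a = dirty_pages > total_pages * max_dirty_pct / 100.0
    b = checkpoint_age > log_capacity * adaptive_lwm_pct / 100.0
    return a or b
```

With a 2G redo log, condition (B) is met at only ~205MB of checkpoint age, which is why a small redo log makes it hard to tell which condition fired first.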
| Comment by Krunal Bauskar [ 2021-07-01 ] |
|
Gaurav Tomar,

1. As Marko pointed out, you should try to monitor the pages flushed.

2. I see that with your standalone configuration you have used a pretty small redo log file size (2G). [There could be a thin line as to whether you are meeting condition (A) or (B).]

3. Despite the said fix, there is a good chance that you may still hit a flush storm. This is another known issue: the adaptive flushing algorithm needs to be either more aggressive or better tuned to handle increasing pressure. |
| Comment by Gaurav Tomar [ 2021-07-01 ] |
|
@Krunal: Yes, it seems the adaptive flushing is actually kicking in, but the rate of flushing is not enough to keep Innodb_checkpoint_age below Innodb_checkpoint_max_age. I will wait for the 10.5.12 release to test this scenario again. |
| Comment by Marko Mäkelä [ 2021-07-01 ] |
|
I do not think that anything has recently been changed in the old adaptive flushing code. But we may have made that code obsolete. Slightly after the 10.5.11 release, we reduced the latency related to page flushing and LRU eviction, to ultimately address The reported

It may be that triggering a little more aggressive flushing earlier would reduce or eliminate the need for the old adaptive flushing mechanism. In control theory, PID controllers are well established and understood. The pre-flushing trigger is an example of a proportional controller with no I or D component; the "position" would be the checkpoint age. If we still need the old adaptive flushing mechanism, I think that we should spend time to fully understand it, and conduct some experiments where we would collect more data and try to tune the P, I, D coefficients (if the old code really forms a PID controller). |
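To illustrate the comment above, a purely proportional controller driven by checkpoint age could look like the following sketch. The function, gain, and defaults are hypothetical; this is not the actual InnoDB pre-flushing code:

```python
def flush_rate(checkpoint_age, log_capacity, io_capacity,
               lwm_pct=10.0, gain=2.0):
    """Hypothetical proportional (P-only) controller: the flushing
    rate grows linearly with how far the checkpoint age (the
    "position") is past the low-water mark.  There are no integral
    or derivative terms, so steady-state error is neither
    accumulated nor anticipated."""
    error = checkpoint_age / log_capacity - lwm_pct / 100.0
    if error <= 0:
        return 0.0  # below the low-water mark: stay idle
    # Clamp to the configured I/O capacity (pages per second).
    return min(io_capacity, gain * error * io_capacity)
```

Adding an integral term would let the controller ramp up when a persistent backlog of dirty pages keeps the error positive, which is roughly the behaviour the thread later discusses as "better tuned" adaptive flushing.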
| Comment by Marko Mäkelä [ 2021-08-11 ] |
|
gktomar, did you try MariaDB Server 10.5.12 yet? |
| Comment by Gaurav Tomar [ 2021-08-16 ] |
|
@marko I'm currently testing this on 10.5.12; I will get back with the results. |
| Comment by Marko Mäkelä [ 2021-09-16 ] |
|
gktomar, do you have any results yet? |
| Comment by Gaurav Tomar [ 2021-09-16 ] |
|
@Marko I tested the dirty-page flushing behavior in MariaDB 10.5.12, and below are my observations.

[ 466s ] thds: 36 tps: 8597.20 qps: 171976.99 (r/w/o: 120391.79/34389.80/17195.40) lat (ms,99%): 5.99 err/s: 0.00 reconn/s: 0.00
[ 696s ] thds: 36 tps: 8634.82 qps: 172760.46 (r/w/o: 120947.52/34541.29/17271.65) lat (ms,99%): 6.21 err/s: 0.00 reconn/s: 0.00 |
| Comment by Gaurav Tomar [ 2021-09-16 ] |
| Comment by Marko Mäkelä [ 2021-09-16 ] |
|
gktomar, thank you. It sounds like the flush-ahead patch is working as designed and preventing a deeper stall. We may still want to do something about adaptive flushing, to have some background flushing prevent even that small drop in performance. On faster storage (such as NVMe), the write bursts should be less of a problem.

I have been running some benchmarks on 10.6 today. On a SATA SSD, with about 40GiB of data in a 30GiB buffer pool (so that LRU eviction is constantly happening during the oltp_update_non_index workload), I am seeing rather stable throughput. If I increase the buffer pool size to 40GiB so that no LRU eviction takes place, then I am seeing 5-second intervals with 0 tps every now and then. We might dismiss that as a user error, because innodb_io_capacity was configured way too high (for my NVMe, not the much slower SATA SSD). |
| Comment by Marko Mäkelä [ 2021-09-17 ] |
|
It remains to be seen how much |
| Comment by Leonard Sinquer [ 2022-02-22 ] |
|
I have the same issue on 10.6.7. Regardless of `innodb_io_capacity` or `innodb_adaptive_flushing_lwm`, the checkpoint age keeps growing until it reaches 75%, and then it is flushed furiously. I agree with @Gaurav Tomar that:

When I set `innodb_max_dirty_pages_pct_lwm=0.000001`, the adaptive flushing seems to kick in properly. With `innodb_max_dirty_pages_pct_lwm=0`, the adaptive flushing never kicks in, even when `innodb_adaptive_flushing_lwm` is reached. |
| Comment by Marko Mäkelä [ 2022-10-21 ] |
|
shalogo, yes, the value innodb_max_dirty_pages_pct_lwm=0 is special ("use innodb_max_dirty_pages_pct"). axel mentioned some time ago that the adaptive flushing does not work as well as he would like. I think that some test parameters and results are needed before we can improve this area. |
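The special semantics of the zero value can be modelled roughly like this. An illustrative sketch, not the server code:

```python
def effective_dirty_lwm(max_dirty_pct, max_dirty_pct_lwm):
    """Simplified model of the special value discussed above:
    a low-water mark of exactly 0 means "disabled, fall back to
    innodb_max_dirty_pages_pct", while any positive value,
    however tiny, enables dirty-page based background flushing
    at that threshold."""
    if max_dirty_pct_lwm == 0:
        return max_dirty_pct  # lwm disabled: use the hard limit
    return max_dirty_pct_lwm
```

This is why setting the low-water mark to 0.000001 behaves so differently from setting it to 0: the former is a (nearly) always-on trigger, the latter disables the trigger entirely.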
| Comment by Marko Mäkelä [ 2023-02-01 ] |
|
MySQL Bug #74637 has been filed for an interesting idea. Rephrased in MariaDB 10.5+ terminology: if the innodb_max_dirty_pages_pct or innodb_max_dirty_pages_pct_lwm condition is satisfied but we are not yet near the log_free_check() limit (neither buf_flush_async_lsn nor buf_flush_sync_lsn has been set), then buf_flush_page_cleaner() could write out pages from buf_pool.LRU instead of buf_pool.flush_list. |
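A rough sketch of that idea, with hypothetical names (not the actual buf_flush_page_cleaner() code):

```python
def pick_flush_source(dirty_frac, dirty_lwm_frac,
                      async_limit_set, sync_limit_set):
    """Sketch of the idea above: when the dirty-page condition
    holds but no async/sync flush target has been set (i.e. we
    are not yet near the log_free_check() limit), prefer writing
    out pages in LRU order so that later eviction is cheap;
    otherwise flush in flush_list (oldest-modification) order to
    advance the checkpoint."""
    near_log_limit = async_limit_set or sync_limit_set
    if dirty_frac > dirty_lwm_frac and not near_log_limit:
        return "LRU"
    return "flush_list"
```

The trade-off: LRU-order writes help a thrashing workload evict pages quickly, while flush_list-order writes are what actually move the checkpoint forward.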
| Comment by Marko Mäkelä [ 2023-02-07 ] |
|
There are two parameters related to background flushing: innodb_max_dirty_pages_pct_lwm>0 (default 0) and innodb_adaptive_flushing=ON (default ON). |
| Comment by Marko Mäkelä [ 2023-02-20 ] |
|
I noticed that there is one more parameter related to adaptive flushing: innodb_adaptive_flushing_lwm (default: 10 percent of the log capacity). That can enable adaptive flushing even when innodb_max_dirty_pages_pct_lwm=0.

If innodb_log_file_size is large and the workload consists of update-in-place of non-indexed columns and involves a lot of thrashing (the working set is larger than the buffer pool), such checkpoint-oriented adaptive flushing may not help much. You would additionally need to set innodb_max_dirty_pages_pct_lwm to the maximum allowed percentage of dirty pages in the buffer pool before page writes are initiated. |
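Putting the last two comments together, the independent background-flushing triggers combine roughly as in this simplified model (the function name and signature are mine, not server code):

```python
def background_flushing_enabled(adaptive_flushing,
                                checkpoint_age, log_capacity,
                                adaptive_flushing_lwm_pct,
                                dirty_frac, max_dirty_pct_lwm):
    """Simplified model of the two independent triggers described
    above: either the dirty-page low-water mark is set (> 0) and
    exceeded, or adaptive flushing is enabled and the checkpoint
    age has passed innodb_adaptive_flushing_lwm percent of the
    redo log capacity."""
    by_dirty = (max_dirty_pct_lwm > 0
                and dirty_frac * 100.0 > max_dirty_pct_lwm)
    by_age = (adaptive_flushing
              and checkpoint_age > log_capacity
              * adaptive_flushing_lwm_pct / 100.0)
    return by_dirty or by_age
```

In the thrashing scenario described above, `by_age` stays false for a long time when the log is huge, so only the dirty-page trigger (`by_dirty`) would start page writes early.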
| Comment by Marko Mäkelä [ 2023-02-20 ] |
|
jgagne suggested to me that 1 second is a long time to wait for buf_flush_page_cleaner() to free up some pages. But in an oltp_update_index test with innodb_adaptive_flushing_lwm=0, I observed a significant regression of throughput when reducing the scheduling interval to 0.1 seconds:

This was a quick 120-second run with 16 concurrent connections, a 4GiB buffer pool and log, and 8×2M rows. The first revision (making buf_flush_page_cleaner() stock the buf_pool.free list) could be worth broader testing. |
| Comment by Marko Mäkelä [ 2023-03-01 ] |
|
I revised the main commit, still based on the same 10.6 parent commit 67a6ad0a4a36bd59fabbdd6b1cdd38de54e82c79. I did not run any performance tests yet, but I think that this version should work more accurately and with fewer operations on buf_pool.mutex. I think that the original commit may have been wrongly evicting pages that had been freed. |
| Comment by Marko Mäkelä [ 2023-03-16 ] |
|
Many performance fixes on top of this were made in |
| Comment by Marko Mäkelä [ 2023-04-26 ] |
|
axel managed to reproduce, a couple of times, a strange anomaly where the system throughput would soar for a while and the buf_flush_page_cleaner thread would remain idle. It looks like in |
| Comment by Marko Mäkelä [ 2023-06-26 ] |
|
Something similar to jeanfrancois.gagne's idea had been implemented in |
| Comment by Marko Mäkelä [ 2023-11-17 ] |
|
In order to fix a frequent regression test failure due to a timeout (

I observed a small improvement when running a 120-second, 16-client Sysbench oltp_update_index test on Intel Optane 960 NVMe, with 4 GiB buffer pool and log size, 8×2M rows, and innodb_flush_log_at_trx_commit=0. This is an I/O-bound workload; the data directory would grow to 12 or 13 GiB (that is, 8 or 9 GiB of data files in addition to the 4 GiB ib_logfile0).

The maximum latency as well as the sum of latency are slightly worse, but those are a little random in nature. We can observe some improvement in the throughput as well as in the typical latency. |