[MDEV-24949] Enabling idle flushing (possible regression from MDEV-23855) Created: 2021-02-23 Updated: 2023-03-16 Resolved: 2021-03-11
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5.7, 10.5.8, 10.5.9, 10.6 |
| Fix Version/s: | 10.5.10, 10.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Krunal Bauskar | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Attachments: |
| Issue Links: |

| Description |

| Comments |
| Comment by Krunal Bauskar [ 2021-02-23 ] |

Posting the possible patch: https://github.com/mysqlonarm/server/commit/b5cf788d0f27a1fbebb3ccc489783efeb3c270e1
| Comment by Krunal Bauskar [ 2021-02-23 ] |

I also tested the performance of the patch to check whether enabling idle flushing helps. As the graph above shows, idle flushing clearly helps improve performance for the next burst cycle. (Testing was done with both NVMe and SAS-type SSDs.) More detailed testing to rule out other effects will be done once the basic idea of the patch is accepted.
| Comment by Marko Mäkelä [ 2021-02-23 ] |

I confirm the observation about idle flushing (when innodb_max_dirty_pages_pct_lwm>0; see also MDEV-23855).

Whether it might be useful to have the ‘background flushing’ run under this kind of circumstances (alternating read-only and read-write workload), I truly do not know. If such a read-only burst is short, and if the previously dirtied pages will soon be modified again, it could be useful to keep those dirty pages in the buffer pool, to reduce ‘write amplification’. Before a known read-only burst (such as generating a report), it might be useful to modify the two above-mentioned parameters.

krunalbauskar, can you also share the workload script? I am guessing that the script is alternating between a write-heavy workload and no workload. Could you also try to create another script that would alternate between write-heavy and read-heavy workloads?
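The trigger condition under discussion can be sketched as follows. This is a hypothetical Python rendering with illustrative names, not the actual page-cleaner code (which is C++ inside InnoDB):

```python
def should_idle_flush(dirty_pages: int, total_pages: int,
                      lwm_pct: float, server_is_idle: bool) -> bool:
    """Sketch of the idle-flushing condition: it only applies when
    innodb_max_dirty_pages_pct_lwm > 0, and kicks in while the server
    is idle and the dirty-page ratio is above the low-water mark."""
    if lwm_pct <= 0:
        # lwm = 0 disables idle flushing entirely
        return False
    dirty_pct = 100.0 * dirty_pages / total_pages
    return server_is_idle and dirty_pct > lwm_pct
```

For example, with a 1000-page buffer pool, 300 dirty pages, and the low-water mark at 25%, an idle server would start flushing; setting the low-water mark to 0 suppresses it.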
| Comment by Krunal Bauskar [ 2021-02-23 ] |

Marko, I tested with a read-only workload instead of sleep. Graph attached. The RO workload shows no difference in either case (expected, as the data is in memory).

So, all in all, idle flushing helps maintain stable tps with improved performance, with no or negligible effect on the ro-workload.
| Comment by Axel Schwenke [ 2021-03-09 ] |

oltp.png

my.cnf (stripped to the essentials)
| Comment by Axel Schwenke [ 2021-03-09 ] |

oltp_ts_16.png

We seem not to flush "adaptively". It is either on or off. I would expect the flushing to grow starting at innodb_max_dirty_pages_pct_lwm, reaching the configured innodb_io_capacity at innodb_max_dirty_pages_pct, and going to panic mode (furious flushing) above that. It's not happening!
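The expected behaviour described here can be sketched numerically. This is a hypothetical model of the ramp being asked for, not the server's actual algorithm:

```python
def expected_flush_rate(dirty_pct: float, lwm: float, pct_max: float,
                        io_capacity: int, io_capacity_max: int) -> float:
    """Sketch of the expected adaptive ramp: no flushing below
    innodb_max_dirty_pages_pct_lwm, a linear ramp up to
    innodb_io_capacity at innodb_max_dirty_pages_pct, and furious
    flushing (innodb_io_capacity_max) above that."""
    if dirty_pct < lwm:
        return 0.0
    if dirty_pct >= pct_max:
        return float(io_capacity_max)  # panic mode
    frac = (dirty_pct - lwm) / (pct_max - lwm)
    return frac * io_capacity
```

With lwm=25, pct_max=90 and io_capacity=2000, a dirty ratio of 57.5% (halfway between the two thresholds) would correspond to roughly 1000 pages/s under this model, rather than the observed on/off behaviour.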
| Comment by Krunal Bauskar [ 2021-03-09 ] |

Axel, I see you are using https://github.com/MariaDB/server/commit/7ecee0c35af8085a8730fce9f2f5917c65406bb0. There are multiple issues with it: one I have already pointed out, and the eventual fix had issues too. So finally, based on a discussion with Marko, we have decided to revive our original patch, which is being tracked under PR#1758. Your other graph has the tag name
| Comment by Axel Schwenke [ 2021-03-09 ] |

That is right. The variant labeled as "
| Comment by Marko Mäkelä [ 2021-03-09 ] |

I think that the PR#1758 version is better than mine. I developed my version in order to fully understand the logic. In my attempt to ‘make it as simple as possible’ I violated the ‘but not any simpler’ part. In oltp_ts_16.png
| Comment by Axel Schwenke [ 2021-03-09 ] |

I reran the test with the following changes:

oltpB.png

oltp_ts_16B.png
| Comment by Krunal Bauskar [ 2021-03-09 ] |

Axel, I am just reading your *B graphs. The distortion in read-only/select tps will be reduced once we make idle_flush consider pending read operations, so that idle flushing does not kick in while read ops are pending.

Another issue I foresee is the spike in the read-write workload at the start, followed by a drop. This seems to originate from the fact that the page cleaner is initially free, so there is no background flushing going on. Once the 10% mark (default) is crossed, background flushing starts, thereby limiting redo-log activity and in turn affecting the read-write workload.
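The proposed refinement — treating pending read I/O as activity — can be sketched like this. Names here are illustrative, not the actual server code:

```python
def idle_flush_should_kick_in(server_idle: bool, pending_reads: int,
                              dirty_pct: float, lwm_pct: float) -> bool:
    """Sketch of the proposed gating: the server only counts as truly
    idle when no read I/O is pending, so idle flushing does not compete
    with a read-only workload for the buffer pool."""
    truly_idle = server_idle and pending_reads == 0
    return truly_idle and lwm_pct > 0 and dirty_pct > lwm_pct
```

Under this condition, a burst of pending reads would suppress idle flushing even if the dirty ratio is above the low-water mark.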
| Comment by Axel Schwenke [ 2021-03-09 ] |

It is not a 10% mark, but the innodb_max_dirty_pages_pct_lwm = 25 boundary being crossed, that causes the throttling of R/W throughput by the heavy flushing. I repeat my complaint here:
| Comment by Krunal Bauskar [ 2021-03-09 ] |

Axel, I very much agree with you. For the experiment, I did this: used a very large buffer pool (1m pages = 160 GB). Also set innodb_max_dirty_pages_pct_lwm=70%, which means flushing will not happen until we reach that limit unless adaptive flushing kicks in (with 69 GB of data the limit is never hit). Adaptive flushing should kick in if pressure is building on the redo log, and is controlled by innodb_adaptive_flushing_lwm (defaults to 10%, which is what I kept too).

I am running the update-index workload in parallel, and as we can see, the redo log has crossed the 10% limit and still nothing is being flushed.

MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
(status output not preserved in this export)

MariaDB [(none)]> show status like 'Innodb_buffer_pool_pages%'; show status like 'Innodb_checkpoint_%';
(status output not preserved in this export)

All this with vanilla 10.5 trunk (no patch applied). And of course the tps drops once we hit the max-dirty age (84K -> 34K):

[ 255s ] thds: 1024 tps: 84861.92 qps: 84862.12 (r/w/o: 0.00/84862.12/0.00) lat (ms,95%): 12.75 err/s: 0.00 reconn/s: 0.00
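The check being tested in this experiment can be sketched as follows (illustrative names; the actual heuristic in the server also factors in redo generation rate and other state):

```python
def adaptive_flushing_expected(checkpoint_age: int, log_capacity: int,
                               adaptive_lwm_pct: float = 10.0) -> bool:
    """Sketch of the expectation above: adaptive flushing should be
    active once the checkpoint age exceeds innodb_adaptive_flushing_lwm
    percent of the usable redo-log capacity."""
    fill_pct = 100.0 * checkpoint_age / log_capacity
    return fill_pct > adaptive_lwm_pct
```

The complaint is that with the redo log more than 10% full, this predicate holds, yet the vanilla build still flushed nothing.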
| Comment by Marko Mäkelä [ 2021-03-09 ] |

I think that we indeed need a separate ticket about the adaptive flushing, and we may have to look at older versions than 10.5. It is possible that adaptive flushing was in some way broken in
| Comment by Krunal Bauskar [ 2021-03-09 ] |
| Comment by Krunal Bauskar [ 2021-03-10 ] |

Axel, I have been investigating this issue, which tends to suggest that idle flushing slows down the select workload. It seems that any parallel flushing in the background could have the same effect on the select workload.

Let's say we have 1000 pages with innodb_max_dirty_pages_pct_lwm = 25%; that means the threshold for idle flushing is 250 pages. Now let's assume the read-write workload modifies 700 pages. On completion of the read-write workload, before idle flushing kicks in, the existing algorithm will carry out adaptive flushing for the range from 700 down to 250. If we track select qps during this time, it is the same as during the 250-0 range.

Let me share an example:

block-1: average tps during adaptive flushing: 165K-166K
block-2: average tps during idle flushing: 165K-166K
block-3: average tps during no flushing: 170K

If we keep the idle flushing aside for a minute, then the problem could also be re-defined as: parallel adaptive flushing during a select workload slows down the select workload (by 3% in the said case). This should be looked at separately. I presume there is mutex contention we are hitting, given that pages are being flushed and also being read from the same buffer pool.
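The arithmetic in the example above works out as follows (a worked version of the numbers quoted, nothing more):

```python
# 1000-page buffer pool, innodb_max_dirty_pages_pct_lwm = 25%:
# idle flushing targets 250 pages, and a read-write burst that
# dirties 700 pages leaves 450 pages for adaptive flushing first.
total_pages = 1000
lwm_pct = 25
dirty_after_rw = 700

idle_flush_target = total_pages * lwm_pct // 100   # 250 pages
adaptive_flush_range = dirty_after_rw - idle_flush_target  # 450 pages
```

So roughly 450 of the 700 dirty pages are written by adaptive flushing before the idle-flushing threshold is even reached, which is why the select-qps impact appears in both regimes.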
| Comment by Axel Schwenke [ 2023-03-16 ] |

There definitely is. While dirty pages are flushed from the buffer pool, read operations are slowed down. My sysbench benchmarks just don't show that, because I

But this is visible in sysbench tpcc. As soon as log flushing kicks in, there is a surge in throughput.
| Comment by Marko Mäkelä [ 2023-03-16 ] |