[MDEV-30628] 10.6 performance regression with sustained high-connection write-only OLTP workload (55-80% degradation) Created: 2023-02-09 Updated: 2024-01-12 |
|
| Status: | Needs Feedback |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.11, 10.6.12 |
| Fix Version/s: | 10.6, 10.11, 11.0, 11.1, 11.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Dishon Merkhai | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 5 |
| Labels: | performance, regression | ||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
Summary: On MariaDB 10.6.12 (also present in 10.6.11), under a sustained (800-second) OLTP write-only workload at 64 benchmark threads/connections (48-core / 96-thread hardware), the throughput starts high (~2200 TPS) and then gradually decreases from the 450-second mark until the end of the run (ending at ~980 TPS, a 55.4% drop in performance). Performance is stable on MariaDB 10.5.18 (no regression), which averaged 2625 TPS over the 800 s benchmark with no drop-off in TPS. innodb_flush_method was set to O_DIRECT for both the MariaDB 10.6 and 10.5 benchmarks. Additional information
Reproduce
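The exact reproduction scripts are in the attachments; as a rough sketch of the kind of sysbench invocation the summary describes (thread count and duration match the report; table count, table size, host, and credentials below are assumptions, not the attached scripts):

```shell
# Illustrative sketch only. 64 threads and 800 s match the report;
# --tables, --table-size, host, and credentials are assumptions.
THREADS=64
DURATION=800

prepare_data() {
    sysbench oltp_write_only \
        --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
        --tables=8 --table-size=10000000 \
        --threads="$THREADS" prepare
}

run_benchmark() {
    sysbench oltp_write_only \
        --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=sbtest \
        --tables=8 --table-size=10000000 \
        --threads="$THREADS" --time="$DURATION" \
        --report-interval=10 run
}
```

The `--report-interval=10` option emits per-10-second TPS, which is what the attached TPS-over-time graphs plot.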
Other Details
"Logs & Flamegraphs" folder: The attached Logs & Flamegraphs.zip file contains the following:
Each run includes an image graphing the TPS over time. The graphs for run 1 of MariaDB 10.5.18 and 10.6.12 are also attached directly to this ticket. |
| Comments |
| Comment by Otto Kekäläinen [ 2023-02-16 ] | |||||||||||||||||||
|
If you, marko, can provide some pointers here, Dishon might be able to spend a week or two trying to submit a PR to fix this. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-16 ] | |||||||||||||||||||
|
otto, in fact, last week I worked on some changes that have passed initial correctness testing but have not yet been tested for performance. I will try to prepare a branch with these changes today or tomorrow. Testing and some analysis of parameters like those in | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-16 ] | |||||||||||||||||||
|
otto, I created a branch with some performance-oriented changes, with two commits for | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-02-17 ] | |||||||||||||||||||
|
Marko Mäkelä, thank you for providing insight into related developments, as well as compiling performance-related changes into a branch. I benchmarked the branch you shared and found that the issue still happens, but two characteristics are different. Comparing the branch (which is based on MariaDB 10.6.13) against the baseline MariaDB 10.6.12: 1. The overall TPS prior to the performance degradation is lower with the branch: ~1800 TPS with the branch on MariaDB 10.6.13 vs. ~2300 TPS on the baseline MariaDB 10.6.12 (i.e. 21.74% slower with the branch). I ran the benchmark four times using the same configuration/environment as in the original bug report. I've attached two graphs, mariadb-10-6-13-with-branch-four-runs.png and mariadb-10-6-13-with-branch-average.png. The first shows the TPS over time for each of the four runs, and the second shows the average TPS across all four runs. Note: I'm not sure whether the difference in performance is related to 10.6.12 vs. 10.6.13 or whether it's due to the changes in the branch. I can run benchmarks with MariaDB 10.6.13 as the baseline to determine that as well. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-20 ] | |||||||||||||||||||
|
dishonme, thank you for running some benchmarks. First, I need to clarify that the VERSION file in the source tree is usually updated to indicate the next minor release as soon as the latest release has been published. This "10.6.13" is actually pretty close to the mariadb-10.6.12 release tag. It is very likely that The reduction of throughput over time may be due to | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-20 ] | |||||||||||||||||||
|
dishonme, I posted some quick results in | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-02-22 ] | |||||||||||||||||||
|
Marko Mäkelä, I tracked only the innodb_buffer_pool_wait_free metric during the original benchmarks. In response to your comments, I'll do the following next:
I could also attempt to narrow down the exact minor/patch version of 10.6 where the issue first appears, though I'm not immediately sure how useful that would be. Thanks as well for helping me interpret the VERSION file. | |||||||||||||||||||
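For reference, sampling the status variables discussed here can be done with a simple loop; this is an illustrative sketch only (the metric names are real MariaDB status variables, but the 10 s interval and one-line-per-sample format are assumptions about how the attached metric logs were produced):

```shell
# Illustrative sketch only: sample InnoDB status counters every 10 s
# for the 800 s duration of a run. Assumes a local mysql client with
# credentials available; never invoked here, just defined.
sample_metrics() {
    mysql -N -e "SHOW GLOBAL STATUS WHERE Variable_name IN
        ('Innodb_history_list_length', 'Innodb_buffer_pool_wait_free')"
}

sample_loop() {
    t=0
    while [ "$t" -le 800 ]; do
        printf '%ss ' "$t"
        sample_metrics | tr '\n\t' '  '
        printf '\n'
        sleep 10
        t=$((t + 10))
    done
}
```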
| Comment by Marko Mäkelä [ 2023-02-24 ] | |||||||||||||||||||
|
dishonme, thank you, I am looking forward to your results. I see these possible follow-ups to
| |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-02-27 ] | |||||||||||||||||||
|
Marko Mäkelä, I will run benchmarks on the first commit today. I should have results in a few hours. | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-02-27 ] | |||||||||||||||||||
|
Marko Mäkelä, I have attached benchmark results (six runs) for only the first commit (improve adaptive flushing). I still need to run benchmarks with metrics collected; I'll aim to get that done today and update when complete. The same goes for the off-CPU flamegraphs. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-01 ] | |||||||||||||||||||
|
dishonme, thank you. That commit made some significant improvement in our internal Sysbench TPC-C tests, but caused a regression for some other tests. I am revising it. Did you sample the history list length during the run? While I believe that it is a plausible explanation for the regression that you are observing, we do not have any evidence of that yet. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-01 ] | |||||||||||||||||||
|
dishonme, I force-pushed the branch. I did not have a chance to run any performance tests on the revised commit of | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-02 ] | |||||||||||||||||||
|
Initial results from our internal testing (with a non-default innodb_max_dirty_pages_pct_lwm) are positive. Another test with the default innodb_max_dirty_pages_pct_lwm=0 has been started, so that the buf_flush_page_cleaner() will have a chance to write out or evict any pages from the buf_pool.LRU list. | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-02 ] | |||||||||||||||||||
|
Marko Mäkelä, I apologize for the lack of updates; I'll run benchmarks with metrics (history list length, etc.) today and share the results. I'll collect metrics on:
| |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-03 ] | |||||||||||||||||||
|
Marko Mäkelä, I have attached a zip file (benchmarks-with-metrics.zip). For all three, Innodb_history_list_length steadily increases, reaching 1,420,000+ by the end of the 800 s run. Minor note: the 800 s mark of the metric logs records the metrics a few seconds (1-5 s) after the benchmark has completed. Looking at the 790 s mark should be more indicative of the metrics at the very end of the run, while load is still applied. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-03 ] | |||||||||||||||||||
|
dishonme, thank you. This confirms my suspicion that | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-03 ] | |||||||||||||||||||
|
For the record, I attempted to revive | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-10 ] | |||||||||||||||||||
|
dishonme, we got some very promising performance test results. I have now rebased the branch. This commit corresponds to what showed promising results. After discussing with axel, I added the status variable INNODB_BUFFER_POOL_PAGES_SPLIT, because we suspect that the increased write activity starting at the 1200-second mark into his benchmark could be related to page splits. I would appreciate it if you could repeat the performance tests. Based on our own testing, it might be that nothing further needs to be done about | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-10 ] | |||||||||||||||||||
|
One more fixup to buf_LRU_get_free_block() is needed to prevent a hang. | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-13 ] | |||||||||||||||||||
|
Marko Mäkelä, that's great news - I'll run the benchmarks later tonight based on the fixup commit and share the results here by tomorrow. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-13 ] | |||||||||||||||||||
|
dishonme, thank you. If possible, I would recommend testing the current head of the branch. The branch now also includes a fix of | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-14 ] | |||||||||||||||||||
|
Marko Mäkelä, I didn't get to run the benchmark yesterday. I see there are also new commits; I'll run the benchmark on the current head of the branch and let you know when I have the results. | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-15 ] | |||||||||||||||||||
|
Marko Mäkelä, I have attached a zip file (benchmarks-fixed-hangs-commit-00d2a3c7.zip). It looks like throughput still decreases over time; however, the first 500 seconds are more stable compared to past benchmarks on different commits/branches (judging by the sudden TPS dips). | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-16 ] | |||||||||||||||||||
|
dishonme, thank you. Sorry, it has been a moving target. There were some changes on Tuesday and Wednesday, to avoid hangs related to buffer page eviction. Could you please test 10.6 as of this commit once more? | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-21 ] | |||||||||||||||||||
|
dishonme, I see that during the 800 seconds of the run-4-db-metrics.log, Innodb_history_list_length grows all the way to 1,347,228 (at 790s) before starting to shrink a little during the last 10-second interval. This strongly suggests that we need to do something about Based on our internal testing, we will not change the default value innodb_max_dirty_pages_pct_lwm=0. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-22 ] | |||||||||||||||||||
|
dishonme, I created a branch that removes the throttling that was added in | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-22 ] | |||||||||||||||||||
|
Marko Mäkelä,
The decrease in the last 10 seconds is very likely due to the sysbench benchmark terminating (the metric log is lagging by a few seconds). I'll benchmark the following commits tomorrow morning and share the results: | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-27 ] | |||||||||||||||||||
|
dishonme, I am looking forward to your updates. There is also | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-03-27 ] | |||||||||||||||||||
|
Marko Mäkelä, I have attached results for the code changes: MDEV-30628 Benchmarks March 24-27.zip. The results for MDEV-26827-Make-page-flushing-even-faster show throughput beginning to decrease around the 480 s mark. The results for Revert-most-of-ea52a3e are mixed:
| |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-29 ] | |||||||||||||||||||
|
dishonme, thank you. axel mentioned having observed something similar, and I also remember from the past (some time in 2021 or 2022) that when I ran the Sysbench oltp_update_index benchmark for some minutes, the growth of the history list length would cause a significant slowdown. A rule of thumb from your runs seems to be that once the history list length grows above 950,000, performance will suffer. In a "bad" run of Revert-most-of-ea52a3e I see that the history list remains much shorter than in a "good" run that I checked. Whether that is due to less work having been done, or due to something else, I do not know. It could be useful to also log Innodb_max_trx_id (export_vars.innodb_max_trx_id = trx_sys.get_max_trx_id()) to keep track of how much actual work had been done at each point. It looks like right after we are done testing | |||||||||||||||||||
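The ~950,000 rule of thumb above can be checked mechanically against a metrics log; a minimal sketch, assuming a two-column "seconds history_list_length" layout (an assumption about the attached logs, not their exact format):

```shell
# Illustrative sketch only: print the first timestamp at which the
# history list length exceeds the ~950,000 rule-of-thumb threshold.
# Input: one "<seconds> <history_list_length>" pair per line on stdin.
first_over_threshold() {
    awk -v limit=950000 '$2 > limit { print $1; exit }'
}
```

For example, `printf '760 900000\n770 960000\n780 1200000\n' | first_over_threshold` prints `770`, i.e. the first sample past the threshold.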
| Comment by Marko Mäkelä [ 2023-03-29 ] | |||||||||||||||||||
|
To combat the growth of the history list, I have one more idea:
Something similar (minus the if statement) was the only thing that was not reverted in Revert-most-of-ea52a3e. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-04-26 ] | |||||||||||||||||||
|
I can reproduce the excessive history list length growth in | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-04-27 ] | |||||||||||||||||||
|
dishonme, can you please test a 10.6 branch that includes a fix of Furthermore, I wonder if 10.6 would be closer to 10.5 performance if you specify innodb_flush_method=fsync, to essentially revert | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-07 ] | |||||||||||||||||||
|
I’m sorry to see that this report had been closed as incomplete. Some related news:
| |||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-08-11 ] | |||||||||||||||||||
|
dishonme, does your test fare any better with MariaDB Server 10.6.14, or better yet, with a recent development snapshot of 10.6 (which is close to the upcoming 10.6.15 release)? I believe that the main remaining problem is the history list growth. It would be advisable to play with the DML rate limiting parameters that were improved in For I/O bound workloads (where the working set exceeds innodb_buffer_pool_size) it could be worth testing if innodb_flush_method=fsync (essentially reverting | |||||||||||||||||||
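The DML rate limiting mentioned above is exposed through the innodb_max_purge_lag parameters; a minimal sketch of how such an experiment could be set up (the values are placeholders for illustration, not tuning recommendations from this ticket):

```shell
# Illustrative sketch only: once the history list exceeds
# innodb_max_purge_lag, InnoDB delays DML so the purge can catch up;
# innodb_max_purge_lag_delay caps that delay (in microseconds).
# Values below are placeholders. Defined, never invoked here.
apply_purge_throttle() {
    mysql -e "SET GLOBAL innodb_max_purge_lag = 1000000;
              SET GLOBAL innodb_max_purge_lag_delay = 10000;"
}

# innodb_flush_method cannot be changed at runtime; testing fsync means
# restarting the server with innodb_flush_method=fsync under [mysqld]
# in the configuration file.
```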
| Comment by Marko Mäkelä [ 2023-10-25 ] | |||||||||||||||||||
|
dishonme, can you please test the latest 10.6 branch, which now includes a fix of | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2023-11-03 ] | |||||||||||||||||||
|
Hi Marko Mäkelä, apologies for the lack of responses - I can test the latest revision of 10.6 this coming week and will share the results once completed. | |||||||||||||||||||
| Comment by Marko Mäkelä [ 2024-01-12 ] | |||||||||||||||||||
|
I think that | |||||||||||||||||||
| Comment by Dishon Merkhai [ 2024-01-12 ] | |||||||||||||||||||
|
Hi, I can test the commits resulting from The benchmarks I ran for the previously mentioned I will share the results once they are ready.