[MDEV-26356] Performance regression after dict_sys.mutex removal Created: 2021-08-13 Updated: 2023-10-25 Resolved: 2021-09-16
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6, 10.7 |
| Fix Version/s: | 10.6.5, 10.7.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Krunal Bauskar | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance, purge |
| Description |
|
Users can configure the number of purge threads. The existing logic watches the growing history list length and increases the number of active purge threads accordingly (respecting the user-set threshold). An active user workload causes the history list length to grow, which in turn causes more purge threads to be scheduled. Purge generates redo records in addition to the redo records of the user workload. This dual redo generation puts intense pressure on the redo log, which can easily hit its threshold and thereby cause jitter in overall throughput (including furious flushing).

The proposed patch explores an adaptive purge thread scheduling approach based on the redo log fill factor: the logic dynamically increases or decreases the number of purge threads based on how full the redo log is, leaving enough space/resources/flush bandwidth for the user workload. Testing done so far has revealed quite encouraging results, especially with a slower disk where flushing is unable to catch up with the redo log generation rate. The resulting growth in history list length does not appear to have a regressive effect on queries. |
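The scheduling idea could be sketched roughly as follows. This is a minimal illustration only, not the actual InnoDB implementation: the struct, function names, and the 7/8 cutoff are all assumptions made for the sketch.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model of redo-fill-factor based purge scheduling.
struct RedoState {
  uint64_t lsn;             // current log sequence number
  uint64_t checkpoint_lsn;  // last checkpoint LSN
  uint64_t capacity;        // usable redo log capacity in bytes
};

// Fraction of the redo log occupied by un-checkpointed changes.
inline double redo_fill_factor(const RedoState &r) {
  return double(r.lsn - r.checkpoint_lsn) / double(r.capacity);
}

// Pick how many purge threads to run: scale down as the log fills up,
// leaving flush bandwidth for the user workload. max_threads is the
// user-configured innodb_purge_threads ceiling.
inline unsigned adaptive_purge_threads(const RedoState &r,
                                       unsigned max_threads) {
  const double fill = redo_fill_factor(r);
  if (fill >= 0.875)  // log nearly full: stop generating purge redo
    return 0;
  // Linear ramp: full purge concurrency while the log is mostly empty,
  // a single thread just below the cutoff.
  const double headroom = (0.875 - fill) / 0.875;
  return std::max(1u, unsigned(max_threads * headroom));
}
```

With an empty log the full configured thread count is used; as the fill factor approaches the cutoff, purge concurrency shrinks toward one thread, and above it purge pauses entirely.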
| Comments |
| Comment by Krunal Bauskar [ 2021-08-13 ] |
|
PR created: https://github.com/MariaDB/server/pull/1889 |
| Comment by Krunal Bauskar [ 2021-08-13 ] |
|
Check the attached graph for details on how the patch helps the slow-disk case. As the graph shows, the patch not only stabilizes performance, it also improves it by 2-2.5x: a visible user-workload improvement with less jitter and controlled history list growth. |
| Comment by Marko Mäkelä [ 2021-09-14 ] |
|
I think that this must be treated as a bug in 10.6, to avoid an apparent performance regression after the fix that removed dict_sys.mutex. |
| Comment by Krunal Bauskar [ 2021-09-15 ] |
|
Marko, I don't see that this has anything to do with the dict_sys mutex removal that the new title points to. |
| Comment by Marko Mäkelä [ 2021-09-15 ] |
|
krunalbauskar, you are right that this optimization is meaningful independently of that change. The change of the title documents why we would want to apply this already in the 10.6 GA release series, instead of handling it as a performance enhancement in 10.7. It might be useful to additionally throttle the purge activity based on buffer pool contention (buf_pool.LRU eviction rate). But, because of |
| Comment by Krunal Bauskar [ 2021-09-16 ] |
|
1. The base purge framework was simplified, and the adaptive-purge patch was then applied on top of it. All that development is now being tracked under bb-10.6- |
| Comment by Krunal Bauskar [ 2021-09-16 ] |
|
Please check "purge-thread=8 + nvme disk + [cpu|io-bound]".
1. For cpu-bound there is no change in tps (expected).
Note: in both cases we didn't hit the redo-log contention that would show the real effect of purge, given that the disk was NVMe and redo-log-size=20G (with data-size=70 and buffer-pool=80G/35G). |
| Comment by Marko Mäkelä [ 2021-09-16 ] |
|
krunalbauskar, thank you. I ran some more tests locally. On a SATA SSD misconfigured with too high innodb_io_capacity, using a 40G buffer pool and 40G of data, this patch reduced the number of 5-second intervals with 0 tps. The results are somewhat random, probably due to "furious flushing" related to checkpointing, but my general impression is that this does improve or stabilize performance. I will run one more comparison with the NVMe storage to get a more complete picture. |
| Comment by Marko Mäkelä [ 2021-09-16 ] |
|
After running oltp_update_non_index all day with different configurations, I can confirm that this seems to be an improvement. I observed an occasional regression (see adaptive_purge.tar.gz). On both the SATA SSD and the NVMe, this patch seems to reduce the amount or duration of throughput dips. |
| Comment by Marko Mäkelä [ 2021-09-16 ] |
|
There were 2 tests hanging in wait_all_purged.inc. I tracked it down to the check that skips purge while the redo log is considered too full.
That is, if we consider the log to be too full, we will skip purge forever. It turns out that we sometimes failed to advance the log checkpoint even if the buffer pool was clean. Before the break statement in that check, we might want to initiate a page flush. For now, I did not change that, to avoid the need to run extensive performance tests again. Instead, I patched the innodb_max_purge_lag_wait logic so that the innodb_gis.rtree_compress test would not time out. |
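The hang can be modelled with a toy sketch (all names hypothetical, not InnoDB code): purge is skipped while the log is considered too full, but if the buffer pool is already clean, nothing ever advances the checkpoint, so the skip repeats forever. One possible fix direction, sketched below, is to advance the checkpoint directly when there is nothing to flush; note that the actual workaround described above instead patched the innodb_max_purge_lag_wait logic.

```cpp
#include <cstdint>

// Toy model of the hang (illustrative only).
struct LogModel {
  uint64_t lsn;             // current log sequence number
  uint64_t checkpoint_lsn;  // last checkpoint
  uint64_t capacity;        // redo log capacity
  bool buf_pool_clean;      // no dirty pages in the buffer pool
};

// Original behaviour: skip purge while the log is "too full".
// If the checkpoint never advances, this skips forever.
bool purge_round(const LogModel &log) {
  if (log.lsn - log.checkpoint_lsn > log.capacity * 7 / 8)
    return false;  // the problematic skip: waits for a flush that may never come
  return true;     // purge runs
}

// Sketched fix direction: with a clean buffer pool, the checkpoint can be
// advanced to the current LSN without flushing anything, unblocking purge.
bool purge_round_fixed(LogModel &log) {
  if (log.lsn - log.checkpoint_lsn > log.capacity * 7 / 8) {
    if (!log.buf_pool_clean)
      return false;               // genuinely need the page cleaner first
    log.checkpoint_lsn = log.lsn; // nothing dirty: checkpoint advances
  }
  return true;
}
```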
| Comment by Marko Mäkelä [ 2023-09-27 ] |
|
While analyzing |
| Comment by Marko Mäkelä [ 2023-09-27 ] |
|
I reran bench.sh from adaptive_purge.tar.gz. With my experimental revert in |
| Comment by Marko Mäkelä [ 2023-09-27 ] |
|
During the 1-hour test run (most of the time being spent in a sequential load of data), the 10GiB redo log was overwritten more than 7 times (the LSN grew to 73G).
This is clearly a (deliberately) misconfigured system; innodb_log_file_size should be large enough to accommodate 1 or 2 hours of writes. |
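The sizing rule of thumb can be expressed as simple arithmetic (an illustration, not an existing MariaDB tool): sample the LSN twice, compute the redo write rate, and multiply by the desired number of hours.

```cpp
#include <cstdint>

// Rough redo log sizing from two LSN samples (illustrative only).
// With the numbers above, ~73 GB of redo in one hour means a 10 GiB log
// wraps about 7 times; a log sized for 1-2 hours of writes would be
// roughly 73-146 GB.
uint64_t recommended_log_size(uint64_t lsn_start, uint64_t lsn_end,
                              uint64_t elapsed_seconds,
                              uint64_t target_hours) {
  const uint64_t bytes_per_second = (lsn_end - lsn_start) / elapsed_seconds;
  return bytes_per_second * 3600 * target_hours;
}
```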
| Comment by Marko Mäkelä [ 2023-09-27 ] |
|
I repeated the experiment with innodb_log_file_size=80g; the raw data is in patched2-ssd40-80.txt. I think that better performance could be achieved by not setting innodb_max_dirty_pages_pct_lwm=10 or touching the LRU parameters, and perhaps by setting innodb_purge_threads=32. After all, we are testing with 10, 20, 40, 80, and 160 concurrent writer connections. |
| Comment by Marko Mäkelä [ 2023-09-27 ] |
|
I did one more run, with a 40GiB buffer pool, an 80GiB log, and no flushing-related parameters. I intended this to employ 32 purge threads, hence the file name for the raw data: patched2-ssd40-80-32.txt. This time, the history list grew to 89 million transactions, and the slow shutdown took almost 25 minutes.
The slow shutdown (which automatically switches to the maximum innodb_purge_threads=32) fully employed only about 1.7 CPU cores. I checked some perf record profiles while it was running and noticed that quite some CPU time was being spent on table lookup. Maybe each purge task should maintain a larger local cache that maps table IDs to table pointers; eviction would be prevented by reference counts. This test uses 100 tables of 1½ million rows each. Unlike my test in |
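The per-task cache idea could look something like the sketch below. Assumptions are stated up front: `Table` is a stand-in for dict_table_t, and the lookup callback stands in for the expensive global dictionary lookup; none of this is existing InnoDB code.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for dict_table_t.
struct Table { uint64_t id; int ref_count = 0; };

// Per-purge-task cache: table ID -> pinned table handle. A hit avoids the
// expensive global dictionary lookup; the reference count pins the table
// so it cannot be evicted while cached.
class PurgeTaskTableCache {
  std::unordered_map<uint64_t, Table*> cache_;
  std::function<Table*(uint64_t)> lookup_;  // global dictionary lookup
public:
  explicit PurgeTaskTableCache(std::function<Table*(uint64_t)> lookup)
      : lookup_(std::move(lookup)) {}
  Table* get(uint64_t table_id) {
    auto it = cache_.find(table_id);
    if (it != cache_.end()) return it->second;  // hit: no global lookup
    Table* t = lookup_(table_id);
    if (t) { ++t->ref_count; cache_.emplace(table_id, t); }  // pin on first use
    return t;
  }
  ~PurgeTaskTableCache() {  // task ends: release every pin
    for (auto &e : cache_) --e.second->ref_count;
  }
};
```

The design choice here is per-task (not global) caching, so no extra latching is needed on the hot path; the pin via reference count is what would keep dictionary eviction from invalidating a cached pointer mid-task.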
| Comment by Marko Mäkelä [ 2023-09-28 ] |
|
In |