Details
Type: Bug
Status: Stalled
Priority: Critical
Resolution: Unresolved
Affects versions: 10.5, 10.6, 10.7 (EOL), 10.8 (EOL), 10.9 (EOL), 10.10 (EOL), 10.11, 11.0 (EOL), 11.1 (EOL), 11.2 (EOL), 11.3 (EOL), 11.4, 11.5 (EOL), 11.6 (EOL)
Environment: Ubuntu 22.04
Description
While I haven't seen significant performance regressions when comparing modern MariaDB (11.4, 10.11) with older MariaDB via sysbench with low-concurrency workloads (see here), I have seen regressions once the workload has some concurrency.
This will take a few days to properly document.
From a server with 8 cores and sysbench run with 4 threads ...
- the numbers in the table are the throughput relative to MariaDB 10.2.44 (x.ma100244_rel.z11a_bee.pk1), where 1.0 means no change, < 1.0 means a regression and > 1.0 means an improvement
If I use 0.8 as a cutoff, meaning some version gets less than 80% of the MariaDB 10.2 throughput, then from column 6 (col-6) the problem microbenchmarks are (a sketch of this computation follows the list):
- update-index_range=100, relative throughput is 0.25 in 11.4.1, problem arrives in 10.3
- update-one_range=100, relative throughput is 0.65 in 11.4.1, problem arrives in 10.6
- write-only_range=10000, relative throughput is 0.77 in 11.4.1, problem arrives in 10.3
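To make the cutoff concrete, here is a minimal sketch in Python (not the harness used to produce these results) of how relative throughput and the 0.8 cutoff can be computed. The absolute QPS numbers are invented placeholders, chosen only so that the ratios match the relative values quoted above.

```python
# Minimal sketch: flag microbenchmarks whose throughput relative to the
# MariaDB 10.2.44 baseline falls below a cutoff. The absolute QPS values
# are placeholders, not measured results.

BASELINE = "10.2.44"
CUTOFF = 0.8

# qps[version][microbenchmark] = absolute throughput for that run
qps = {
    "10.2.44": {"update-index_range=100": 10000.0, "update-one_range=100": 8000.0},
    "11.4.1":  {"update-index_range=100": 2500.0,  "update-one_range=100": 5200.0},
}

def relative_throughput(version, bench):
    """1.0 = unchanged, < 1.0 = regression, > 1.0 = improvement."""
    return qps[version][bench] / qps[BASELINE][bench]

for bench in sorted(qps[BASELINE]):
    r = relative_throughput("11.4.1", bench)
    if r < CUTOFF:
        print(f"problem microbenchmark: {bench} relative throughput {r:.2f}")
```

The same filter applies to the larger result tables below; only the baseline version and the column being compared change.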
Next step for this is to get flamegraphs and maybe PMP stacks.
The table relies on fixed-width fonts to be readable, but the "preformatted" option in JIRA doesn't do what I want it to do, so the data is here
Next up is a server with 2 sockets and 12 cores/socket; the benchmark was run with 16 threads. The results are here. Again, using 0.8 as a cutoff and looking at col-6 (MariaDB 11.4.1), the problem microbenchmarks are:
- insert_range=100, relative throughput is 0.73 in 11.4.1, there are gradual regressions starting in 10.3, but the largest are from 10.11 and 11.4
- update-index_range=100, relative throughput is 0.18 in 11.4.1, problem starts in 10.5 and 10.11->11.4 is the biggest drop
- update-inlist_range=100, relative throughput is 0.56 in 11.4.1, problem is gradual from 10.3 through 11.4
- update-nonindex_range=100, relative throughput is 0.69 in 11.4.1, problems arrive in 10.11 and 11.4
- update-one_range=100, relative throughput is 0.61 in 11.4.1, problem arrives in 10.6
- update-zipf_range=100, relative throughput is 0.75 in 11.4.1, problem arrives in 11.4
- write-only_range=10000, relative throughput is 0.59 in 11.4.1, problems arrive in 10.11 and 11.4
Finally, on a server with 32 cores (AMD Threadripper) the benchmark was run with 24 threads. The results are here and the problem microbenchmarks are:
- points-notcovered-pk_range=100, relative throughput is 0.65 in 11.4.1, problem arrives in 10.5
- points-notcovered-si_range=100, relative throughput is 0.77 in 11.4.1, problem arrives in 10.5
- random-points_range=1000, relative throughput is 0.65 in 11.4.1, problem arrives in 10.5
- random-points_range=100, relative throughput is 0.65 in 11.4.1, problem arrives in 10.5
- range-notcovered-si_range=100, relative throughput is 0.59 in 11.4.1, problem arrives in 10.5
- read-write_range=10, relative throughput is 0.79 in 11.4.1, problem arrives in 10.11
- update-index_range=100, relative throughput is 0.64 in 11.4.1, problem arrives in 10.11 and 11.4
- update-inlist_range=100, relative throughput is 0.61 in 11.4.1, problem arrives in 10.3, 10.5, 10.11
- write-only_range=10000, relative throughput is 0.75 in 11.4.1, problem arrives in 10.11, 11.4
At this point my hypothesis is that the problem is from a few changes to InnoDB but I need more data to confirm or deny that.
On the 24-core server (2 sockets, 12 cores/socket) I repeated sysbench for 1, 4, 8, 12, 16 and 18 threads. And then on the 32-core server I repeated it for 1, 4, 8, 12, 16, 20 and 24 threads. The goal was to determine at which thread count the regressions become obvious. Alas, I only used a subset of the microbenchmarks to get results in less time. Another run with more microbenchmarks is in progress.
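For reference, a thread-count sweep like the one described above can be scripted in a few lines. The sketch below is illustrative only: it uses the stock sysbench 1.0 Lua workloads (oltp_update_index, oltp_write_only) rather than the microbenchmark names used in these reports, and the connection options, run time and table size are placeholder assumptions.

```python
# Illustrative sweep of sysbench over several thread counts; not the harness
# used for the results in this report. Assumes the sbtest tables were already
# created with "sysbench ... prepare" and that the connection options below
# are valid for the target server.
import subprocess

THREAD_COUNTS = [1, 4, 8, 12, 16, 20, 24]
WORKLOADS = ["oltp_update_index", "oltp_write_only"]  # stock sysbench Lua scripts

for workload in WORKLOADS:
    for threads in THREAD_COUNTS:
        cmd = [
            "sysbench",
            f"--threads={threads}",
            "--time=300",
            "--tables=1",
            "--table-size=10000000",
            "--mysql-user=sbtest",
            "--mysql-db=sbtest",
            workload,
            "run",
        ]
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Parsing the QPS from each run's output and feeding it into the relative-throughput filter sketched earlier would produce tables like the ones linked above.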
The results will be in comments to follow.
Issue Links
- is blocked by
  - MDEV-34515 Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size (Closed)
  - MDEV-34759 buf_page_get_low() is unnecessarily acquiring exclusive latch on secondary index pages (Closed)
- is caused by
  - MDEV-15058 Remove multiple InnoDB buffer pool instances (Closed)
- relates to
  - MDEV-32176 Contention in ha_innobase::info_low (dict_table::lock_mutex_lock) (Closed)
  - MDEV-35125 Unnecessary buf_pool.page_hash lookups (Closed)
  - MDEV-35155 Performance degradation and unstable observed on 10.6.19 (Confirmed)
Comments
Results using the latest point releases with both low concurrency (1 thread) and high concurrency (40 threads). Not much is new, other than that I now have results from newer versions:
All of the my.cnf files I use are archived here:
The gists I link to below use a naming pattern like x.ma110502_rel_withdbg.z11b_c8r32.pk1 and x.ma110502_rel_withdbg.z11b_lwas4k_c8r32.pk1. For now ignore the "x." at the start and the ".pk1" at the end. The "ma110502_rel_withdbg" means:
The links that follow have results in terms of relative QPS (rQPS), which is (QPS for the version under test / QPS for the baseline MariaDB version). When I look at results across MariaDB major versions, the base case is 10.2.44; when I look at results only for 10.5 point releases, the base case is 10.4.34. When the relative QPS is 1.0, QPS has not changed; when it is much less than 1.0, there is a large regression.
First up are the results for the latest point releases at low concurrency. For point releases that support innodb_log_write_ahead_size I include results with it set to 4k and with it not set. The results are here, and modern MariaDB almost always has relative QPS >= 0.9, which is OK.
For low concurrency with a focus on MariaDB 10.5, see here, which has numbers for 10.5.0, 10.5.4, 10.5.10, 10.5.20, 10.5.24 and 10.5.27. The reason to share the low-concurrency results is to show that there aren't big regressions in 10.5 at low concurrency, in contrast to the high-concurrency results shared below.
And note that my focus is on 10.5 because that is where the big regressions first arrived. They are still here, but to understand the root cause I need to show where they start.
Next up are results from the latest point releases at high concurrency (40 threads) on a server with 48 cores (real cores, AMD SMT is disabled). The relative QPS numbers are here. In the worst cases the relative QPS drops to ~0.4 in 10.5 and has remained there through 11.7. This means older MariaDB gets ~2.5X more QPS than modern MariaDB:
Results from the classic sysbench transaction are here. With them the relative QPS in modern MariaDB is 0.83 or 0.88, depending on the length of the range queries it uses. The regressions here arrived in 10.11 and not much has changed since then. If you only use classic sysbench then you will miss some of the large regressions, because the regressions here are smaller than the ones above.
A few write-heavy tests also have large regressions, but they aren't as bad as the results mentioned above.
Results from 10.5 point releases at high concurrency are here. I tried all point releases from 10.5.0 through 10.5.10, the even-numbered point releases from 10.5.12 through 10.5.26, and then 10.5.27. I won't annotate them other than to say that the big regressions for read-heavy microbenchmarks arrived prior to 10.5.10.