[MDEV-33966] sysbench performance regression with concurrent workloads - Jira

Details

Type: Bug
Status: Stalled
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.5(EOL), 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL), 11.6(EOL)
Fix Version/s: 10.11, 11.4
Component/s: Storage Engine - InnoDB
Labels:
Environment: Ubuntu 22.04

Bug Category: Related to performance
Sprint: Q3/2025 Maintenance, Q2/2026 Server Maintenance

Description

I haven't seen significant performance regressions when comparing modern MariaDB (11.4, 10.11) with older MariaDB via sysbench with low-concurrency workloads (see here), but I have seen regressions once the workload has some concurrency.

This will take a few days to properly document.

The first results are from a server with 8 cores, with sysbench run with 4 threads.

The numbers in the table are the throughput relative to MariaDB 10.2.44 (x.ma100244_rel.z11a_bee.pk1), where 1.0 means the same, < 1.0 means a regression and > 1.0 means an improvement.

If I use 0.8 as a cutoff, meaning a version is flagged when it gets less than 80% of the throughput of MariaDB 10.2, then from column 6 (col-6, MariaDB 11.4.1) the problem microbenchmarks are:

update-index_range=100, relative throughput is 0.25 in 11.4.1, problem arrives in 10.3
update-one_range=100, relative throughput is 0.65 in 11.4.1, problem arrives in 10.6
write-only_range=10000, relative throughput is 0.77 in 11.4.1, problem arrives in 10.3
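
The cutoff rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, the `results` layout and the absolute throughput numbers are assumptions made up for the example (the report only gives relative values), and `example-unaffected-test` is a placeholder name, not a real microbenchmark.

```python
BASE_VERSION = "10.2.44"   # baseline used in the report
CUTOFF = 0.8               # flag anything below 80% of the baseline


def relative_throughput(qps_by_version, base_version=BASE_VERSION):
    """Return {version: throughput / baseline throughput} for one microbenchmark."""
    base = qps_by_version[base_version]
    return {version: qps / base for version, qps in qps_by_version.items()}


def problem_microbenchmarks(results, version, cutoff=CUTOFF, base_version=BASE_VERSION):
    """Return (name, relative throughput) pairs where `version` is below `cutoff`."""
    flagged = []
    for name, qps_by_version in results.items():
        rel = relative_throughput(qps_by_version, base_version)[version]
        if rel < cutoff:
            flagged.append((name, round(rel, 2)))
    # Worst regression first.
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical absolute throughput (queries/sec), NOT measurements from this report.
results = {
    "update-index_range=100": {"10.2.44": 1000.0, "11.4.1": 250.0},
    "update-one_range=100": {"10.2.44": 2000.0, "11.4.1": 1300.0},
    "example-unaffected-test": {"10.2.44": 4000.0, "11.4.1": 3900.0},
}

flagged = problem_microbenchmarks(results, "11.4.1")
for name, rel in flagged:
    print(f"{name}: relative throughput is {rel} in 11.4.1")
```

A value of exactly 0.8 is not flagged here because the comparison is strict, which matches "less than 80%" in the description.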

Next step for this is to get flamegraphs and maybe PMP stacks.

The table relies on fixed-width fonts to be readable, but the "preformatted" option in JIRA doesn't do what I want, so the data is here.

Next up is a server with 2 sockets and 12 cores/socket and the benchmark was run with 16 threads. The results are here. Again, using 0.8 as a cutoff and looking at col-6 (MariaDB 11.4.1) the problem microbenchmarks are:

insert_range=100, relative throughput is 0.73 in 11.4.1, there are gradual regressions starting in 10.3, but the largest are from 10.11 and 11.4
update-index_range=100, relative throughput is 0.18 in 11.4.1, problem starts in 10.5 and 10.11->11.4 is the biggest drop
update-inlist_range=100, relative throughput is 0.56 in 11.4.1, problem is gradual from 10.3 through 11.4
update-nonindex_range=100, relative throughput is 0.69 in 11.4.1, problems arrive in 10.11 and 11.4
update-one_range=100, relative throughput is 0.61 in 11.4.1, problem arrives in 10.6
update-zipf_range=100, relative throughput is 0.75 in 11.4.1, problem arrives in 11.4
write-only_range=10000, relative throughput is 0.59 in 11.4.1, problems arrive in 10.11 and 11.4

Finally a server with 32 cores (AMD Threadripper) and the benchmark was run with 24 threads. The results are here and the problem microbenchmarks are:

points-notcovered-pk_range=100, relative throughput is 0.65 in 11.4.1, problem arrives in 10.5
points-notcovered-si_range=100, relative throughput is 0.77 in 11.4.1, problem arrives in 10.5
random-points_range=1000, relative throughput is 0.65 in 11.4.1, problem arrives in 10.5
random-points_range=100, relative throughput is 0.65 in 11.4.1, problem arrives in 10.5
range-notcovered-si_range=100, relative throughput is 0.59 in 11.4.1, problem arrives in 10.5
read-write_range=10, relative throughput is 0.79 in 11.4.1, problem arrives in 10.11
update-index_range=100, relative throughput is 0.64 in 11.4.1, problem arrives in 10.11 and 11.4
update-inlist_range=100, relative throughput is 0.61 in 11.4.1, problem arrives in 10.3, 10.5, 10.11
write-only_range=10000, relative throughput is 0.75 in 11.4.1, problem arrives in 10.11, 11.4

At this point my hypothesis is that the problem comes from a few changes to InnoDB, but I need more data to confirm or deny that.

On the 24-core server (2 sockets, 12 cores/socket) I repeated sysbench for 1, 4, 8, 12, 16 and 18 threads. And then on the 32-core server I repeated it for 1, 4, 8, 12, 16, 20 and 24 threads. The goal was to determine at which thread count the regressions become obvious. Alas, I only used a subset of the microbenchmarks to get results in less time. Another run with more microbenchmarks is in progress.
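
One way to read the thread-count sweep is to look for the lowest thread count at which relative throughput drops below the 0.8 cutoff. The sketch below assumes that definition of "obvious"; the function name and the relative throughput values are hypothetical, and only the thread counts (1, 4, 8, 12, 16, 18) come from the description above.

```python
def first_regressing_thread_count(rel_by_threads, cutoff=0.8):
    """Return the lowest thread count whose relative throughput is below
    `cutoff`, or None if no thread count falls below it."""
    for threads in sorted(rel_by_threads):
        if rel_by_threads[threads] < cutoff:
            return threads
    return None


# Hypothetical relative throughput per thread count for one microbenchmark,
# NOT measurements from this report.
rel_by_threads = {1: 0.97, 4: 0.93, 8: 0.88, 12: 0.79, 16: 0.70, 18: 0.66}

onset = first_regressing_thread_count(rel_by_threads)
print(f"regression becomes visible at {onset} threads")
```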

The results will be in comments to follow.

Attachments

dwarf_vu124_35155_33966_nsp_new_flamegraph.svg (2.22 MB, 2025-10-31 10:34)
image (1).png (60 kB, 2025-10-30 01:42)
image (2).png (96 kB, 2025-10-30 01:42)
image (3).png (82 kB, 2025-10-30 01:42)
log_csv_baseline.zip (3.15 MB, 2025-10-24 17:35)
log_csv_MDEV_33966.zip (3.20 MB, 2025-10-24 17:35)
mdev_37244_1212_flamegraph.zip (1.60 MB, 2025-10-17 13:43)
OFFCPU_MDEV_33966_Wrong_Patch.zip (322 kB, 2025-11-11 11:31)
Percentage_Drop.png (55 kB, 2025-10-30 01:43)
Stored_Procedure_No_Stored_Procedure_Flamegraph.zip (2.82 MB, 2025-10-30 01:41)

Issue Links

blocks:
MDEV-37924 buf_pool.mutex contention under I/O-limited OLTP workload (Open)

is blocked by:
MDEV-34515 Contention between secondary index UPDATE and purge due to large innodb_purge_batch_size (Closed)
MDEV-34759 buf_page_get_low() is unnecessarily acquiring exclusive latch on secondary index pages (Closed)
MDEV-36931 performance regression in TPROC-C workload in 10.6.17 (Closed)
MDEV-38069 Heavy contention on buf_pool.flush_list_mutex (Closed)

is caused by:
MDEV-15058 Remove multiple InnoDB buffer pool instances (Closed)

relates to:
MDEV-16232 Use fewer mini-transactions (Stalled)
MDEV-31956 SSD based InnoDB buffer pool extension (Stalled)
MDEV-32176 Contention in ha_innobase::info_low (dict_table::lock_mutex_lock) (Closed)
MDEV-35125 Unnecessary buf_pool.page_hash lookups (Closed)
MDEV-35155 Small innodb_log_file_size leads to excessive write amplification (Closed)
MDEV-37992 select with large in-list has too much mutex contention (Open)
SAMU-332

People

Assignee: Marko Mäkelä
Reporter: Mark Callaghan
Assigned for Implementation: Marko Mäkelä
Assigned for Review: Vladislav Lesin (Inactive)
Assigned for Testing: Vladislav Lesin (Inactive)
Votes: 1
Watchers: 21

Dates

Created: 2024-04-22 22:18
Updated: 2026-03-05 08:41
