Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 10.6.17, 10.11.7, 11.4.1
- Fix Version/s: None
Description
This is similar to regressions I found with CPU-bound sysbench (MDEV-33966), but I opened a separate issue because this occurs with the IO-bound Insert Benchmark.
I ran an IO-bound Insert Benchmark (IO-bound because the working set and database are much larger than memory) on a small server (8 cores, 16G RAM) to compare MariaDB LTS releases 10.2, 10.3, 10.4, 10.5, 10.6, 10.11 and the upcoming 11.4 with MySQL 5.6, 5.7 and 8.0.
A result for a CPU-bound benchmark, which covers a test that uses a smaller database that can be cached and isn't IO-bound, is linked here. In the cached case, MariaDB doesn't have large regressions from 10.2 to 11.4. Here, with an IO-bound setup, there are regressions.
The IO-bound tests were run for 1 and 4 clients:
- all DBMS - 1 client and 4 clients
- all MariaDB LTS - 1 client and 4 clients
- all MySQL - 1 client and 4 clients
- MariaDB vs MySQL - 1 client and 4 clients
- MariaDB 10.11 - 1 client and 4 clients
- MariaDB 11.4 - 1 client and 4 clients
The way I label results encodes the DBMS version and the my.cnf config:
- ma101107_rel.cz11a_bee - MariaDB 10.11.7 with the cz11a_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=none
- ma101107_rel.cz11b_bee - MariaDB 10.11.7 with the cz11b_bee config that uses innodb_flush_method=O_DIRECT and innodb_change_buffering=none
- ma110401_rel.cz11b_bee - MariaDB 11.4.1 with the cz11b_bee config that uses innodb_flush_method=O_DIRECT; the InnoDB change buffer has been removed in 11.4
- my8036_rel.cz11a_bee - MySQL 8.0.36 with the cz11a_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=all
- my8036_rel.cz11d_bee - MySQL 8.0.36 with the cz11d_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=none
Note that the cz11d_bee config for MySQL 8.0.36 is similar to the cz11a_bee config for MariaDB; an illustrative my.cnf fragment for these settings follows.
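This is only a sketch of the settings named above; the actual benchmark configs contain many more options (buffer pool size, redo log size, etc.) that are not reproduced here.
{code}
# Illustrative fragment only; not the author's full cz11a/cz11b/cz11d config files.
[mysqld]
# cz11a_bee (MariaDB) and cz11d_bee (MySQL): no fsync for data files, change buffering off
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_change_buffering=none

# cz11b_bee (MariaDB) instead uses:
#   innodb_flush_method=O_DIRECT
# and on 11.4 innodb_change_buffering no longer exists because the change buffer was removed.

# cz11a_bee (MySQL) keeps innodb_flush_method=O_DIRECT_NO_FSYNC but sets:
#   innodb_change_buffering=all
{code}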
My claim about the regressions is based on the following:
- start with the MariaDB vs MySQL comparison for 1 client and 4 clients: the results for MySQL 8.0.36 (my8036_rel.cz11a_bee and my8036_rel.cz11d_bee) are much better than for MariaDB 10.11.7 (ma101107_rel.cz11a_bee) and 11.4.1 (ma110401_rel.cz11b_bee)
- then look at the results for MariaDB LTS releases with 1 client and 4 clients: there are some regressions from ma100433 (10.4.33) to ma100524 (10.5.24) and larger regressions from ma100524 to ma100617 (10.6.17)
- then look at the HW metrics for MariaDB LTS releases from the 1 client setup. These are values from vmstat and iostat normalized by the query and insert rates to understand HW efficiency (a sketch of this normalization follows this list). For the write-heavy benchmark steps, from 10.4.33 through 10.6.17 I see a ~20% increase in context switches per insert (cspq) and a ~20% decrease in CPU per insert (cpupq); see the results for the l.i1 and l.i2 steps. Most of the change is from 10.5.24 to 10.6.17. I assume this is a result of the changes in 10.6 that switched some mutexes and rw-locks from spinning to not spinning, so less CPU is burned but more lock waiters go to sleep.
- the read-write benchmark steps show a similar pattern as the background write rate increases. See the 1 client results for range queries and point queries when the background write rate is 1000/s: there is an increase in cspq (more context switches per query, i.e. more threads going to sleep) but not a large decrease in cpupq (CPU per query)
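To make the normalization concrete, here is a rough sketch of how per-operation efficiency metrics like cspq and cpupq can be derived from vmstat samples and the measured insert or query rate. This assumes vmstat-style per-second counters; the field names, sampling, and scaling here are assumptions, not the scripts used to produce the published results.
{code:python}
# Hypothetical sketch of the cspq/cpupq normalization described above.
# The real benchmark scripts may sample and scale these metrics differently.

def normalized_metrics(vmstat_samples, ops_per_second):
    """vmstat_samples: dicts with 'cs' (context switches/sec) and 'us'/'sy'
    (user/system CPU percent) captured while a benchmark step runs.
    ops_per_second: average insert or query rate over the same interval."""
    n = len(vmstat_samples)
    avg_cs = sum(s["cs"] for s in vmstat_samples) / n             # context switches/sec
    avg_cpu = sum(s["us"] + s["sy"] for s in vmstat_samples) / n  # CPU utilization, percent

    cspq = avg_cs / ops_per_second              # context switches per operation
    cpupq = (avg_cpu / 100.0) / ops_per_second  # relative CPU cost per operation
    return cspq, cpupq

# Example: a ~20% rise in cspq with a ~20% drop in cpupq between releases
# suggests more threads sleeping on lock waits while burning less CPU.
samples = [{"cs": 52000, "us": 35, "sy": 10}, {"cs": 48000, "us": 33, "sy": 9}]
print(normalized_metrics(samples, ops_per_second=4000.0))
{code}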
Attachments
Issue Links
- blocks
  - MDEV-34759 buf_page_get_low() is unnecessarily acquiring exclusive latch on secondary index pages (Closed)
- relates to
  - MDEV-34431 More fine grained control of spin loops could be useful (Stalled)
  - MDEV-34443 ha_innobase::info_low() does not distinguish HA_STATUS_VARIABLE_EXTRA (Closed)
  - MDEV-34458 wait_for_read in buf_page_get_low hurts performance (Closed)