MDEV-21423

lock-free trx_sys performance regression caused by lf_find and ut_delay

Details

    Description

      Hello, guys

      We have ported the lock-free trx_sys; however, I find that the oltp_read_write case shows a large performance regression compared with the non-lock-free version.
      Especially when the isolation level is read-committed, the lock-free trx_sys shows about a 40% performance regression.
      I guess MariaDB has the same problem.

      This is my sysbench test configuration:

      bench_type=oltp_read_write
      threads=560
      tables=8
      table_size=500000
      

      There is another issue related to the lock-free trx_sys:
      https://jira.mariadb.org/browse/MDEV-20630?filter=-2
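
      For readers unfamiliar with the two functions named in the summary, here is a minimal, self-contained sketch of the pattern (illustrative only, not the actual InnoDB or LF_HASH code; the retry predicate below is hypothetical): ut_delay() is InnoDB's calibrated busy-wait built from CPU pause hints, and a lock-free lookup such as lf_find() may have to retry and back off with such a delay. When every row-visibility check under read-committed goes through a lookup like this, the pauses and the cache-line traffic add up at high concurrency (560 client threads in the test above).

      /* Sketch only: spin_delay() stands in for ut_delay() and
         hash.search() for an lf_find-style lookup. */
      #include <atomic>
      #if defined(__x86_64__) || defined(__i386__)
      # include <immintrin.h>                        /* _mm_pause() */
      #endif

      /* Calibrated busy-wait: spin, issuing a CPU pause hint each round. */
      static inline void spin_delay(unsigned delay)
      {
        for (unsigned i = 0; i < delay * 50; i++)
        {
      #if defined(__x86_64__) || defined(__i386__)
          _mm_pause();
      #else
          std::atomic_signal_fence(std::memory_order_seq_cst);
      #endif
        }
      }

      /* Hypothetical hot path: look up a transaction id, backing off and
         retrying while the answer cannot be decided yet.  Under
         read-committed this kind of lookup can run once per row examined. */
      template <typename Hash>
      const void *find_with_backoff(Hash &hash, unsigned long long trx_id)
      {
        for (unsigned attempt = 0;; attempt++)
        {
          if (const void *element = hash.search(trx_id))   /* lf_find-style */
            return element;
          if (!hash.must_retry())                /* hypothetical predicate */
            return nullptr;
          spin_delay(attempt < 20 ? attempt : 20);       /* bounded backoff */
        }
      }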

      Below are the sysbench results, comparing the lock-free trx_sys with the non-lock-free trx_sys:

      tps 13405.28 vs 20095.02 
      qps 268105.66 vs 401900.40
      

      lock-free trx_sys, isolation level read-committed:
       
      SQL statistics:
          queries performed:
              read:                            33803098
              write:                           9658028
              other:                           4829014
              total:                           48290140
          transactions:                        2414507 (13405.28 per sec.)
          queries:                             48290140 (268105.66 per sec.)
          ignored errors:                      0      (0.00 per sec.)
          reconnects:                          0      (0.00 per sec.)
       
      General statistics:
          total time:                          180.1141s
          total number of events:              2414507
       
      Latency (ms):
               min:                                    2.96
               avg:                                   41.75
               max:                                 4487.73
               95th percentile:                       92.42
               sum:                            100805088.64
       
      Threads fairness:
          events (avg/stddev):           4311.6196/167.34
          execution time (avg/stddev):   180.0091/0.01
      
      

      non-lock-free trx_sys, isolation level read-committed:
       
      SQL statistics:
          queries performed:
              read:                            50672678
              write:                           14477908
              other:                           7238954
              total:                           72389540
          transactions:                        3619477 (20095.02 per sec.)
          queries:                             72389540 (401900.40 per sec.)
          ignored errors:                      0      (0.00 per sec.)
          reconnects:                          0      (0.00 per sec.)
       
      General statistics:
          total time:                          180.1161s
          total number of events:              3619477
       
      Latency (ms):
               min:                                    2.47
               avg:                                   27.85
               max:                                  198.43
               95th percentile:                       52.89
               sum:                            100798260.68
       
      Threads fairness:
          events (avg/stddev):           6463.3518/107.19
          execution time (avg/stddev):   179.9969/0.01
      

          Activity

            marko Marko Mäkelä added a comment:

            I reran the 30-second 8×100,000-row Sysbench oltp_update_index with innodb_flush_log_at_trx_commit=0 to get some quick indication of the impact:

            revision \ connections            20         40         80         160        320        640
            10.6+patch                        158434.28  185131.85  170423.99  190336.86  186661.45  179461.32
            10.6                              161207.93  185221.65  171324.52  189943.30  186307.93  177596.38
            10.9+MDEV-28313+patch+MDEV-26603  174544.91  178149.89  110558.41  125144.57  127529.77  147725.99
            10.9+MDEV-28313+patch             171584.13  182691.25  136949.96  130384.91  130686.76  144726.54
            10.9+MDEV-28313                   172770.79  182122.98  110902.51  127673.10  127307.35  143449.37
            10.9+MDEV-28313 (previous run)    169572.38  191460.20  137424.75  137625.91  141308.08  151053.27

            The last two rows indicate that there is quite a bit of variation in the throughput, in addition to the checkpoint glitch that occurs during the 80-connection test.

            The combination with MDEV-26603 must also be tested against a baseline with innodb_flush_log_at_trx_commit=1:

            revision \ connections            20         40         80         160        320        640
            10.9+MDEV-28313+patch+MDEV-26603  38357.41   77825.55   148901.55  159469.58  128778.08  138870.00
            10.9+MDEV-28313+patch             45049.02   85527.73   150008.44  160022.04  126585.47  142077.57

            So, unfortunately even this fix does not cure the counterintuitive regression revealed by MDEV-26603.
            Side note: At 160 concurrent connections, the durable configuration innodb_flush_log_at_trx_commit=1 resulted in better throughput than innodb_flush_log_at_trx_commit=0, possibly thanks to the group commit locks acting as a throttle that prevented more costly contention elsewhere.

            axel, can you please run your standard benchmarks on 10.6+patch?


            marko Marko Mäkelä added a comment:

            I filed a separate ticket MDEV-28445 for the clean-up, because it did not show any difference (for the better or the worse) in axel’s standard test battery. Therefore, we cannot claim that it would fix this performance regression.

            I guess that our standard test batteries might not exercise locking conflicts at all, especially on secondary indexes. Something bigger like TPC-C might show a difference.


            marko Marko Mäkelä added a comment:

            MDEV-28445 caused a performance regression MDEV-30357. As a part of the fix, I would implement a cache that avoids some repeated traversal of trx_sys.rw_trx_hash in repeated invocations of trx_sys_t::find_same_or_older() within the same transaction.
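
            A minimal sketch of that caching idea (assumptions: this is not the actual patch, the member name max_inactive_id is made up here, and find_same_or_older(id) is taken to answer whether some still-active transaction has an ID no newer than id). Because transaction IDs are handed out in increasing order, a negative answer, once obtained, also holds for any smaller ID for the rest of the current transaction, so the largest ID known to have a negative answer can be remembered in the trx object and repeated calls can skip the trx_sys.rw_trx_hash traversal:

            #include <cstdint>

            typedef uint64_t trx_id_t;

            struct trx_sketch
            {
              /* hypothetical cache of the largest ID for which the
                 expensive lookup has already returned "not active" */
              trx_id_t max_inactive_id = 0;

              bool find_same_or_older_cached(trx_id_t id)
              {
                if (id <= max_inactive_id)
                  return false;                /* no hash traversal needed */
                const bool found = rw_trx_hash_traverse(id);
                if (!found)
                  max_inactive_id = id;        /* remember the negative answer */
                return found;
              }

              /* stand-in for iterating trx_sys.rw_trx_hash */
              bool rw_trx_hash_traverse(trx_id_t id);
            };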


            marko Marko Mäkelä added a comment:

            MDEV-20630 may be more rewarding to fix first. I see that the lock-free hash table is using std::memory_order_seq_cst, while a less intrusive memory order (or explicit memory barriers) might work. I have not studied that code in much detail. What I attempted so far was to make InnoDB invoke the expensive operations less often (MDEV-28445, MDEV-30357), and to replace the lock-free hash table trx_sys.rw_trx_hash with a locking one, which resulted in worse performance.
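
            To illustrate the cost difference being referred to (this does not claim that relaxing the ordering inside the lock-free hash would be correct; that would have to be proven against the algorithm): a std::memory_order_seq_cst store participates in a single total order and typically compiles to a full fence or a locked instruction on x86, while a release store paired with an acquire load only orders the accesses that actually need ordering and is cheaper on the hot read path.

            #include <atomic>

            static std::atomic<int> ready{0};
            static int payload;

            void publish_seq_cst()
            {
              payload = 42;
              ready.store(1, std::memory_order_seq_cst);  /* full barrier */
            }

            void publish_release()
            {
              payload = 42;
              ready.store(1, std::memory_order_release);  /* orders prior writes only */
            }

            int consume_acquire()
            {
              /* pairs with the release store; enough to observe payload == 42 */
              return ready.load(std::memory_order_acquire) ? payload : -1;
            }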

            JiraAutomate added a comment:

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.


            People

              marko Marko Mäkelä
              baotiao zongzhi chen
              Votes: 0
              Watchers: 12

