MDEV-21423

lock-free trx_sys performance regression caused by lf_find and ut_delay

Details

    Description

      Hello, guys

      We have ported the lock-free trx_sys; however, I find that the oltp_read_write case shows a large performance regression compared with the non-lock-free version.
      Especially when the isolation level is read-committed, the lock-free trx_sys shows about a 40% performance regression.
      I guess MariaDB has the same problem.

      This is my sysbench test configuration:

      bench_type=oltp_read_write
      threads=560
      tables=8
      table_size=500000
      

      There is another issue related to the lock-free trx_sys:
      https://jira.mariadb.org/browse/MDEV-20630?filter=-2
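
      For readers unfamiliar with the two functions named in the summary, here is a minimal, self-contained sketch of the pattern (illustrative only, not the actual InnoDB or LF_HASH code; the retry predicate below is hypothetical): ut_delay() is InnoDB's calibrated busy-wait built from CPU pause hints, and a lock-free lookup such as lf_find() may have to retry and back off with such a delay. When every row-visibility check under read-committed goes through a lookup like this, the pauses and the cache-line traffic add up at high concurrency (560 client threads in the test above).

      /* Sketch only: spin_delay() stands in for ut_delay() and
         hash.search() for an lf_find-style lookup. */
      #include <atomic>
      #if defined(__x86_64__) || defined(__i386__)
      # include <immintrin.h>                        /* _mm_pause() */
      #endif

      /* Calibrated busy-wait: spin, issuing a CPU pause hint each round. */
      static inline void spin_delay(unsigned delay)
      {
        for (unsigned i = 0; i < delay * 50; i++)
        {
      #if defined(__x86_64__) || defined(__i386__)
          _mm_pause();
      #else
          std::atomic_signal_fence(std::memory_order_seq_cst);
      #endif
        }
      }

      /* Hypothetical hot path: look up a transaction id, backing off and
         retrying while the answer cannot be decided yet.  Under
         read-committed this kind of lookup can run once per row examined. */
      template <typename Hash>
      const void *find_with_backoff(Hash &hash, unsigned long long trx_id)
      {
        for (unsigned attempt = 0;; attempt++)
        {
          if (const void *element = hash.search(trx_id))   /* lf_find-style */
            return element;
          if (!hash.must_retry())                /* hypothetical predicate */
            return nullptr;
          spin_delay(attempt < 20 ? attempt : 20);       /* bounded backoff */
        }
      }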

      Below are the sysbench results, comparing the lock-free trx_sys with the non-lock-free trx_sys:

      tps 13405.28 vs 20095.02 
      qps 268105.66 vs 401900.40
      

      lock-free trx_sys, isolation level read-committed:
       
      SQL statistics:
          queries performed:
              read:                            33803098
              write:                           9658028
              other:                           4829014
              total:                           48290140
          transactions:                        2414507 (13405.28 per sec.)
          queries:                             48290140 (268105.66 per sec.)
          ignored errors:                      0      (0.00 per sec.)
          reconnects:                          0      (0.00 per sec.)
       
      General statistics:
          total time:                          180.1141s
          total number of events:              2414507
       
      Latency (ms):
               min:                                    2.96
               avg:                                   41.75
               max:                                 4487.73
               95th percentile:                       92.42
               sum:                            100805088.64
       
      Threads fairness:
          events (avg/stddev):           4311.6196/167.34
          execution time (avg/stddev):   180.0091/0.01
      
      

      non-lock-free trx_sys, isolation level read-committed:
       
      SQL statistics:
          queries performed:
              read:                            50672678
              write:                           14477908
              other:                           7238954
              total:                           72389540
          transactions:                        3619477 (20095.02 per sec.)
          queries:                             72389540 (401900.40 per sec.)
          ignored errors:                      0      (0.00 per sec.)
          reconnects:                          0      (0.00 per sec.)
       
      General statistics:
          total time:                          180.1161s
          total number of events:              3619477
       
      Latency (ms):
               min:                                    2.47
               avg:                                   27.85
               max:                                  198.43
               95th percentile:                       52.89
               sum:                            100798260.68
       
      Threads fairness:
          events (avg/stddev):           6463.3518/107.19
          execution time (avg/stddev):   179.9969/0.01
      

          Activity

            marko Marko Mäkelä added a comment:

            I reran the 30-second 8×100,000-row Sysbench oltp_update_index with innodb_flush_log_at_trx_commit=0 to get some quick indication of the impact:

            revision \ connections            20         40         80         160        320        640
            10.6+patch                        158434.28  185131.85  170423.99  190336.86  186661.45  179461.32
            10.6                              161207.93  185221.65  171324.52  189943.30  186307.93  177596.38
            10.9+MDEV-28313+patch+MDEV-26603  174544.91  178149.89  110558.41  125144.57  127529.77  147725.99
            10.9+MDEV-28313+patch             171584.13  182691.25  136949.96  130384.91  130686.76  144726.54
            10.9+MDEV-28313                   172770.79  182122.98  110902.51  127673.10  127307.35  143449.37
            10.9+MDEV-28313 (previous run)    169572.38  191460.20  137424.75  137625.91  141308.08  151053.27

            The last two rows indicate that there is quite a bit of variation in the throughput, in addition to the checkpoint glitch that occurs during the 80-connection test.

            The combination with MDEV-26603 must also be tested against a baseline with innodb_flush_log_at_trx_commit=1:

            revision \ connections            20         40         80         160        320        640
            10.9+MDEV-28313+patch+MDEV-26603  38357.41   77825.55   148901.55  159469.58  128778.08  138870.00
            10.9+MDEV-28313+patch             45049.02   85527.73   150008.44  160022.04  126585.47  142077.57

            So, unfortunately even this fix does not cure the counterintuitive regression revealed by MDEV-26603.
            Side note: At 160 concurrent connections, the durable configuration innodb_flush_log_at_trx_commit=1 resulted in better throughput than innodb_flush_log_at_trx_commit=0, possibly thanks to the group commit locks acting as a throttle that prevented more costly contention elsewhere.

            axel, can you please run your standard benchmarks on 10.6+patch?


            marko Marko Mäkelä added a comment:

            I filed a separate ticket MDEV-28445 for the clean-up, because it did not show any difference (for the better or the worse) in axel’s standard test battery. Therefore, we cannot claim that it would fix this performance regression.

            I guess that our standard test batteries might not exercise locking conflicts at all, especially on secondary indexes. Something bigger like TPC-C might show a difference.


            marko Marko Mäkelä added a comment:

            MDEV-28445 caused a performance regression MDEV-30357. As a part of the fix, I would implement a cache that avoids some repeated traversal of trx_sys.rw_trx_hash in repeated invocations of trx_sys_t::find_same_or_older() within the same transaction.
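
            A minimal sketch of that caching idea (assumptions: this is not the actual patch, the member name max_inactive_id is made up here, and find_same_or_older(id) is taken to answer whether some still-active transaction has an ID no newer than id). Because transaction IDs are handed out in increasing order, a negative answer, once obtained, also holds for any smaller ID for the rest of the current transaction, so the largest ID known to have a negative answer can be remembered in the trx object and repeated calls can skip the trx_sys.rw_trx_hash traversal:

            #include <cstdint>

            typedef uint64_t trx_id_t;

            struct trx_sketch
            {
              /* hypothetical cache of the largest ID for which the
                 expensive lookup has already returned "not active" */
              trx_id_t max_inactive_id = 0;

              bool find_same_or_older_cached(trx_id_t id)
              {
                if (id <= max_inactive_id)
                  return false;                /* no hash traversal needed */
                const bool found = rw_trx_hash_traverse(id);
                if (!found)
                  max_inactive_id = id;        /* remember the negative answer */
                return found;
              }

              /* stand-in for iterating trx_sys.rw_trx_hash */
              bool rw_trx_hash_traverse(trx_id_t id);
            };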


            marko Marko Mäkelä added a comment:

            MDEV-20630 may be more rewarding to fix first. I see that the lock-free hash table is using std::memory_order_seq_cst, while a less intrusive memory order (or explicit memory barriers) might work. I have not studied that code in much detail. What I attempted so far was to make InnoDB invoke the expensive operations less often (MDEV-28445, MDEV-30357), and to replace the lock-free hash table trx_sys.rw_trx_hash with a locking one, which resulted in worse performance.
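
            To illustrate the cost difference being referred to (this does not claim that relaxing the ordering inside the lock-free hash would be correct; that would have to be proven against the algorithm): a std::memory_order_seq_cst store participates in a single total order and typically compiles to a full fence or a locked instruction on x86, while a release store paired with an acquire load only orders the accesses that actually need ordering and is cheaper on the hot read path.

            #include <atomic>

            static std::atomic<int> ready{0};
            static int payload;

            void publish_seq_cst()
            {
              payload = 42;
              ready.store(1, std::memory_order_seq_cst);  /* full barrier */
            }

            void publish_release()
            {
              payload = 42;
              ready.store(1, std::memory_order_release);  /* orders prior writes only */
            }

            int consume_acquire()
            {
              /* pairs with the release store; enough to observe payload == 42 */
              return ready.load(std::memory_order_acquire) ? payload : -1;
            }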

            JiraAutomate added a comment:

            Automated message:
            ----------------------------
            Since this issue has not been updated in 6 weeks, it's time to move it back to Stalled.


            People

              marko Marko Mäkelä
              baotiao zongzhi chen
              Votes: 0
              Watchers: 12

