MariaDB Server / MDEV-33853

Async rollback prepared transactions during binlog crash recovery

Details

    Description

      When doing server recovery, active transactions are rolled back
      automatically by the InnoDB background rollback thread. Prepared
      transactions are committed or rolled back accordingly by binlog
      recovery. Binlog recovery runs in the main thread before the server
      can provide service to users, so if there is a big transaction to
      roll back, the server will not be available for a long time.

      It would be better to let prepared transactions be rolled back by the background rollback thread as well.
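
      For illustration, a prepared transaction that is expensive to roll back can be
      produced as sketched here (hypothetical example; 'sbtest1' stands for any large
      table and is not part of this report):

        XA START 'big_trx';
        UPDATE sbtest1 SET k = k + 1;   -- touches many rows, producing a large undo log
        XA END 'big_trx';
        XA PREPARE 'big_trx';
        -- Crash the server (e.g. kill -9) at this point. On restart, binlog recovery
        -- decides the fate of 'big_trx'; if the decision is rollback, the whole undo
        -- log is applied before startup completes.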

          Activity

            axel Axel Schwenke added a comment -

            I have run a general InnoDB mixed workload performance test on commit e5145b22629 in branch bb-11.6-MDEV-33853 and its predecessor commit 9811d23b6d0. The binlog was enabled in all tests; once async and once sync:

            MDEV-33853.pdf

            The only worrying result is for t_oltp_writes_innodb (OLTP write-only) with sync binlog. There are also differences in t_oltp_full_innodb (OLTP read/write), but they are in favor of MDEV-33853. The differences for t_oltp_insert_innodb_batched (10x INSERT per trx) are probably bogus. The numbers are generally quite unstable for async binlog. I will repeat the test to see if it's reproducible ...
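
            For reference, "sync" and "async" binlog presumably refer to the sync_binlog setting; a minimal sketch of the two configurations (values assumed, not taken from the actual test setup):

              -- sync binlog: flush the binary log to disk at every commit group
              SET GLOBAL sync_binlog = 1;

              -- async binlog: leave flushing of the binary log to the operating system
              SET GLOBAL sync_binlog = 0;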

            axel Axel Schwenke added a comment -

            The second run of the general InnoDB tests completed. I put all results in one plot:

            MDEV-33853C.pdf

            It turns out that for the workload in t_oltp_full_innodb we have a high likelihood that MDEV-33853 is indeed faster with async binlog. The differences for t_oltp_insert_innodb_batched also seem to be real, but they concern just two thread counts: with 32 threads MDEV-33853 is slower and with 64 threads it is faster. So overall it's a draw.

            All other differences turned out bogus. So from that point of view MDEV-33853 is ok.


            marko Marko Mäkelä added a comment -

            axel, thank you for testing this. If I understood it correctly, you are running Sysbench workloads that do not involve restarting the server, nor any XA transactions for that matter. For that kind of test scenario, I don’t think that this code change can make any difference. The InnoDB changes are strictly limited to server startup, and I guess so are the changes to xarecover_do_commit_or_rollback() and xarecover_complete_and_count().

            That is, the observed differences ought to be due to random noise, or possibly due to slightly changed code layout (the number of MMU pages that the busy part of the executable code is residing on).

            axel Axel Schwenke added a comment -

            Thanks marko. I do not yet have a test case for measuring server startup time. I thought of running a sysbench prepare job and killing the server partway through, then saving the resulting datadir and using it for starting different server builds.
            But if this requires the transactions to be XA, then it will not work out of the box; Sysbench does not use XA.
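
            One way around that could be to open an extra client session next to the sysbench prepare and leave a transaction in the prepared state before killing the server (sketch only; 'big_t' is a placeholder for a separate pre-populated table):

              XA START 'startup_test';
              DELETE FROM big_t;            -- builds up a large undo log for recovery to roll back
              XA END 'startup_test';
              XA PREPARE 'startup_test';
              -- kill -9 mariadbd here, save the datadir, then start each server build
              -- on a copy of it and time how long startup takes.

              -- After restart, anything still left in the prepared state is listed by:
              XA RECOVER;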


            marko Marko Mäkelä added a comment -

            libing, thank you for your contribution and patience!


            People

              marko Marko Mäkelä
              libing Libing Song
              Votes: 0
              Watchers: 12

