MariaDB Server / MDEV-11934

MariaRocks: Group Commit with binlog

Details

    • 10.2.5-1

    Description

      MyRocks implements group commit with the binary log on top of the MySQL API:
      https://github.com/facebook/mysql-5.6/commit/14a0d4a97c09b52fa7450e6a3d56ebe7ed193ab6

      Inside MyRocks/RocksDB:

      • One can set m_rocksdb_tx->GetWriteOptions()->sync to false to avoid flushing the WAL to disk on every write.
      • One can flush the WAL to disk with the rdb->SyncWAL() call.
      • RocksDB has its own group commit implementation, which "just works" and is not visible from outside the RocksDB API.
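      The two knobs above can be illustrated with a toy model. This is a minimal sketch, not RocksDB code: `FakeWal` and its counters are hypothetical stand-ins that only mimic the meaning of `WriteOptions::sync` and `DB::SyncWAL()`. It shows that writes made with sync=false stay buffered until one SyncWAL() call makes all of them durable with a single fsync.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for rocksdb::WriteOptions; only the sync flag matters.
struct WriteOptions {
  bool sync = true;
};

// Hypothetical toy WAL, NOT the RocksDB API. It counts how many records
// are durable and how many fsyncs were issued.
class FakeWal {
 public:
  // Append a record; fsync right away only when opts.sync is true.
  void Write(const WriteOptions& opts, const std::string& record) {
    buffered_.push_back(record);
    if (opts.sync) SyncWAL();
  }

  // Make every buffered record durable with one fsync (no-op if empty).
  void SyncWAL() {
    if (buffered_.empty()) return;
    durable_ += buffered_.size();
    buffered_.clear();
    ++fsyncs_;
  }

  std::size_t durable_records() const { return durable_; }
  std::size_t fsyncs() const { return fsyncs_; }

 private:
  std::vector<std::string> buffered_;
  std::size_t durable_ = 0;
  std::size_t fsyncs_ = 0;
};
```

      With three sync=false writes followed by one SyncWAL(), the model reports three durable records and a single fsync.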

      == MySQL's Group Commit API ==

      Here is how it works with the safe settings (sync_binlog=1, rocksdb_enable_2pc=ON, rocksdb_write_sync=ON):

      === Prepare ===
      The storage engine checks `thd->durability_property == HA_IGNORE_DURABILITY`.
      If true, it sets sync=false, which causes RocksDB not to persist the Prepare operation to disk.
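      The Prepare check can be sketched as follows. The enum values are the ones named above; `THD`, `WriteOptions` and `prepare_write_options()` are simplified hypothetical stand-ins, and mapping rocksdb_write_sync directly onto the sync flag is an assumption made for illustration. Note that in MariaDB 10.2 durability_property is always HA_REGULAR_DURABILITY, so there the sync=false branch would never be taken.

```cpp
#include <cassert>

enum durability_properties { HA_REGULAR_DURABILITY, HA_IGNORE_DURABILITY };

// Hypothetical stand-in for the server's THD; only the field used here.
struct THD {
  durability_properties durability_property = HA_REGULAR_DURABILITY;
};

// Hypothetical stand-in for rocksdb::WriteOptions.
struct WriteOptions {
  bool sync = true;
};

// Returns the write options the prepare record would be written with.
// Assumption: rocksdb_write_sync maps directly onto WriteOptions::sync.
WriteOptions prepare_write_options(const THD& thd, bool rocksdb_write_sync) {
  WriteOptions opts;
  opts.sync = rocksdb_write_sync;
  if (thd.durability_property == HA_IGNORE_DURABILITY) {
    // The SQL layer will make the prepare durable later via the
    // flush-logs call, so no fsync is needed here.
    opts.sync = false;
  }
  return opts;
}
```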

      === Flush logs ===

      Then the SQL layer calls rocksdb_flush_wal(), which makes the effect of the
      rocksdb_prepare() call persistent by calling SyncWAL().

      If we crash at this point, the recovery process will roll back the prepared
      transaction in MyRocks.

      Then the SQL layer writes and flushes the binlog. If we crash after that, recovery
      will commit the prepared MyRocks transaction.
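      The recovery rule from the two paragraphs above can be modelled in a few lines. This is a simplified sketch: the XID sets are hypothetical stand-ins for what recovery reads from the RocksDB WAL (prepared transactions) and from the flushed binlog, and `recover()` is not a real server function.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

enum class Outcome { Commit, Rollback };

// For every transaction found in the prepared state, commit it if its XID
// reached the binlog, otherwise roll it back.
std::map<std::uint64_t, Outcome> recover(
    const std::set<std::uint64_t>& prepared_in_engine,
    const std::set<std::uint64_t>& xids_in_binlog) {
  std::map<std::uint64_t, Outcome> decisions;
  for (std::uint64_t xid : prepared_in_engine) {
    decisions[xid] =
        xids_in_binlog.count(xid) ? Outcome::Commit : Outcome::Rollback;
  }
  return decisions;
}
```

      For example, if transactions 1, 2 and 3 are prepared but only XID 2 made it into the binlog, 2 is committed and 1 and 3 are rolled back.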

      As far as MyRocks is concerned, each SyncWAL() call is made individually.
      RocksDB has its own Group Commit implementation under the hood.

      === Commit ===

      Then the SQL layer calls rocksdb_commit().

      Commit writes to the WAL too, but does not sync it. This is safe because the effect of rocksdb_prepare() has already been flushed, and the binlog, which has also been flushed to disk, records whether recovery should commit or roll back the transaction.
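      The whole sequence for one transaction under the safe settings can be traced as below. The function names echo the ones in this description, but these are hypothetical stubs that only record the step and count fsyncs; the sketch assumes the SQL layer has requested HA_IGNORE_DURABILITY for the prepare.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the steps of one commit and how many fsyncs each log received.
struct Trace {
  std::vector<std::string> steps;
  int engine_fsyncs = 0;
  int binlog_fsyncs = 0;
};

// Hypothetical stubs, not the real MyRocks/server signatures.
void rocksdb_prepare(Trace& t) { t.steps.push_back("prepare: WAL write, sync=false"); }

void rocksdb_flush_wal(Trace& t) {
  t.steps.push_back("flush logs: SyncWAL()");
  ++t.engine_fsyncs;
}

void binlog_write_and_sync(Trace& t) {
  t.steps.push_back("binlog: write + fsync");
  ++t.binlog_fsyncs;
}

void rocksdb_commit(Trace& t) { t.steps.push_back("commit: WAL write, sync=false"); }

Trace run_commit() {
  Trace t;
  rocksdb_prepare(t);
  rocksdb_flush_wal(t);      // the prepare is now durable
  binlog_write_and_sync(t);  // the commit decision is now durable
  rocksdb_commit(t);         // no fsync: recovery would commit via the binlog XID
  return t;
}
```

      The design point is that each transaction costs one engine WAL sync and one binlog sync; the commit record itself never needs an fsync.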

      == MariaDB ==

      MariaDB 10.2 has thd->durability_property, but it is always equal to HA_REGULAR_DURABILITY.

      To actually perform group commit, MariaDB 10.0+ provides new handlerton functions:

      • handlerton->prepare_ordered
      • handlerton->commit_ordered
      • (handlerton->commit is still there and still used also)
      • handlerton->commit_checkpoint_request
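      The order in which these hooks fire during one binlog group commit can be sketched as follows. This is a simplified, single-threaded model under our reading of MariaDB's design: `EngineHooks` and `group_commit()` are hypothetical, and the real handlerton members have different signatures and run on different threads (prepare in parallel, prepare_ordered and commit_ordered serially in commit order, commit afterwards without ordering).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical engine that just logs which hook ran for which transaction.
struct EngineHooks {
  std::vector<std::string> log;
  void prepare(int trx) { log.push_back("prepare " + std::to_string(trx)); }
  void prepare_ordered(int trx) { log.push_back("prepare_ordered " + std::to_string(trx)); }
  void commit_ordered(int trx) { log.push_back("commit_ordered " + std::to_string(trx)); }
  void commit(int trx) { log.push_back("commit " + std::to_string(trx)); }
};

// One group: transactions are prepared and queued in commit order, the
// leader writes the whole group to the binlog with a single fsync, then
// the engine is told about the commits in binlog order.
int group_commit(EngineHooks& engine, const std::vector<int>& trxs) {
  int binlog_fsyncs = 0;
  for (int trx : trxs) {
    engine.prepare(trx);          // parallel in a real server
    engine.prepare_ordered(trx);  // serial, in queue order
  }
  ++binlog_fsyncs;                // one binlog fsync for the whole group
  for (int trx : trxs) engine.commit_ordered(trx);  // serial, binlog order
  for (int trx : trxs) engine.commit(trx);          // unordered in a real server
  return binlog_fsyncs;
}
```

      commit_checkpoint_request is not modelled; as we understand it, it lets the binlog ask the engine to report later once earlier commits are durable, so the engine's commit does not have to sync its own log.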

      Attachments

        1. _b.test.innodb (3 kB, Sergei Petrunia)
        2. _b.test.myrocks (4 kB, Sergei Petrunia)
        3. commit-time-histogram.png (47 kB, Sergei Petrunia)
        4. commit-time-histogram.png (47 kB, Sergei Petrunia)
        5. oct17-benchmark.ods (40 kB, Sergei Petrunia)
        6. oct17-benchmark-result-sshot.png (39 kB, Sergei Petrunia)
        7. psergey-test-scaling.test (3 kB, Sergei Petrunia)
        8. psergey-test-scaling2.test (4 kB, Sergei Petrunia)
        9. test-rocksdb-gcommit.tgz (2 kB, Sergei Petrunia)


          Activity

            Sergei Petrunia added a comment (edited):

            Re-ran the benchmark while collecting more data:

            MariaDB:

            CONCURRENCY  QUERIES  TIME (s)  QPS        WSYNCS  WSYNCED  Binlog_bytes  Binlog_commits  Binlog_group_commits
            4            5000     84.3612   59.2690    0       2500     1196120       5000            4720
            8            5000     43.4948   114.9563   0       1251     1201960       5000            2678
            16           5000     21.2343   235.4681   0       625      1201682       4992            1499
            32           5000     10.8093   462.5646   0       313      1202374       4992            854
            64           5000     5.4506    917.3302   0       157      1202630       4992            547
            128          5000     2.8370    1762.4251  0       80       1202902       4992            339

            FB/MySQL-5.6:

            CONCURRENCY  QUERIES  TIME (s)  QPS        WSYNCS  WSYNCED  Binlog_bytes  Binlog_fsync
            4            5000     169.1061  29.5672    2500    2500     995000        2500
            8            5000     85.9797   58.1533    1250    1251     995000        1250
            16           5000     43.1937   115.7576   624     625      993408        624
            32           5000     21.4033   233.6088   312     313      993408        312
            64           5000     10.5582   473.5656   156     157      993408        156
            128          5000     5.6467    885.4729   79      80       993408        79
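            Two derived quantities help read the MariaDB table: QPS is QUERIES divided by TIME, and the average number of transactions per binlog group commit is Binlog_commits divided by Binlog_group_commits. The small sketch below recomputes them from rows copied from that table; the `Row` struct and function names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// One row of the MariaDB table; only the columns used here.
struct Row {
  int concurrency;
  double queries;
  double seconds;
  double binlog_commits;
  double binlog_group_commits;
};

// Values copied from the MariaDB table.
const std::array<Row, 6> kMariaDB = {{
    {4, 5000.0, 84.3612, 5000.0, 4720.0},
    {8, 5000.0, 43.4948, 5000.0, 2678.0},
    {16, 5000.0, 21.2343, 4992.0, 1499.0},
    {32, 5000.0, 10.8093, 4992.0, 854.0},
    {64, 5000.0, 5.4506, 4992.0, 547.0},
    {128, 5000.0, 2.8370, 4992.0, 339.0},
}};

// QPS column: queries divided by wall-clock seconds.
double qps(const Row& r) { return r.queries / r.seconds; }

// Average number of transactions written per binlog group commit.
double avg_group_size(const Row& r) {
  return r.binlog_commits / r.binlog_group_commits;
}
```

            At 4 threads this gives about 1.06 commits per group, rising steadily to about 14.7 at 128 threads.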
            


            Sergei Petrunia added a comment: File used for producing the above: psergey-test-scaling2.test

            Sergei Petrunia added a comment: MDEV-14103 is about testing for this task.
            Sergei Petrunia added a comment: KB page: https://mariadb.com/kb/en/library/myrocks-and-group-commit-with-binary-log/

            Sergei Petrunia added a comment: Closing, as this has been fixed and testing found no issues.

            People

              Assignee: Sergei Petrunia
              Reporter: Sergei Petrunia
              Votes: 0
              Watchers: 3

