Details
- Task
- Status: Closed
- Major
- Resolution: Fixed
- None
- 10.2.5-1
Description
MyRocks implements group commit with the binary log on top of the MySQL API:
https://github.com/facebook/mysql-5.6/commit/14a0d4a97c09b52fa7450e6a3d56ebe7ed193ab6
Inside MyRocks/RocksDB:
- One can set m_rocksdb_tx->GetWriteOptions()->sync to false to avoid flushing.
- One can flush WAL to disk with rdb->SyncWAL() call.
- RocksDB has its own group commit implementation which "just works" and is not visible from outside of the RocksDB API.
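The interaction between unsynced writes and a later SyncWAL() call can be pictured with a toy model. This is only a sketch and not the RocksDB API: `ToyWAL` and its methods are hypothetical names, and a "crash" here means the worst case where everything not yet synced is lost.

```python
class ToyWAL:
    """Toy write-ahead log (hypothetical, not RocksDB): records become
    durable only once they have been synced."""

    def __init__(self):
        self.buffered = []  # written, but not yet synced
        self.durable = []   # would survive a crash
        self.syncs = 0      # number of sync operations performed

    def write(self, record, sync=True):
        # sync=False mirrors setting WriteOptions::sync to false:
        # the record is appended but not flushed to disk.
        self.buffered.append(record)
        if sync:
            self.sync_wal()

    def sync_wal(self):
        # One sync persists every buffered record at once.
        self.durable.extend(self.buffered)
        self.buffered.clear()
        self.syncs += 1

    def crash(self):
        # Worst-case crash: unsynced records are lost.
        self.buffered.clear()
        return list(self.durable)
```

For example, two writes with `sync=False` followed by a single `sync_wal()` leave both records durable at the cost of one sync, while a write that is never synced is gone after `crash()`.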
== MySQL's Group Commit API ==
Here is how it works with the safe settings (sync_binlog=1, rocksdb_enable_2pc=ON, rocksdb_write_sync=ON):
=== Prepare ===
The storage engine checks `thd->durability_property == HA_IGNORE_DURABILITY`.
If true, it sets sync=false, which causes RocksDB not to persist the Prepare operation to disk.
=== Flush logs ===
Then the SQL layer calls rocksdb_flush_wal(), which makes the effect of the
rocksdb_prepare() call persistent by calling SyncWAL().
If we crash at this point, the recovery process will roll back the prepared
transaction in MyRocks.
Then the SQL layer writes and flushes the binlog. If we crash after that, recovery
will commit the prepared MyRocks transaction.
As far as MyRocks is concerned, each SyncWAL() call is made individually.
RocksDB has its own Group Commit implementation under the hood.
=== Commit ===
Then the SQL layer calls rocksdb_commit().
Commit writes to the WAL too, but does not sync it.
(This is safe because the effect of rocksdb_prepare() has already been flushed, and the binlog, which records whether recovery should commit or roll back, has been flushed to disk as well.)
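The crash cases described in the Prepare, Flush logs and Commit steps can be summarized as a small decision table. The sketch below is illustrative only: `CrashPoint`, `durable_state` and `recovery_outcome` are hypothetical names, not MyRocks or MySQL functions, and the "absent" case (a crash before the WAL is synced) is an assumption about the worst case rather than something the description states.

```python
from enum import Enum


class CrashPoint(Enum):
    AFTER_PREPARE = 1       # prepare written with sync=false, WAL not synced
    AFTER_FLUSH_WAL = 2     # rocksdb_flush_wal() done, binlog not yet flushed
    AFTER_BINLOG_FLUSH = 3  # binlog written and flushed, commit not yet called
    AFTER_COMMIT = 4        # rocksdb_commit() wrote to the WAL without syncing


def durable_state(crash_point):
    """Return (prepare_on_disk, in_binlog) at the moment of the crash."""
    prepare_on_disk = crash_point.value >= CrashPoint.AFTER_FLUSH_WAL.value
    in_binlog = crash_point.value >= CrashPoint.AFTER_BINLOG_FLUSH.value
    return prepare_on_disk, in_binlog


def recovery_outcome(crash_point):
    prepare_on_disk, in_binlog = durable_state(crash_point)
    if not prepare_on_disk:
        # Assumption: in the worst case the unsynced prepare record is lost,
        # and the binlog has no trace of the transaction either.
        return "absent"
    # A durable prepare is resolved by looking at the binlog.
    return "commit" if in_binlog else "rollback"
```

Note that a crash right after the unsynced commit gives the same outcome as a crash right after the binlog flush: the binlog already decides that the prepared transaction gets committed, which is why the commit record itself does not need a sync.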
== MariaDB ==
MariaDB 10.2 has thd->durability_property, but it is always equal to HA_REGULAR_DURABILITY.
For actually doing Group Commit, MariaDB 10.0+ has new handlerton functions:
- handlerton->prepare_ordered
- handlerton->commit_ordered
- (handlerton->commit is still there and is still used)
- handlerton->commit_checkpoint_request
Issue Links
- is part of MDEV-9658 Make MyRocks in MariaDB stable (Closed)
- relates to MDEV-14103 Testing for group commit in MyRocks (Closed)
Re-ran the benchmark while collecting more data:
MariaDB:
CONCURRENCY  QUERIES  TIME      QPS        WSYNCS  WSYNCED  Binlog_bytes  Binlog_commits  Binlog_group_commits
4            5000     84.3612   59.2690    0       2500     1196120       5000            4720
8            5000     43.4948   114.9563   0       1251     1201960       5000            2678
16           5000     21.2343   235.4681   0       625      1201682       4992            1499
32           5000     10.8093   462.5646   0       313      1202374       4992            854
64           5000     5.4506    917.3302   0       157      1202630       4992            547
128          5000     2.8370    1762.4251  0       80       1202902       4992            339
FB/MySQL-5.6:
CONCURRENCY  QUERIES  TIME      QPS        WSYNCS  WSYNCED  Binlog_bytes  Binlog_fsync
4            5000     169.1061  29.5672    2500    2500     995000        2500
8            5000     85.9797   58.1533    1250    1251     995000        1250
16           5000     43.1937   115.7576   624     625      993408        624
32           5000     21.4033   233.6088   312     313      993408        312
64           5000     10.5582   473.5656   156     157      993408        156
128          5000     5.6467    885.4729   79      80       993408        79
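To make the two tables easier to compare, a few ratios can be derived from them. The sketch below copies the relevant columns verbatim; `summarize` and the dictionary keys are hypothetical names, and it assumes that WSYNCED and Binlog_fsync count sync operations, which the column names suggest but the benchmark output does not spell out.

```python
QUERIES = 5000  # same for every row in both tables

# MariaDB: (concurrency, qps, wsynced, binlog_commits, binlog_group_commits)
MARIADB = [
    (4, 59.2690, 2500, 5000, 4720),
    (8, 114.9563, 1251, 5000, 2678),
    (16, 235.4681, 625, 4992, 1499),
    (32, 462.5646, 313, 4992, 854),
    (64, 917.3302, 157, 4992, 547),
    (128, 1762.4251, 80, 4992, 339),
]

# FB/MySQL-5.6: (concurrency, qps, wsynced, binlog_fsync)
FB_MYSQL = [
    (4, 29.5672, 2500, 2500),
    (8, 58.1533, 1251, 1250),
    (16, 115.7576, 625, 624),
    (32, 233.6088, 313, 312),
    (64, 473.5656, 157, 156),
    (128, 885.4729, 80, 79),
]


def summarize():
    """Derive per-concurrency ratios from the two benchmark tables."""
    rows = []
    for maria, fb in zip(MARIADB, FB_MYSQL):
        conc, m_qps, m_wsynced, commits, groups = maria
        _, f_qps, _, fsyncs = fb
        rows.append({
            "concurrency": conc,
            # Average binlog commits per binlog group commit (MariaDB).
            "commits_per_binlog_group": round(commits / groups, 2),
            # Queries per WSYNCED increment (MariaDB).
            "queries_per_wsynced": round(QUERIES / m_wsynced, 2),
            # Queries per binlog fsync (FB/MySQL-5.6).
            "queries_per_binlog_fsync_fb": round(QUERIES / fsyncs, 2),
            # Throughput of MariaDB relative to FB/MySQL-5.6.
            "qps_ratio_mariadb_over_fb": round(m_qps / f_qps, 2),
        })
    return rows


for row in summarize():
    print(row)
```

On these numbers the WSYNCED column is identical in both builds at every concurrency level, while MariaDB's QPS comes out at roughly twice that of FB/MySQL-5.6.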