MyRocks implements group commit with the binary log on top of the MySQL API:
- One can set m_rocksdb_tx->GetWriteOptions()->sync to false to avoid flushing.
- One can flush WAL to disk with rdb->SyncWAL() call.
- RocksDB has its own group commit implementation which "just works" and is not visible from outside of the RocksDB API.
Here is a description of how it works with the safe settings (sync_binlog=1, rocksdb_enable_2pc=ON, rocksdb_write_sync=ON):
=== Prepare ===
The storage engine checks `thd->durability_property == HA_IGNORE_DURABILITY`.
If true, it sets sync=false, so RocksDB writes the Prepare operation to the WAL but does not sync it to disk.
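The check above can be sketched as a small standalone function. This is only an illustration: `ToyDurability` and `prepare_sync_flag` are hypothetical stand-ins, not the server's or MyRocks' actual definitions, and `configured_sync` is assumed to represent whatever sync setting the transaction would otherwise use.

```cpp
// Stand-in for the server's durability property (hypothetical type;
// only the two enumerator names are taken from the text above).
enum class ToyDurability { HA_REGULAR_DURABILITY, HA_IGNORE_DURABILITY };

// Hypothetical helper: returns the value the write options' `sync`
// flag would get for the Prepare step.
bool prepare_sync_flag(ToyDurability durability, bool configured_sync) {
  if (durability == ToyDurability::HA_IGNORE_DURABILITY) {
    // The SQL layer will ask for a WAL sync later, so Prepare skips it.
    return false;
  }
  // Otherwise keep whatever the configuration asked for.
  return configured_sync;
}
```

With `HA_IGNORE_DURABILITY` the function returns false regardless of the configured value, which is the case described in this section.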
=== Flush logs ===
Then the SQL layer calls rocksdb_flush_wal(), which makes the effect of
the rocksdb_prepare() call persistent by calling SyncWAL().
If we crash at this point, the recovery process will roll back the prepared
transaction in MyRocks.
Then the SQL layer writes and flushes the binlog. If we crash after that, recovery
will commit the prepared MyRocks transaction.
As far as MyRocks is concerned, each SyncWAL() call is made individually.
RocksDB has its own Group Commit implementation under the hood.
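The recovery rule described in this section (a prepared transaction is committed only if the binlog already contains it) can be sketched as follows. This is a toy model, not the server's recovery code: `RecoveryAction`, `recover_prepared`, and the use of plain strings as transaction identifiers are all assumptions made for illustration.

```cpp
#include <set>
#include <string>

enum class RecoveryAction { Commit, Rollback };

// Hypothetical helper: decide the fate of one prepared transaction.
// `binlog_xids` stands for the transaction ids found in the flushed binlog.
RecoveryAction recover_prepared(const std::string& xid,
                                const std::set<std::string>& binlog_xids) {
  // In the binlog: the crash happened after the binlog flush, so commit.
  // Not in the binlog: the crash happened before it, so roll back.
  return binlog_xids.count(xid) > 0 ? RecoveryAction::Commit
                                    : RecoveryAction::Rollback;
}
```

For example, with a binlog containing only "x1", a prepared transaction "x1" would be committed and a prepared transaction "x2" would be rolled back.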
=== Commit ===
Then the SQL layer calls rocksdb_commit().
Commit writes to the WAL too, but does not sync it.
This is safe because the effect of rocksdb_prepare() has already been synced, and the binlog, which has also been flushed to disk, records whether recovery should commit or roll back.
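To see why the unsynced Commit record is not a problem, here is a toy model of a write-ahead log in which only the synced prefix survives a crash. `ToyWal` and its methods are hypothetical and do not correspond to RocksDB classes; the record strings are made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy WAL: records are appended in memory; `synced` counts how many
// of them are assumed to have reached the disk.
struct ToyWal {
  std::vector<std::string> records;
  std::size_t synced = 0;

  // Append a record, optionally syncing everything written so far.
  void append(const std::string& rec, bool sync) {
    records.push_back(rec);
    if (sync) synced = records.size();
  }

  // Model of an explicit WAL sync: all appended records become durable.
  void sync_wal() { synced = records.size(); }

  // What a restart would find: only the synced prefix of the log.
  std::vector<std::string> after_crash() const {
    return std::vector<std::string>(
        records.begin(),
        records.begin() + static_cast<std::ptrdiff_t>(synced));
  }
};
```

Appending a Prepare record without sync, calling `sync_wal()`, and then appending a Commit record without sync leaves only the Prepare record in `after_crash()`. In that situation the flushed binlog is what tells recovery to commit the transaction.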
MariaDB 10.2 has thd->durability_property, but it is always equal to HA_REGULAR_DURABILITY.
For actually doing Group Commit, MariaDB 10.0+ has new handlerton functions:
- (handlerton->commit is still there and is still used)