[MDEV-14138] RocksDB Crash recovery is broken Created: 2017-10-26 Updated: 2017-12-05 Resolved: 2017-12-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - RocksDB |
| Affects Version/s: | 10.2.9 |
| Fix Version/s: | 10.2.12 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrii Nikitin (Inactive) | Assignee: | Sergei Petrunia |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
EDIT: further analysis shows that the problem is repeatable without an OS crash as well.
Prerequisites:
1. Comment out "exit 1" at the top of the script
3. Run the command below, which will ask for the root password, generate helper scripts, initialize a new datadir in the current folder, start the server on port 3313, and put high load on it:
4. After ~1 min the terminal will ask you to press any key to crash your OS
Examine outcome in
|
| Comments |
| Comment by Andrii Nikitin (Inactive) [ 2017-10-30 ] |
|
I've discovered the option rocksdb_use_fsync, which is OFF by default; I should re-test with it. |
| Comment by Andrii Nikitin (Inactive) [ 2017-10-30 ] |
|
OK, rocksdb_use_fsync doesn't seem to help:
|
| Comment by Andrii Nikitin (Inactive) [ 2017-10-30 ] |
|
With small tweaks I ran the script with MyRocks binaries installed from current Master 5.6. By the way, the recovery log contains the following, some of which I didn't see in the MariaDB case:
|
| Comment by Sergei Petrunia [ 2017-10-31 ] |
|
rocksdb_use_fsync means "Use fsync() call instead of fdatasync()". Both variants should provide persistence. The above RocksDB settings should provide data persistence (I have also checked this with Herman Lee @ FB). So there's something to look at here. |
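To illustrate the distinction between the two calls, here is a minimal Python sketch. It is not MyRocks or RocksDB code; the helper name `durable_append` and the file name `wal.log` are made up for the example. fsync() flushes file data plus all metadata, while fdatasync() flushes the data and only the metadata needed to read it back, so in either case the written bytes should be on stable storage once the call returns.

```python
import os
import tempfile


def durable_append(path, payload, use_fsync=False):
    """Append payload to path and flush it to stable storage before returning.

    use_fsync mirrors the meaning quoted above: call fsync() instead of
    fdatasync(). The function itself is a hypothetical helper.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        if use_fsync or not hasattr(os, "fdatasync"):
            os.fsync(fd)      # data + all file metadata
        else:
            os.fdatasync(fd)  # data + only the metadata needed to read it
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo = os.path.join(tempfile.mkdtemp(), "wal.log")
    durable_append(demo, b"row 1\n", use_fsync=False)
    durable_append(demo, b"row 2\n", use_fsync=True)
    with open(demo, "rb") as f:
        print(f.read())
```

The fallback to fsync() is there only because `os.fdatasync` is not available on every platform.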
| Comment by Andrii Nikitin (Inactive) [ 2017-10-31 ] |
|
I can reproduce the problem without a system crash as well, by just replacing the last line `echo c | sudo tee /proc/sysrq-trigger` with `kill -9 $(cat $env1/dt/p.id)`.
It looks like binlog recovery is working properly and RocksDB crash recovery is to blame. |
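The `kill -9` variant can be sketched in Python as follows, assuming a POSIX system. The helper name `kill_from_pid_file` is hypothetical, and a sleeping child process stands in for the server; only the pid-file name mirrors `$env1/dt/p.id` from the script. SIGKILL cannot be caught, so the process gets no chance to shut down cleanly. Unlike an OS crash, though, data already handed to the kernel still reaches disk, which is why a failure in this variant points at the engine's own recovery rather than at lost writes.

```python
import os
import signal
import subprocess
import sys
import tempfile


def kill_from_pid_file(pid_file, sig=signal.SIGKILL):
    """Read a PID from pid_file, send it a signal, and return the PID."""
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, sig)
    return pid


if __name__ == "__main__":
    # Stand-in for the server: a child process that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    pid_file = os.path.join(tempfile.mkdtemp(), "p.id")
    with open(pid_file, "w") as f:
        f.write(str(child.pid))

    kill_from_pid_file(pid_file)
    child.wait()
    # A negative return code means the child was terminated by that signal.
    print(child.returncode == -signal.SIGKILL)
```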
| Comment by Andrii Nikitin (Inactive) [ 2017-10-31 ] |
|
Indeed, the binary log has rows that are missing from the instance after crash recovery:
And this is the only reference to the row in the binary log (right before the process got killed):
So I am changing the title from "Recover from binlog is broken with RocksDB Engine after OS Crash" to "RocksDB crash recovery is broken" |
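The comparison described above amounts to a set difference. The sketch below assumes the primary keys seen in the binary log (for example, extracted from mysqlbinlog output) and the keys present in the table after recovery have already been collected into two lists; the function name and the sample keys are made up for illustration and are not data from this report.

```python
def rows_missing_after_recovery(binlog_keys, table_keys):
    """Return keys recorded in the binary log but absent from the table, sorted."""
    return sorted(set(binlog_keys) - set(table_keys))


if __name__ == "__main__":
    # Made-up keys for illustration only.
    binlog_keys = [101, 102, 103, 104]
    table_keys = [101, 102]
    print(rows_missing_after_recovery(binlog_keys, table_keys))  # [103, 104]
```

A non-empty result means the binary log holds committed rows that the storage engine did not bring back.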
| Comment by Andrii Nikitin (Inactive) [ 2017-12-04 ] |
|
I can trigger the problem with the steps above every time when 10.2.9 is installed ( http://ftp.hosteurope.de/mirror/archive.mariadb.org/mariadb-10.2.9/repo/ubuntu/pool/main/m/mariadb-10.2/ ), but 10.2.10 and current 10.2 worked flawlessly. So I am OK with closing this call. |
| Comment by Sergei Petrunia [ 2017-12-05 ] |
|
Closing as this seems to be no longer repeatable. Current code passes the test. |