MariaDB Server / MDEV-14138

RocksDB Crash recovery is broken

Details

    Description

      EDIT: further analysis shows that the problem is also reproducible without an OS crash.
      Use these scripts to crash your OS under load and then attempt recovery:
      https://github.com/AndriiNikitin/bugs/blob/master/MDEV-14103-crashOS.sh
      https://github.com/AndriiNikitin/bugs/blob/master/MDEV-14103-recoverOS.sh

      Prerequisites:

      • physical machine
      • git, m4, bash installed
      • MariaDB 10.2.9 with RocksDB plugin installed
      • ulimit -n 4000
      • up to 200 MB of disk space
      • ext3, ext4, xfs or a similar filesystem
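
The software prerequisites above can be checked before starting. The following is a minimal sketch, not part of the MDEV-14103 scripts; the function names `missing_tools` and `nofile_ok` are hypothetical, and the check does not cover the MariaDB version, the RocksDB plugin, or the filesystem type.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the MDEV-14103 scripts.

# Print every listed tool that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Succeed when the open-file limit is at least the requested value.
nofile_ok() {
  local want=$1 have
  have=$(ulimit -n)
  [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]
}

missing=$(missing_tools git m4 bash)
if [ -n "$missing" ]; then
  echo "missing tools:" $missing
fi
nofile_ok 4000 || echo "open-file limit is $(ulimit -n); run: ulimit -n 4000"
```

The script only reports problems; it does not change the limit, because `ulimit -n 4000` has to be run in the shell that later starts the server.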

      1. Comment out "exit 1" at the top of the script MDEV-14103-crashOS.sh if you have a recent backup
      2. Clone the scripts with the command below and make sure that the cloned folder is your working directory for this case:

      git clone http://github.com/AndriiNikitin/mariadb-environs
      cd mariadb-environs
      

      3. Run the command below, which will ask for the root password, generate helper scripts, initialize a new datadir in the current folder, start the server on port 3313 and put a high load on it:

      ENGINE=RocksDB EXTRA_OPT='sync_binlog=1 rocksdb_flush_log_at_trx_commit=1 rocksdb_enable_2pc=ON' bash MDEV-14103-crashOS.sh
      

      4. After ~1 min the terminal will prompt you to press any key to crash your OS
      5. Make sure that the OS is no longer usable, then reboot
      6. cd to the working folder (mariadb-environs)
      7. Run the command below, which checks the consistency of the datafiles in the recovered datadir, loads the binlog into a new instance on port 3314, and compares the table contents of the original recovered datadir with those of the datadir rebuilt from the binlog:

      bash MDEV-14103-recoverOS.sh
      

      Examine the outcome in MDEV-14103.log
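
The comparison performed in step 7 can be pictured roughly as follows. This is only a sketch under assumptions: the real check lives in MDEV-14103-recoverOS.sh and may work differently, the database name `test` and the helper names `count_rows` and `compare_table` are hypothetical, and the real log prints two numbers per side where this sketch prints one.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real check lives in MDEV-14103-recoverOS.sh.

ORIG_PORT=3313     # recovered original instance (port from step 3)
REBUILT_PORT=3314  # instance rebuilt from the binlog (port from step 7)

# Row count of one table on one instance.
# The database name "test" is an assumption.
count_rows() {
  local port=$1 table=$2
  mysql --protocol=tcp --port="$port" --batch --skip-column-names \
        -e "SELECT COUNT(*) FROM test.$table"
}

# Print a message and return 1 when the two instances disagree.
compare_table() {
  local table=$1 orig rebuilt
  orig=$(count_rows "$ORIG_PORT" "$table")
  rebuilt=$(count_rows "$REBUILT_PORT" "$table")
  if [ "$orig" != "$rebuilt" ]; then
    echo "row count is different in $table : ($orig) vs ($rebuilt)"
    return 1
  fi
}
```

With both instances running, the tables named in the log (d1 to d40) could be checked with `for i in $(seq 1 40); do compare_table "d$i" || true; done`.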
      When ENGINE=InnoDB is used together with sync_binlog=1, all checks pass.
      Various inconsistencies are observed in the other cases, which is expected, except for the case "RocksDB sync-binlog=1 rocksdb_flush_log_at_trx_commit=1 rocksdb_enable_2pc=ON":

      ===========================================
      ===========================================
      InnoDB sync-binlog=1
      ===========================================
      consistency checks completed
      done
      ===========================================
      ===========================================
      InnoDB sync-binlog=0
      ===========================================
      consistency checks completed
      row count is different in d1 : (1706 1707) vs (1275 1275)
      row count is different in d2 : (1509 1509) vs (1308 1308)
      row count is different in d3 : (1367 1367) vs (1164 1164)
      row count is different in d4 : (1546 1546) vs (1184 1184)
      row count is different in d5 : (1491 1491) vs (1066 1066)
      row count is different in d6 : (1397 1397) vs (1164 1164)
      row count is different in d7 : (1698 1698) vs (1143 1143)
      row count is different in d8 : (1810 1810) vs (1307 1307)
      row count is different in d9 : (1336 1336) vs (989 989)
      row count is different in d10 : (1506 1506) vs (1100 1100)
      row count is different in d11 : (1762 1762) vs (1358 1358)
      row count is different in d12 : (1523 1523) vs (1251 1251)
      row count is different in d13 : (1622 1622) vs (1225 1225)
      row count is different in d14 : (1720 1720) vs (1187 1187)
      row count is different in d15 : (1535 1535) vs (1159 1159)
      row count is different in d16 : (1742 1742) vs (1256 1256)
      row count is different in d17 : (1852 1852) vs (1329 1329)
      row count is different in d18 : (1729 1729) vs (1263 1263)
      row count is different in d19 : (1746 1746) vs (1379 1379)
      row count is different in d20 : (1615 1615) vs (1289 1289)
      row count is different in d21 : (1250 1250) vs (1067 1067)
      row count is different in d22 : (1548 1548) vs (1144 1144)
      row count is different in d23 : (1717 1717) vs (1358 1358)
      row count is different in d24 : (1487 1487) vs (1199 1199)
      row count is different in d25 : (1410 1410) vs (1063 1063)
      row count is different in d26 : (1370 1370) vs (1096 1096)
      row count is different in d27 : (1458 1458) vs (1164 1164)
      row count is different in d28 : (1422 1422) vs (1118 1118)
      row count is different in d29 : (1638 1638) vs (1226 1226)
      row count is different in d30 : (1464 1464) vs (1083 1083)
      row count is different in d31 : (1827 1827) vs (1352 1352)
      row count is different in d32 : (1756 1756) vs (1298 1298)
      row count is different in d33 : (1450 1450) vs (1163 1163)
      row count is different in d34 : (1542 1542) vs (1149 1149)
      row count is different in d35 : (1716 1716) vs (1346 1346)
      row count is different in d36 : (1758 1758) vs (1269 1269)
      row count is different in d37 : (1660 1660) vs (1108 1108)
      row count is different in d38 : (1846 1846) vs (1331 1331)
      row count is different in d39 : (1529 1529) vs (1180 1180)
      row count is different in d40 : (1516 1516) vs (1274 1274)
      magic row not found
      done
      ===========================================
      ===========================================
      RocksDB sync-binlog=1 rocksdb_flush_log_at_trx_commit=1 rocksdb_enable_2pc=ON
      ===========================================
      consistency checks completed
      row count is different in d3 : (286 286) vs (290 290)
      row count is different in d4 : (287 287) vs (290 290)
      row count is different in d5 : (285 285) vs (287 287)
      row count is different in d7 : (292 292) vs (293 293)
      row count is different in d8 : (288 288) vs (289 289)
      row count is different in d9 : (280 280) vs (281 281)
      row count is different in d10 : (283 283) vs (285 285)
      row count is different in d12 : (286 286) vs (287 287)
      row count is different in d15 : (280 280) vs (281 281)
      row count is different in d16 : (288 288) vs (289 289)
      row count is different in d18 : (283 283) vs (284 284)
      row count is different in d21 : (277 277) vs (278 278)
      row count is different in d22 : (285 285) vs (288 288)
      row count is different in d25 : (282 282) vs (284 284)
      row count is different in d26 : (283 283) vs (284 284)
      row count is different in d27 : (280 280) vs (282 282)
      row count is different in d28 : (286 286) vs (288 288)
      row count is different in d29 : (281 281) vs (283 283)
      row count is different in d30 : (280 280) vs (281 281)
      row count is different in d31 : (281 281) vs (282 282)
      row count is different in d32 : (284 284) vs (285 285)
      row count is different in d34 : (280 280) vs (281 281)
      row count is different in d37 : (280 280) vs (281 281)
      row count is different in d38 : (283 283) vs (285 285)
      row count is different in d39 : (282 282) vs (285 285)
      row count is different in d40 : (282 282) vs (284 284)
      done
      ===========================================
      ===========================================
      InnoDB sync-binlog=1 innodb_flush_log_at_trx_commit=0
      ===========================================
      consistency checks completed
      done
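
A log of this shape is easier to read when condensed to one line per configuration. The sketch below is a hypothetical helper, not part of the original scripts. It assumes the layout shown above: each configuration name follows two consecutive separator lines, and mismatches appear as "row count is different in ..." lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper for reading MDEV-14103.log; not part of the original scripts.

# For each configuration block, print its name, the number of tables whose
# row counts differ, and whether "magic row not found" was reported.
summarize_log() {
  awk '
    function report() {
      if (name != "")
        printf "%s: %d tables differ%s\n", name, diffs, (magic ? ", magic row missing" : "")
    }
    { gsub(/^[ \t]+|[ \t\r]+$/, "") }      # trim indentation and trailing space
    $0 == "" { next }                      # ignore blank lines
    /^=+$/ { seps++; next }                # count consecutive separator lines
    {
      # A line right after two or more separators starts a new block.
      if (seps >= 2) { report(); name = $0; diffs = 0; magic = 0 }
      seps = 0
    }
    /^row count is different in / { diffs++ }
    /^magic row not found/ { magic = 1 }
    END { report() }
  ' "$@"
}
```

Usage: `summarize_log MDEV-14103.log`, or pipe the log into `summarize_log` on standard input.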
      

            People

              psergei Sergei Petrunia
              anikitin Andrii Nikitin (Inactive)