Status: Closed
Debian Linux version 4.9.0-8-amd64 (email@example.com) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1) ) #1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08)
64 GB RAM machine with a 256 GB RAID1 SSD for logs and a 10 TB HDD for tables
We had a database setup running in production for quite some time (more than 5 months), and it crashed last weekend for no obvious reason. The MariaDB error log shows this on server startup:
We tried the recovery modes 0, 2 and 3 of rocksdb_wal_recovery_mode, but none of them had any effect; the log output is exactly the same each time.
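For reference, the numeric values of rocksdb_wal_recovery_mode correspond to RocksDB's WALRecoveryMode enum. The sketch below maps each value to its enum name and prints the my.cnf option line for the modes tried here. The helper `mycnf_line` is hypothetical and only illustrates the option syntax; the one-line descriptions paraphrase the RocksDB option comments.

```python
# Numeric values of MyRocks' rocksdb_wal_recovery_mode, mirroring
# RocksDB's WALRecoveryMode enum.
WAL_RECOVERY_MODES = {
    0: "kTolerateCorruptedTailRecords",  # tolerate an incomplete last record in a WAL
    1: "kAbsoluteConsistency",           # expect no WAL corruption at all
    2: "kPointInTimeRecovery",           # stop replaying at the first corrupted record
    3: "kSkipAnyCorruptedRecords",       # skip corrupted records, salvage what remains
}


def mycnf_line(mode: int) -> str:
    """Return the [mysqld] option line for one recovery mode (hypothetical helper)."""
    if mode not in WAL_RECOVERY_MODES:
        raise ValueError(f"unknown rocksdb_wal_recovery_mode: {mode}")
    return f"rocksdb_wal_recovery_mode={mode}"


if __name__ == "__main__":
    for mode in (0, 2, 3):  # the modes tried in this report
        print(f"{mycnf_line(mode):32} # {WAL_RECOVERY_MODES[mode]}")
```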
Here is the RocksDB section of our my.cnf:
Searching Google did not turn up anything helpful; the most similar report I found is this one: https://github.com/facebook/mysql-5.6/issues/391
Judging from the RocksDB log, the recovery phase completes successfully, yet the RocksDB engine then receives a shutdown signal from the MariaDB process. The log shows no error message just before the shutdown, so my best guess is that the signal is not produced internally by RocksDB but comes from MariaDB. Here is the relevant log:
Another observation is that RocksDB created ~6600 WAL files in the WAL folder, yet most of them are empty. Only about 50 files contain data, up to the point in time when the crash occurred, so something may be wrong here. There are ~2500 files in the RocksDB datadir, a couple of which are also log files that correspond to files created in the WAL dir.
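Counts like these can be reproduced with a short script. The sketch below tallies empty vs. non-empty regular files in a directory and, separately, files with the `.log` suffix that RocksDB uses for WAL files. The default path `/var/lib/mysql/#rocksdb` is an assumption; pass the actual WAL dir or RocksDB datadir as an argument.

```python
import os
import sys
from collections import Counter


def summarize_dir(path: str) -> Counter:
    """Count regular files in `path`: empty vs. non-empty, plus '*.log' files."""
    counts = Counter()
    with os.scandir(path) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and symlinks
            size = entry.stat(follow_symlinks=False).st_size
            counts["empty" if size == 0 else "non_empty"] += 1
            if entry.name.endswith(".log"):
                counts["log_files"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical default path; override it on the command line.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/mysql/#rocksdb"
    c = summarize_dir(target)
    print(f"{target}: {c['empty']} empty, {c['non_empty']} non-empty, "
          f"{c['log_files']} *.log, {c['empty'] + c['non_empty']} files total")
```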
I suspect the high number of files and the assertion failure might be connected, but that is just a guess. It could also simply be a configuration problem that causes too many files to be open simultaneously, which in turn leads to the assertion failure; again, I am guessing wildly here.
I raised the hard and soft open-file limits for the mysql user to 20000, but this did not solve the problem. I also tried setting a limit on RocksDB's max open files; this changed the error message in the MariaDB log, which no longer reported a buffer overflow, but startup still failed with the same assertion failure. Upgrading MariaDB from 10.2.18 to 10.2.21 did not solve the problem either.
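One way to confirm that a raised open-file limit actually reached the running server process, and not only the mysql user's login sessions, is to read the kernel's per-process limits in `/proc/<pid>/limits` on Linux. The sketch below parses the "Max open files" line; the function name is hypothetical, and the PID would come from something like `pidof mysqld`.

```python
import sys
from typing import Optional, Tuple


def _parse_limit(value: str) -> Optional[int]:
    """Convert one limit column to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) from the 'Max open files' line of /proc/<pid>/limits."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return _parse_limit(soft), _parse_limit(hard)
    raise ValueError("no 'Max open files' line found")


if __name__ == "__main__":
    # Pass the server's PID; defaults to this script's own process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = max_open_files(f.read())
    print(f"pid {pid}: soft={soft} hard={hard}")
```

If the reported soft limit is lower than expected, the limits configured for the user did not apply to the process as it was started.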
I am happy to provide more details on request or to try out other things. Any support or ideas are greatly appreciated.