[MDEV-22056] Rocks db corrupts data when disk is out of space Created: 2020-03-27  Updated: 2020-04-05

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Affects Version/s: 10.4.10
Fix Version/s: 10.4

Type: Bug
Priority: Major
Reporter: Philip orleans
Assignee: Sergei Petrunia
Resolution: Unresolved
Votes: 1
Labels: rocksdb
Environment: Linux

Issue Links:
Relates
relates to MDEV-17567 Atomic DDL Closed

 Description   

How do we eliminate this risk and how do we fix this issue after it happens?
I issued a command to truncate a table, and since it was stuck, I rebooted.

2020-03-27 5:18:17 0 [Warning] RocksDB: Schema mismatch - A .frm file exists for table asterisk.acc, but that table is not registered in RocksDB
2020-03-27 5:18:17 0 [ERROR] RocksDB: Problems validating data dictionary against .frm files, exiting
2020-03-27 5:18:17 0 [ERROR] RocksDB: Failed to initialize DDL manager.
2020-03-27 5:18:17 0 [ERROR] Plugin 'ROCKSDB' init function returned error.
2020-03-27 5:18:17 0 [ERROR] Plugin 'ROCKSDB' registration as a STORAGE ENGINE failed.
2020-03-27 5:18:17 0 [ERROR] Unknown/unsupported storage engine: rocksdb



 Comments   
Comment by Marko Mäkelä [ 2020-03-27 ]

philip_38, from the messages it would appear to me that you might be able to work around the problem by deleting the file asterisk/acc.frm from the data directory. Once the server has started up, you should be able to issue CREATE TABLE.

TRUNCATE is a somewhat special case of a DDL operation, because it does not modify the data dictionary on the SQL layer or touch the .frm file. Apparently, the problem is that MyRocks essentially drops the table when it runs out of space. InnoDB could be even worse in such scenarios, ever eager to commit suicide when it encounters a fatal error. In MDEV-13564 I rewrote its TRUNCATE so that InnoDB in MariaDB internally renames the table, creates a new one, and drops the original table.

Could you create a minimal test case for this? It could involve creating a tiny file system on a loopback device. Such a test could also be helpful when we develop MDEV-17567.
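As a rough illustration of the loopback idea mentioned above, the following sketch builds the usual sequence of Linux commands (dd, mkfs.ext4, mkdir, mount -o loop) for a small file-backed ext4 file system. The function names, the image path, the mount point and the 64 MB size are all assumptions chosen for the example, not part of this report. By default it only prints the commands; actually running them requires root.

```python
import subprocess


def loopback_fs_commands(image, mount_point, size_mb=64):
    """Return argv lists that create a small ext4 image and loop-mount it.

    All paths and the size are placeholders for illustration only.
    """
    image, mount_point = str(image), str(mount_point)
    return [
        # Allocate a zero-filled file that will back the file system.
        ["dd", "if=/dev/zero", f"of={image}", "bs=1M", f"count={size_mb}"],
        # -F allows formatting a regular file; -q suppresses progress output.
        ["mkfs.ext4", "-F", "-q", image],
        ["mkdir", "-p", mount_point],
        # -o loop attaches the image to a free loop device and mounts it.
        ["mount", "-o", "loop", image, mount_point],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; a data directory placed under the mount point
    # would then run out of space after roughly size_mb megabytes.
    run(loopback_fs_commands("/tmp/tiny.img", "/mnt/tiny"))
```

Pointing a test server's data directory at the mounted path and filling the table until writes fail would be one way to approach the out-of-space condition; whether that reproduces this exact failure is not confirmed here.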

Comment by Philip orleans [ 2020-03-27 ]

Could you elaborate on how to create a small file system on a loopback device?

Comment by Elena Stepanova [ 2020-04-05 ]

If a test case is really needed for debugging or fixing this, a better way to do it would be to introduce a debug injection that simulates running out of disk space (if applicable to this particular failure). Only then will the test case be portable and usable in the regression suite.
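To make the idea of a debug injection concrete, here is a minimal, generic sketch. It does not use the server's actual debug facility; ACTIVE_INJECTIONS, the keyword "simulate_disk_full", checked_write and demo are hypothetical names invented for this illustration. A write wrapper raises the same ENOSPC error a full disk would produce whenever the keyword is active, so a test can exercise the error path on any platform without a special file system.

```python
import errno
import os
import tempfile

# Hypothetical registry of active injection keywords (illustration only).
ACTIVE_INJECTIONS = set()


def checked_write(fd, data):
    """Write data to fd, or fail with ENOSPC if the injection is active."""
    if "simulate_disk_full" in ACTIVE_INJECTIONS:
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    return os.write(fd, data)


def demo():
    """Return (saw_enospc, file_size) after one real and one injected write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT)
        try:
            checked_write(fd, b"ok")  # normal write succeeds
            ACTIVE_INJECTIONS.add("simulate_disk_full")
            try:
                checked_write(fd, b"more")  # injected failure
            except OSError as exc:
                return exc.errno == errno.ENOSPC, os.path.getsize(path)
            finally:
                ACTIVE_INJECTIONS.discard("simulate_disk_full")
        finally:
            os.close(fd)
    return False, -1


if __name__ == "__main__":
    print(demo())
```

A regression test built this way would enable the injection, run the operation under test (here, TRUNCATE), and then check that the table is still usable after the simulated failure.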
