Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 10.11.16
- Fix Version: None
- Can result in data loss
Description
Summary
We hit a MyRocks corruption scenario: MariaDB restarts, and MyRocks fails to initialize with:

Corruption: truncated record body

The server itself then continues startup with InnoDB crash recovery and binlog recovery, but MyRocks remains unavailable, and a large number of table metadata errors follow:

Incorrect information in file: './pmacontrol/...frm'
This leaves the server partially up, but RocksDB tables are unusable because the .frm files appear out of sync with the MyRocks dictionary/state.
A notable part of this case is that there is no explicit crash record in the MariaDB error log and no OOM-killer event recorded by the kernel.
Observed timeframe
Between 2026-03-05 23:00 and 2026-03-06 01:00:
- no matching journalctl entries for mariadb.service or mariadbd
- no OOM-killer event in the kernel logs
- journalctl -u mariadb is empty for that exact window
So there is:
- no systemd-recorded stop/start in that period
- no kernel-recorded OOM-killer event
- no explicit mysqld crash entry in the MariaDB error log for that incident window
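The absence of a crash signature can be checked mechanically. A minimal sketch, with two sample lines inlined based on the incident sequence (in practice, run the grep against /srv/mysql/log/error.log.old and the journalctl queries against the live journal):

```shell
# Scan an error-log excerpt for the usual MariaDB crash signatures.
# Sample lines are inlined for illustration; point the grep at the
# real error log (/srv/mysql/log/error.log.old) in practice.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-03-05 23:58:35 0 [Note] Starting MariaDB 10.11.16-MariaDB-deb12-log ...
2026-03-05 23:58:37 0 [ERROR] RocksDB: Error opening instance, Status Code: 2, Status: Corruption: truncated record body
EOF
if grep -q -E 'mysqld got signal|Assertion.*failed|stack_bottom' "$log"; then
  verdict="crash signature found"
else
  verdict="no crash signature"
fi
echo "$verdict"
rm -f "$log"
```

The corresponding journal checks for the incident window were of the form `journalctl -u mariadb.service --since "2026-03-05 23:00" --until "2026-03-06 01:00"` and `journalctl -k` filtered for OOM-killer entries; both came back empty.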
Important evidence from MariaDB error log
The MariaDB error log shows the relevant sequence in error.log.old:
- repeated InnoDB: Memory pressure event disregarded messages from 23:18 onward
- a burst of aborted pmacontrol connections around 23:57:49-23:57:53
- MariaDB startup at 2026-03-05 23:58:35
- during that startup, MyRocks fails with:
  RocksDB: Error opening instance, Status Code: 2, Status: Corruption: truncated record body
- the server then performs InnoDB crash recovery
- the server then performs binlog recovery
- immediately after that, many errors appear like:
Incorrect information in file: './pmacontrol/...frm'
Representative references from the error log:
- repeated memory pressure warnings starting around error.log.old#L481872
- aborted connections around error.log.old#L481903
- startup at error.log.old#L481930
- MyRocks corruption at error.log.old#L481939
- InnoDB crash recovery at error.log.old#L481954
- binlog recovery at error.log.old#L481980
- .frm metadata errors starting around error.log.old#L482038
- continuing later, e.g. error.log.old#L483593
Important negative evidence
There is no corresponding:
- mysqld got signal
- assertion failure
- stack trace
- OOM killer log
- systemd stop/start record
So the incident looks like a restart followed by MyRocks corruption recovery failure, but without a normal crash signature in either MariaDB logs or kernel logs.
Representative server log excerpt
2026-03-06 9:16:18 0 [Note] Starting MariaDB 10.11.16-MariaDB-deb12-log ...
2026-03-06 9:16:18 0 [Note] RocksDB: 2 column families found
2026-03-06 9:16:20 0 [ERROR] RocksDB: Error opening instance, Status Code: 2, Status: Corruption: truncated record body
2026-03-06 9:16:20 0 [ERROR] Plugin 'ROCKSDB' registration as a STORAGE ENGINE failed.
...
2026-03-06 9:16:20 0 [Note] /usr/sbin/mariadbd: ready for connections.
...
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_digest_text.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_calculated_double.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_slave_json.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_general_int.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_calculated_int.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_slave_text.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_general_text.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_digest_double.frm'
2026-03-06 9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_calculated_text.frm'
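The set of affected tables can be enumerated straight from these errors. A small sketch, with two lines from the excerpt above inlined (run the sed against the real error log in practice):

```shell
# Extract the schema/table names whose .frm files the server rejected.
# Two sample lines from the excerpt above are inlined for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-03-06  9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_digest_text.frm'
2026-03-06  9:16:20 9 [ERROR] mariadbd: Incorrect information in file: './pmacontrol/ts_value_calculated_double.frm'
EOF
# Strip everything but the schema/table part of each .frm path.
tables=$(sed -n "s/.*Incorrect information in file: '\.\/\(.*\)\.frm'.*/\1/p" "$log" | sort -u)
echo "$tables"
rm -f "$log"
```

Against the full log this yields the complete list of unusable RocksDB tables in the pmacontrol schema.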
Allocator / memory environment
The server is running with jemalloc.
This is relevant because the incident happened in a context where MariaDB was logging repeated:
InnoDB: Memory pressure event disregarded
but there was still:
- no kernel OOM-killer event
- no explicit MariaDB crash signature in the error log
So this does not look like a straightforward kernel OOM kill. The behavior happened under memory pressure, with jemalloc in use, and ended in MyRocks startup corruption plus .frm metadata inconsistency.
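Whether jemalloc is actually mapped into the server process can be confirmed on Linux via /proc. A sketch, demonstrated against the current shell's own PID (substitute `$(pidof mariadbd)` to check the running server; the exact library filename varies by distribution):

```shell
# Check which malloc implementation a process has mapped (Linux only).
# Demonstrated against the current shell; use $(pidof mariadbd) for the server.
pid=$$
if grep -q 'jemalloc' /proc/"$pid"/maps 2>/dev/null; then
  allocator="jemalloc"
else
  allocator="other (libc malloc or different allocator)"
fi
echo "$allocator"
```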
Possible trigger
This happened after ALTER activity on RocksDB tables, with concurrent memory pressure visible on the InnoDB side (Memory pressure event disregarded messages), while running with jemalloc. I cannot prove kernel OOM, and there is no OOM-killer entry, but there was clearly memory pressure before the restart/corruption event.
Workaround / recovery
To recover MyRocks enough to start and proceed with repair, the following setting helped:
rocksdb_wal_recovery_mode=2
This appears necessary to get past the corrupted WAL state.
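For context, the values of this variable map to RocksDB's WALRecoveryMode enum (this mapping is from RocksDB's options definitions; worth double-checking against the MyRocks documentation for this exact build):

```
# rocksdb_wal_recovery_mode values (RocksDB WALRecoveryMode enum):
#   0 = kTolerateCorruptedTailRecords
#   1 = kAbsoluteConsistency   (refuse to start on any WAL inconsistency)
#   2 = kPointInTimeRecovery   (replay the WAL up to the last consistent
#                               record and discard everything after it)
#   3 = kSkipAnyCorruptedRecords
rocksdb_wal_recovery_mode = 2
```

Note that mode 2 gets the server past the truncated record by silently dropping the WAL tail, so any transactions after the corruption point are lost.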
Expected behavior
One of the following should happen instead:
- ALTER on RocksDB should fail atomically without leaving MyRocks data dictionary / .frm files inconsistent.
- Startup should detect and report the exact metadata mismatch more explicitly.
- Recovery should not leave the server in a state where MyRocks is unavailable but the server otherwise appears started and usable.
Actual behavior
- MariaDB starts
- MyRocks plugin fails
- InnoDB crash recovery runs
- binlog recovery runs
- .frm errors continue for RocksDB tables
- affected RocksDB tables in schema pmacontrol are unusable
Environment
- MariaDB version: 10.11.16-MariaDB-deb12-log
- OS: Debian 12
- MyRocks enabled
- InnoDB also enabled
- jemalloc in use
Relevant my.cnf excerpt
[mysqld]
user = mysql
port = 3306
socket = /var/run/mysqld/mysqld.sock
datadir = /srv/mysql/data
tmpdir = /srv/mysql/tmp
log_error = /srv/mysql/log/error.log
skip-name-resolve

max_connections = 100
connect_timeout = 10
wait_timeout = 600
max_allowed_packet = 256M
thread_cache_size = 128

sort_buffer_size = 32M
tmp_table_size = 768M
max_heap_table_size = 768M
key_buffer_size = 128M

default_storage_engine = InnoDB
innodb_buffer_pool_size = 2G
innodb_buffer_pool_size_auto_min = 1G
innodb_buffer_pool_size_max = 3G
innodb_log_file_size = 948M
innodb_log_buffer_size = 8M
innodb_file_per_table = 1
innodb_open_files = 2000
innodb_io_capacity = 2000
innodb_flush_method = O_DIRECT
innodb_strict_mode = 1
innodb_rollback_on_timeout = 1

slow_query_log = 1
slow_query_log_file = /srv/mysql/log/mariadb-slow.log
long_query_time = 1

log_bin = /srv/mysql/binlog/mariadb-bin
log_bin_index = /srv/mysql/binlog/mariadb-bin.index
binlog_expire_logs_seconds = 3600
max_binlog_size = 100M
sync_binlog = 10000

performance_schema = ON
userstat = ON
query_response_time_stats = ON
event_scheduler = ON

rocksdb_wal_recovery_mode = 2
rocksdb_flush_log_at_trx_commit = 2

server-id = 394663081
report_host = ist-pmacontrol

wsrep_on = OFF
wsrep_cluster_name = 68Koncept
Potentially related existing tickets
This issue looks related in theme, but not identical, to:
- MDEV-20406: Rocksdb gets corrupted on OOM during ALTER
- MDEV-18204: RocksDB failed to start due to problems validating data dictionary against .frm files
- MDEV-29749: RocksDB does not refuse nopad collation in time, leaves corrupt schema
The difference here is:
- no kernel OOM evidence
- no OOM-killer event
- no explicit crash signature in the MariaDB error log
- truncated record body instead of the exact messages in those reports
- server continues startup without MyRocks, then emits many .frm metadata errors
Attachments
Issue Links
- relates to MDEV-25180 Atomic ALTER TABLE (Closed)