[MDEV-29204] InnoDB: Unknown error Required history data has been deleted
Created: 2022-07-29  Updated: 2023-11-23  Resolved: 2022-10-31

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Ovidiu Stanila
Assignee: Marko Mäkelä
Resolution: Not a Bug
Votes: 0
Labels: None
Environment: CentOS 7.9.2009, MariaDB 10.4.25


Attachments: File jos_jobs-schema.sql     Text File mariadb-10.3.18_crash_2019_10_11-1.txt     Text File mariadb-10.3.21_crash_2019_12_25-1.txt     Text File mariadb-10.3.22_crash_2020_05_16-1.txt     Text File mariadb-10.3.23_crash_2020_05_29-1.txt     Text File mariadb-10.3.23_crash_2020_06_16-1.txt     Text File mariadb-10.4.25_crash_2022_07_27-1.txt     Text File mariadb_10.3.23_crash_2020-07-20_01-1.txt     Text File mariadb_10.3.23_crash_2020-07-20_02-1.txt     Text File mariadb_10.3.23_crash_2020-07-31-1.txt     Text File mariadb_10.3.27_crash_2020-02-01-1.txt     Text File mariadb_10.3.27_crash_2020-12-10-1.txt     File my.cnf    
Issue Links:
Relates
relates to MDEV-21100 [ERROR] [FATAL] InnoDB: Unknown error... Closed

 Description   

Since upgrading from MariaDB 10.2.25 to 10.3.16, we have been getting the error below, followed by a server restart:

mysqld: 2022-07-27  8:38:51 4461671 [ERROR] [FATAL] InnoDB: Unknown error Required history data has been deleted
mysqld: 220727  8:38:51 [ERROR] mysqld got signal 6 ;

This happens randomly. It has become less frequent recently, but it still occurs.
We initially reported the problem in MDEV-21100, but since that issue is linked to Galera, we were advised to report this separately.
I've attached a few stack traces from when we first started seeing this. They cover only some of the crashes, not all of them.

my.cnf is the configuration we have been using for some time now; not much has changed in it over time.

This server (db10) is the "main" master in a classic master-master setup, where the other server is kept as a hot copy and doesn't send any data over.

The table `jos_jobs` is fairly simple and its content is highly dynamic: records are added and removed frequently, because it is used as a kind of job queue.

That's the odd thing: this table has been re-created a few times so far (DROP/CREATE, which resets auto_increment), and we also run OPTIMIZE on it daily (defragmentation). So it's strange that we get errors about InnoDB "history" data on a table with very dynamic content that doesn't keep records for long.

I've tried to reproduce this by running the same set of queries for a longer period of time on a similar instance, but could not trigger the crash, so I'm stuck there.

The problem has persisted even after upgrading to MariaDB 10.4. Upgrades were done incrementally (10.2 -> 10.3 -> 10.4), including the minor versions (10.4.23 -> 10.4.24 -> 10.4.25), and we didn't have any problems during the upgrades.



 Comments   
Comment by Marko Mäkelä [ 2022-07-29 ]

Do you have an idea of the oldest version of MariaDB or MySQL that has ever touched the data files? Before MySQL 5.1.48 and MariaDB 5.1.48, InnoDB wrote some uninitialized garbage to some unused fields, and this particular garbage was not correctly ignored on an upgrade to MariaDB 10.3 before MDEV-27295 was fixed.

Comment by Ovidiu Stanila [ 2022-07-29 ]

I've gone through our logs and tickets, and it seems this server dates back to MariaDB 10.0.31, when it was built by importing a logical backup.

Comment by Marko Mäkelä [ 2022-07-29 ]

Thank you. I think that the safest option would be to rebuild from a logical dump again. If you are using 10.4, I would recommend setting innodb_checksum_algorithm=full_crc32 before importing the dump. That setting (a file format with better page checksums) was introduced in 10.4, but not made the default until 10.5 in MDEV-19534.

Can you reproduce this after importing a logical dump?

Comment by Ovidiu Stanila [ 2022-08-01 ]

Any reason why we would need to rebuild the entire instance again?
We're using "innodb_file_per_table" and the issue occurs only on this specific table, not on others, and only in some very specific circumstances (which we have not managed to identify so far). Aren't the indexes and data confined to the .ibd file for this table?

Comment by Marko Mäkelä [ 2022-10-03 ]

ovidiu.stanila, the transaction metadata and undo logs are never stored in .ibd files but in the system tablespace and optionally undo log tablespaces. The .ibd files only contain pointers to that data (via the hidden columns DB_TRX_ID, DB_ROLL_PTR and the secondary index leaf page header field PAGE_MAX_TRX_ID). If the transaction metadata is corrupted in some way, all tables may be affected. While your .ibd files might be fine, it would be safest to rebuild the system tablespace.
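To illustrate what "pointers to that data" means, here is a minimal Python sketch that decodes the two hidden columns. The field widths (6 bytes for DB_TRX_ID, 7 bytes for DB_ROLL_PTR) and the bit layout of the roll pointer are assumptions based on the InnoDB record format, not something stated in this report; the names `RollPtr`, `decode_trx_id` and `decode_roll_ptr` are hypothetical helpers.

```python
from typing import NamedTuple

# Assumed widths of the hidden columns (not taken from this report).
DB_TRX_ID_LEN = 6    # bytes
DB_ROLL_PTR_LEN = 7  # bytes


class RollPtr(NamedTuple):
    is_insert: bool  # top bit: the undo record belongs to an insert
    rseg_id: int     # 7 bits: rollback segment number
    page_no: int     # 32 bits: undo log page number
    offset: int      # 16 bits: byte offset of the undo record in that page


def decode_trx_id(raw: bytes) -> int:
    """Decode a big-endian DB_TRX_ID value."""
    if len(raw) != DB_TRX_ID_LEN:
        raise ValueError(f"expected {DB_TRX_ID_LEN} bytes, got {len(raw)}")
    return int.from_bytes(raw, "big")


def decode_roll_ptr(raw: bytes) -> RollPtr:
    """Split a big-endian DB_ROLL_PTR value into its assumed components."""
    if len(raw) != DB_ROLL_PTR_LEN:
        raise ValueError(f"expected {DB_ROLL_PTR_LEN} bytes, got {len(raw)}")
    value = int.from_bytes(raw, "big")
    return RollPtr(
        is_insert=bool(value >> 55),
        rseg_id=(value >> 48) & 0x7F,
        page_no=(value >> 16) & 0xFFFFFFFF,
        offset=value & 0xFFFF,
    )
```

The point of the sketch is that the decoded page number and offset refer to an undo log page outside the .ibd file, so a record in a healthy .ibd file can still point at damaged history.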

If the database was initially built with MariaDB 10.0.31, then MDEV-27295 should not play a role here. There are many ways in which data could become corrupted. If crash recovery was ever executed or the database was ever restored from a backup, the rare bug that was fixed in MDEV-24449 might be an explanation. Likewise, if the parameter innodb_force_recovery was ever specified, with bad enough luck, a corruption of undo log or transaction metadata could be the result.

My last guess is that if MySQL 5.7 or a 5.7-compatible version of Percona XtraBackup was ever invoked on the data files, then the transaction metadata could be corrupted. This is due to an incompatible change that was reverted in MDEV-12289 before it became part of any generally available MariaDB release.

Comment by Sergei Golubchik [ 2022-10-31 ]

The current explanation is that the tablespace was corrupted by an incompatible InnoDB version. Meaning, this is not a bug.

If new information shows that this was a bug after all, please add a comment and we'll reopen the issue.

Comment by Marko Mäkelä [ 2023-11-23 ]

MDEV-27800 fixed a bug in an upgrade procedure related to MDEV-15132. This bug affects upgrades from data files that had originally been created before MySQL 5.1.48. Quote:

If the data files are already corrupted (they show transaction identifiers greater or equal to 281474976710656), then the correct course of action would seem to be to rebuild those data files.
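The threshold in the quote is exactly 2^48, the first value that no longer fits in 6 bytes. The following Python sketch makes that arithmetic explicit; the 6-byte width of a stored transaction identifier is an assumption about the InnoDB format, and `trx_id_out_of_range` is a hypothetical helper, not part of any MariaDB tool.

```python
# Assumption: a stored transaction identifier occupies 6 bytes (48 bits).
TRX_ID_BITS = 48
TRX_ID_LIMIT = 1 << TRX_ID_BITS  # 281474976710656, the value in the quote


def trx_id_out_of_range(trx_id: int) -> bool:
    """Return True if the identifier is at or above the quoted threshold."""
    return trx_id >= TRX_ID_LIMIT
```

An identifier for which this returns True matches the corruption symptom described in the quote.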

Generated at Thu Feb 08 10:06:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.