[MDEV-28870] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT during crash recovery
| Created: | 2022-06-16 | Updated: | 2024-01-31 | Resolved: | 2022-06-21 |
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6, 10.7, 10.8, 10.9, 10.10 |
| Fix Version/s: | 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Matthias Leich | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | corruption, recovery, rr-profile |
| Issue Links: |
| Description |
| Comments |
| Comment by Matthias Leich [ 2022-06-16 ] |
|
The bug is quite similar to
| Comment by Marko Mäkelä [ 2022-06-17 ] |
|
Thank you! This might also explain intermittent failures of the test atomic.rename_table where InnoDB fails to recover. It turns out that a FILE_MODIFY record was parsed for the file that will be complained about, but the first page of the tablespace had not been written before the server was killed:
This is related to
| Comment by Thirunarayanan Balathandayuthapani [ 2022-06-17 ] |
|
It looks like the FILE_DELETE record is eaten by FILE_CHECKPOINT. So during recovery, InnoDB encounters FILE_MODIFY for the tablespace, not FILE_DELETE.
Reassigning this issue to Marko, as he suggested.
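A toy C++ model of this observation, not InnoDB code: recovery reads only the records at or after the checkpoint LSN, so a FILE_DELETE record that falls behind the checkpoint is invisible to it. The types and the functions delete_visible() and file_left_behind() are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy model only; these types are illustrative, not InnoDB's.
enum class RecType { FileModify, FileDelete };

struct LogRecord {
  std::uint64_t lsn;  // position of the record in the toy log
  RecType type;
  std::string name;   // tablespace file name
};

// Recovery reads only records at or after the checkpoint LSN.
bool delete_visible(const std::vector<LogRecord>& log,
                    std::uint64_t checkpoint_lsn, const std::string& name) {
  assert(!name.empty());
  for (const LogRecord& r : log) {
    if (r.lsn >= checkpoint_lsn && r.type == RecType::FileDelete &&
        r.name == name) {
      return true;
    }
  }
  return false;
}

// The file is left behind if it is still on disk and recovery never sees
// the FILE_DELETE record that should have removed it.
bool file_left_behind(const std::vector<LogRecord>& log,
                      std::uint64_t checkpoint_lsn, const std::string& name,
                      bool on_disk) {
  return on_disk && !delete_visible(log, checkpoint_lsn, name);
}
```

In this model, with a FILE_DELETE record at LSN 100 and the file still on disk, a checkpoint at LSN 95 lets recovery replay the deletion, while a checkpoint at LSN 110 leaves the file behind.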
| Comment by Marko Mäkelä [ 2022-06-17 ] |
|
Thanks to further debugging, the problem appears to be insufficient mutual exclusion between log checkpoint and fil_delete_tablespace(). I think that this race condition may have been caused by
In this particular execution, the log checkpoint violated ACID by essentially deleting a FILE_DELETE record that had been written earlier. Had the server been killed before the checkpoint was updated, the FILE_DELETE record would have been processed by crash recovery and the file would have been deleted as expected. If such a race condition is possible also with regard to FILE_RENAME and renaming files, that should explain failures of the test atomic.rename_table (
| Comment by Marko Mäkelä [ 2022-06-20 ] |
|
It looks like fil_delete_tablespace() must employ some logic similar to mtr_t::commit_shrink() that is part of innodb_undo_log_truncate. Possibly something like this:
Something similar is necessary around fil_name_write_rename() as well, to ensure that a log checkpoint may not occur between the write of a FILE_RENAME record and the actual rename operation in the file system. Only after the file has been deleted or renamed in the file system may we allow a log checkpoint to discard the record.
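The ordering requirement can be illustrated with a single-threaded toy model in C++. It is not the patch that was applied: ToyServer, file_op_pending, try_checkpoint() and delete_file() are hypothetical names, and a plain flag stands in for whatever latch the real code would hold.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <string>

// Single-threaded toy model; none of these names are InnoDB's.
struct ToyServer {
  std::uint64_t lsn = 0;             // end of the toy log
  std::uint64_t checkpoint_lsn = 0;  // recovery would start reading here
  bool file_op_pending = false;      // stands in for a latch held across the operation
  std::set<std::string> files;       // files present in the toy file system

  // A checkpoint is refused while a logged file operation is incomplete.
  bool try_checkpoint() {
    if (file_op_pending) return false;
    checkpoint_lsn = lsn;
    return true;
  }

  // Log the deletion, then delete the file. `between` runs in the window
  // between the two steps, which is where the race was observed.
  void delete_file(const std::string& name,
                   const std::function<void()>& between) {
    assert(files.count(name) == 1);
    file_op_pending = true;
    ++lsn;  // "write" the FILE_DELETE record
    between();
    files.erase(name);  // the actual file system operation
    file_op_pending = false;
  }
};
```

A checkpoint attempted from inside the `between` callback is refused in this model, so the checkpoint LSN cannot move past the FILE_DELETE record until the file is gone. Once delete_file() returns, try_checkpoint() succeeds again.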
| Comment by Marko Mäkelä [ 2022-06-20 ] |
|
In addition to being similar to
| Comment by Marko Mäkelä [ 2022-06-21 ] |
|
I ended up not fixing this in 10.5, because this affects the recovery of DDL operations, and those operations were not crash-safe before 10.6.
| Comment by Jörn Wagner [ 2022-08-15 ] |
|
Hi, it seems I'm currently affected by this bug. I'm running 10.6.8. After a system restart, my local MariaDB instance fails to start, reporting the aforementioned error. Will this fix only prevent the error from happening in the future or will it also fix a broken recovery log? Do I need to rebuild my whole database?
| Comment by Marko Mäkelä [ 2022-08-15 ] |
|
The recently released MariaDB Server 10.6.9 includes a number of recovery fixes, such as
| Comment by Jörn Wagner [ 2022-08-15 ] |
|
Unfortunately, I do not see MariaDB 10.6.9 available for download. https://jira.mariadb.org/projects/MDEV/versions/27507 says it's unreleased but all issues are done. https://mariadb.com/kb/en/mariadb-1069-changelog/ also states "There are currently no official packages or binaries available for download which contain the features". Where can we find or when can we expect binaries for that version?
| Comment by Daniel Black [ 2022-09-18 ] |
|
Release planning is on this Jira page.
| Comment by Marko Mäkelä [ 2023-09-11 ] |
|
One more reason for these symptoms could be that multiple instances of InnoDB were running on the same data files. This would be a regression due to implementing