[MDEV-31826] InnoDB may fail to recover after being killed in fil_delete_tablespace() Created: 2023-08-02 Updated: 2023-11-07 Resolved: 2023-10-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Backup, Storage Engine - InnoDB |
| Affects Version/s: | 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2 |
| Fix Version/s: | 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | corruption, recovery, rr-profile-analyzed |
| Description |
|
mleich produced an rr replay trace where crash recovery fails like this:
The immediate reason for this is that recovery wants to read page 3 of tablespace 82 (test/B.ibd). That page had been left as all-zero by the server that had been killed right before the server restart. The buf_flush_page_cleaner() thread had invoked buf_flush_discard_page() on the page, with oldest_modification()=74626640 and FIL_PAGE_LSN=87174362. The reason for that is that the file was going to be deleted:
The DDL statement was CREATE OR REPLACE TABLE B AS SELECT * FROM `t8` /* E_R Thread32 QNO 21 CON_ID 29 */, but I think that this can affect any DDL operation that can delete InnoDB .ibd files: DROP TABLE, TRUNCATE TABLE, ALTER TABLE, OPTIMIZE TABLE. The problem appears to be that fil_delete_tablespace() first sets the fil_space_t::STOPPING flag and only then writes the FILE_DELETE record. In the above execution, that record was never written and therefore never recovered. I got a copy of the data directory, but data.tar.gz is 12MiB, while the maximum file size of this Jira is only 10MiB. |
| Comments |
| Comment by Matthias Leich [ 2023-09-06 ] |

| Comment by Matthias Leich [ 2023-10-06 ] |

| Comment by Marko Mäkelä [ 2023-10-16 ] |
|
Please consider my suggestion for an alternative solution. I am afraid that this fix could introduce a race condition if two threads happen to attempt to delete the same file concurrently. We discussed that such a race might not be possible, but I would like to play it safe. |
| Comment by Marko Mäkelä [ 2023-10-24 ] |
|
I have a fix that does not work for encrypted tables yet. The deletion of fil_space_t::crypt_data is tricky. |
| Comment by Marko Mäkelä [ 2023-10-25 ] |
|
I think that the cleanest way to fix this is to split the fil_space_t::STOPPING flag into two: STOPPING_READS and STOPPING_WRITES. We must stop reading pages or dirtying them in the fil_crypt_thread() as soon as the STOPPING_READS flag is set, and we must keep writing them until the STOPPING_WRITES flag has been set. The STOPPING_WRITES flag would not be set before the FILE_DELETE record has been durably written. |
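The two-phase stop proposed above can be sketched as follows. This is a simplified, hypothetical model and not the actual MariaDB source: the struct name space_model, the bit values, and the helper names are assumptions made for illustration. Only the names STOPPING_READS, STOPPING_WRITES and FILE_DELETE come from this ticket.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the proposed flag split; bit values are illustrative only.
struct space_model
{
  static constexpr std::uint32_t STOPPING_READS = 1U << 0;
  static constexpr std::uint32_t STOPPING_WRITES = 1U << 1;

  std::atomic<std::uint32_t> flags{0};

  // Phase 1: forbid new page reads and dirtying (for example by an encryption thread).
  void stop_reads() { flags.fetch_or(STOPPING_READS); }

  // Phase 2: intended to be called only after FILE_DELETE is durable in the log.
  // Assumption of this sketch: phase 1 must already have happened.
  void stop_writes()
  {
    assert((flags.load() & STOPPING_READS) != 0);
    flags.fetch_or(STOPPING_WRITES);
  }

  bool may_read() const { return (flags.load() & STOPPING_READS) == 0; }

  // Dirty pages must still be written out until STOPPING_WRITES is set.
  bool may_discard_write() const { return (flags.load() & STOPPING_WRITES) != 0; }
};
```

In this model, a page cleaner would consult may_discard_write() rather than a single STOPPING bit, so that writes are skipped only once the deletion has been logged.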
| Comment by Vladislav Lesin [ 2023-10-26 ] |
|
The code looks good to me. |
| Comment by Marko Mäkelä [ 2023-10-26 ] |
|
For the record, the error scenario is as follows:
The fix is to not discard any page writes before the FILE_DELETE record has been durably written to the log. The permission to discard writes is granted by setting the new STOPPING_WRITES flag. |
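To illustrate why the ordering matters, here is a minimal single-threaded sketch, assuming a deliberately simplified notion of recovery: the tablespace is treated as recoverable if either the dirty page reached the file or a FILE_DELETE record is in the durable log. The names model, log_file_delete, permit_discard and survives_kill_mid_delete are hypothetical and do not exist in the server.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the ordering fix; not MariaDB code.
struct model
{
  bool stopping_writes = false;          // permission to discard page writes
  bool page_on_disk = false;             // false: the page is still all-zero in the file
  std::vector<std::string> durable_log;  // records that would survive a kill

  void log_file_delete() { durable_log.push_back("FILE_DELETE"); }
  void permit_discard() { stopping_writes = true; }

  // What a page cleaner would do with a dirty page of this tablespace.
  void flush_dirty_page()
  {
    if (!stopping_writes)
      page_on_disk = true;  // the write must happen; otherwise it is discarded
  }

  // Simplified recovery rule (an assumption of this sketch).
  bool recoverable() const
  {
    for (const std::string &rec : durable_log)
      if (rec == "FILE_DELETE")
        return true;
    return page_on_disk;
  }
};

// Runs the first step of the deletion, lets the page cleaner run,
// then "kills" the server before the second step.
bool survives_kill_mid_delete(bool fixed_order)
{
  model m;
  if (fixed_order)
    m.log_file_delete();  // fixed: log first, permit discards later
  else
    m.permit_discard();   // old order: discards permitted before logging
  m.flush_dirty_page();
  return m.recoverable();
}

// Usage: the fixed order survives the kill in this model, the old order does not.
inline void self_check()
{
  assert(survives_kill_mid_delete(true));
  assert(!survives_kill_mid_delete(false));
}
```

With the old order, the page write is discarded while no FILE_DELETE record exists yet, which corresponds to the all-zero page that recovery then tries to read.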
| Comment by Marko Mäkelä [ 2023-11-06 ] |
|
The refactored fil_space_t::drop() fails to close the file handle in the error handling of ALTER TABLE…IMPORT TABLESPACE as well as in the |