[MDEV-31826] InnoDB may fail to recover after being killed in fil_delete_tablespace()

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2
Component/s: Backup, Storage Engine - InnoDB

Description

mleich produced an rr replay trace where crash recovery fails like this:

2023-08-01  8:26:12 0 [Note] InnoDB: Multi-batch recovery needed at LSN 78671708
2023-08-01  8:26:12 0 [Note] InnoDB: End of log at LSN=88328740
2023-08-01  8:26:12 0 [Note] InnoDB: To recover: LSN 78752404/88328740; 875 pages
2023-08-01  8:26:13 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.

The immediate reason for this is that recovery wants to read page 3 of tablespace 82 (test/B.ibd). That page had been left as all-zero by the server that had been killed right before the server restart. The buf_flush_page_cleaner() thread had invoked buf_flush_discard_page() on the page, with oldest_modification()=74626640 and FIL_PAGE_LSN=87174362. The page was discarded rather than written because the file was about to be deleted:

(rr) backtrace
#0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
#1  0x000055c51e4f7f46 in fil_delete_tablespace (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1774
#2  0x000055c51e4f2643 in trx_t::commit (this=this@entry=0xb3b200fc180, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/dict/drop.cc:270
#3  0x000055c51e3d0982 in ha_innobase::delete_table (this=<optimized out>, name=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/handler/ha_innodb.cc:13689
#4  0x000055c51e16ec05 in hton_drop_table (hton=<optimized out>, path=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:578
#5  0x000055c51e1742e9 in ha_delete_table (thd=thd@entry=0x461168000c68, hton=hton@entry=0x55c520a22b88, path=path@entry=0x18885dda5e80 "./test/B", db=db@entry=0x461168010b20, alias=alias@entry=0x461168010b30, generate_warning=generate_warning@entry=false) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:3192
...
(rr) finish
Run till exit from #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707

Thread 3 received signal SIGKILL, Killed.

The DDL statement was CREATE OR REPLACE TABLE B AS SELECT * FROM `t8` /* E_R Thread32 QNO 21 CON_ID 29 */ but I think that this can affect any DDL operation that can delete InnoDB .ibd files: DROP TABLE, TRUNCATE TABLE, ALTER TABLE, OPTIMIZE TABLE.

The problem appears to be that fil_delete_tablespace() first sets the fil_space_t::STOPPING flag and only then writes the FILE_DELETE record. In the above execution, the server was killed before that record was written, so recovery never saw it and still expected the tablespace pages to be valid.
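
To make the ordering hazard concrete, here is a minimal toy model in C++. It is not InnoDB code: ToyState, page_cleaner_step(), run_until_kill() and recovery_succeeds() are hypothetical names, and the model assumes that recovery fails exactly when a dirty page was discarded while no FILE_DELETE record exists in the log. The LogThenFlag order is shown only as one conceivable alternative ordering for comparison, not as the change that was actually committed for this issue.

```cpp
#include <cassert>

// Toy model only. None of these names exist in InnoDB.
enum class Order { FlagThenLog, LogThenFlag };

struct ToyState {
  bool stopping = false;            // stands in for fil_space_t::STOPPING
  bool file_delete_logged = false;  // stands in for a written FILE_DELETE record
  bool page_discarded = false;      // a dirty page was dropped instead of written
};

// Assumption: the page cleaner discards dirty pages of a tablespace
// that is marked as stopping, and writes them out otherwise.
void page_cleaner_step(ToyState& s) {
  if (s.stopping) s.page_discarded = true;
}

// Execute the first `steps_before_kill` steps (0, 1 or 2) of a simulated
// tablespace deletion, letting the page cleaner run after each step,
// and return the state that a subsequent recovery would find.
ToyState run_until_kill(Order order, int steps_before_kill) {
  assert(steps_before_kill >= 0 && steps_before_kill <= 2);
  ToyState s;
  for (int step = 0; step < steps_before_kill; step++) {
    // In FlagThenLog order the flag is set in step 0, otherwise in step 1.
    const bool set_flag = (order == Order::FlagThenLog) == (step == 0);
    if (set_flag) s.stopping = true;
    else s.file_delete_logged = true;
    page_cleaner_step(s);
  }
  return s;
}

// Assumption: recovery can skip the tablespace if it sees the deletion
// record; otherwise it needs every page to be intact.
bool recovery_succeeds(const ToyState& s) {
  return s.file_delete_logged || !s.page_discarded;
}

// True if recovery succeeds no matter where the simulated kill lands.
bool survives_every_kill_point(Order order) {
  for (int k = 0; k <= 2; k++)
    if (!recovery_succeeds(run_until_kill(order, k))) return false;
  return true;
}
```

In this model, the only failing case is FlagThenLog with the kill landing after the first step, which corresponds to the window described above: the flag is visible to the page cleaner, but the log does not yet say that the file is being deleted.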

I have a copy of the data directory, but data.tar.gz is 12 MiB, which exceeds this Jira's 10 MiB attachment size limit.

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 1

Watchers: 4

Dates

Created: 2023-08-02 09:19

Updated: 2023-11-07 11:01

Resolved: 2023-10-27 13:28

MariaDB Server