Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31826

InnoDB may fail to recover after being killed in fil_delete_tablespace()




      mleich produced an rr replay trace where crash recovery fails like this:

      2023-08-01  8:26:12 0 [Note] InnoDB: Multi-batch recovery needed at LSN 78671708
      2023-08-01  8:26:12 0 [Note] InnoDB: End of log at LSN=88328740
      2023-08-01  8:26:12 0 [Note] InnoDB: To recover: LSN 78752404/88328740; 875 pages
      2023-08-01  8:26:13 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.

      The immediate reason for this is that recovery wants to read page 3 of tablespace 82 (test/B.ibd). That page had been left as all-zero by the server that had been killed right before the server restart. The buf_flush_page_cleaner() thread had invoked buf_flush_discard_page() on the page, with oldest_modification()=74626640 and FIL_PAGE_LSN=87174362. The reason for that is that the file was going to be deleted:

      (rr) backtrace
      #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
      #1  0x000055c51e4f7f46 in fil_delete_tablespace (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1774
      #2  0x000055c51e4f2643 in trx_t::commit (this=this@entry=0xb3b200fc180, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/dict/drop.cc:270
      #3  0x000055c51e3d0982 in ha_innobase::delete_table (this=<optimized out>, name=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/handler/ha_innodb.cc:13689
      #4  0x000055c51e16ec05 in hton_drop_table (hton=<optimized out>, path=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:578
      #5  0x000055c51e1742e9 in ha_delete_table (thd=thd@entry=0x461168000c68, hton=hton@entry=0x55c520a22b88, path=path@entry=0x18885dda5e80 "./test/B", db=db@entry=0x461168010b20, alias=alias@entry=0x461168010b30, 
          generate_warning=generate_warning@entry=false) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:3192
      (rr) finish
      Run till exit from #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
      Thread 3 received signal SIGKILL, Killed.

      The DDL statement was CREATE OR REPLACE TABLE B AS SELECT * FROM `t8` /* E_R Thread32 QNO 21 CON_ID 29 */ but I think that this can affect any DDL operation that can delete InnoDB .ibd files: DROP TABLE, TRUNCATE TABLE, ALTER TABLE, OPTIMIZE TABLE.

      The problem appears to be that fil_delete_tablespace() is first setting the fil_space_t::STOPPING flag and only then writing the FILE_DELETE record. In the above execution, that record was never written and therefore never recovered.

      I got copy of the data directory, but the data.tar.gz is 12MiB, while the maximum file size limit of this Jira is only 10MiB.




            marko Marko Mäkelä
            marko Marko Mäkelä
            1 Vote for this issue
            4 Start watching this issue



              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.