  1. MariaDB Server
  2. MDEV-31826

InnoDB may fail to recover after being killed in fil_delete_tablespace()

Details

    Description

      mleich produced an rr replay trace where crash recovery fails like this:

      2023-08-01  8:26:12 0 [Note] InnoDB: Multi-batch recovery needed at LSN 78671708
      2023-08-01  8:26:12 0 [Note] InnoDB: End of log at LSN=88328740
      2023-08-01  8:26:12 0 [Note] InnoDB: To recover: LSN 78752404/88328740; 875 pages
      2023-08-01  8:26:13 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
      

      The immediate cause is that recovery wants to read page 3 of tablespace 82 (test/B.ibd). The server that was killed right before the restart had left that page all-zero in the data file. The buf_flush_page_cleaner() thread had invoked buf_flush_discard_page() on the page, with oldest_modification()=74626640 and FIL_PAGE_LSN=87174362, because the file was about to be deleted:

      (rr) backtrace
      #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
      #1  0x000055c51e4f7f46 in fil_delete_tablespace (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1774
      #2  0x000055c51e4f2643 in trx_t::commit (this=this@entry=0xb3b200fc180, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/dict/drop.cc:270
      #3  0x000055c51e3d0982 in ha_innobase::delete_table (this=<optimized out>, name=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/handler/ha_innodb.cc:13689
      #4  0x000055c51e16ec05 in hton_drop_table (hton=<optimized out>, path=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:578
      #5  0x000055c51e1742e9 in ha_delete_table (thd=thd@entry=0x461168000c68, hton=hton@entry=0x55c520a22b88, path=path@entry=0x18885dda5e80 "./test/B", db=db@entry=0x461168010b20, alias=alias@entry=0x461168010b30, 
          generate_warning=generate_warning@entry=false) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:3192
      ...
      (rr) finish
      Run till exit from #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
       
      Thread 3 received signal SIGKILL, Killed.
      

      The DDL statement was CREATE OR REPLACE TABLE B AS SELECT * FROM `t8` /* E_R Thread32 QNO 21 CON_ID 29 */, but I think that this can affect any DDL operation that deletes InnoDB .ibd files: DROP TABLE, TRUNCATE TABLE, ALTER TABLE, OPTIMIZE TABLE.

      The problem appears to be that fil_delete_tablespace() first sets the fil_space_t::STOPPING flag and only then writes the FILE_DELETE record. In the execution above, that record was never written, so recovery never saw it and still expected the pages of the tablespace to be readable.
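
The ordering hazard described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not InnoDB code: Step, State, run_until_kill() and recovery_succeeds() are hypothetical names, and the model assumes the worst case in which a background flusher discards the dirty page as soon as the stopping flag is set. It does not claim to show the fix that was or will be applied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only. None of these names exist in InnoDB; they are hypothetical.
enum class Step { SetStopping, LogFileDelete };

struct State {
  bool stopping = false;        // stands in for fil_space_t::STOPPING
  bool delete_logged = false;   // a durable FILE_DELETE record exists
  bool page_discarded = false;  // a dirty page was dropped without being written
};

// Run the steps in order and "kill" the server after kill_after steps.
// Assumption: once the stopping flag is set, a background flusher may
// discard the dirty page before the next step runs (worst case).
State run_until_kill(const std::vector<Step>& steps, std::size_t kill_after) {
  assert(kill_after <= steps.size());
  State s;
  for (std::size_t i = 0; i < kill_after; ++i) {
    switch (steps[i]) {
      case Step::SetStopping:
        s.stopping = true;
        s.page_discarded = true;
        break;
      case Step::LogFileDelete:
        s.delete_logged = true;
        break;
    }
  }
  return s;
}

// Recovery in this model: redo for the page may be skipped only if the log
// says the file was deleted. Otherwise the page must be readable, which it
// is not once it has been discarded.
bool recovery_succeeds(const State& s) {
  return s.delete_logged || !s.page_discarded;
}
```

In this model, run_until_kill({Step::SetStopping, Step::LogFileDelete}, 1) corresponds to the kill seen in the trace, and recovery_succeeds() returns false for the resulting state. With the two steps swapped and the same kill point, it returns true.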

      I got a copy of the data directory, but data.tar.gz is 12 MiB, while the maximum file size in this Jira is only 10 MiB.

          People

            marko Marko Mäkelä