Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31826

InnoDB may fail to recover after being killed in fil_delete_tablespace()

Details

    Description

      mleich produced an rr replay trace where crash recovery fails like this:

      2023-08-01  8:26:12 0 [Note] InnoDB: Multi-batch recovery needed at LSN 78671708
      2023-08-01  8:26:12 0 [Note] InnoDB: End of log at LSN=88328740
      2023-08-01  8:26:12 0 [Note] InnoDB: To recover: LSN 78752404/88328740; 875 pages
      2023-08-01  8:26:13 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
      

      The immediate reason for this is that recovery wants to read page 3 of tablespace 82 (test/B.ibd). That page had been left as all-zero by the server that had been killed right before the server restart. The buf_flush_page_cleaner() thread had invoked buf_flush_discard_page() on the page, with oldest_modification()=74626640 and FIL_PAGE_LSN=87174362. The reason for that is that the file was going to be deleted:

      (rr) backtrace
      #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
      #1  0x000055c51e4f7f46 in fil_delete_tablespace (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1774
      #2  0x000055c51e4f2643 in trx_t::commit (this=this@entry=0xb3b200fc180, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/dict/drop.cc:270
      #3  0x000055c51e3d0982 in ha_innobase::delete_table (this=<optimized out>, name=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/handler/ha_innodb.cc:13689
      #4  0x000055c51e16ec05 in hton_drop_table (hton=<optimized out>, path=<optimized out>) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:578
      #5  0x000055c51e1742e9 in ha_delete_table (thd=thd@entry=0x461168000c68, hton=hton@entry=0x55c520a22b88, path=path@entry=0x18885dda5e80 "./test/B", db=db@entry=0x461168010b20, alias=alias@entry=0x461168010b30, 
          generate_warning=generate_warning@entry=false) at /data/Server/bb-11.2-MDEV-14795E/sql/handler.cc:3192
      ...
      (rr) finish
      Run till exit from #0  fil_space_t::check_pending_operations (id=82) at /data/Server/bb-11.2-MDEV-14795E/storage/innobase/fil/fil0fil.cc:1707
       
      Thread 3 received signal SIGKILL, Killed.
      

      The DDL statement was CREATE OR REPLACE TABLE B AS SELECT * FROM `t8` /* E_R Thread32 QNO 21 CON_ID 29 */ but I think that this can affect any DDL operation that can delete InnoDB .ibd files: DROP TABLE, TRUNCATE TABLE, ALTER TABLE, OPTIMIZE TABLE.

      The problem appears to be that fil_delete_tablespace() is first setting the fil_space_t::STOPPING flag and only then writing the FILE_DELETE record. In the above execution, that record was never written and therefore never recovered.

      I got copy of the data directory, but the data.tar.gz is 12MiB, while the maximum file size limit of this Jira is only 10MiB.

      Attachments

        Activity

          mleich Matthias Leich added a comment - - edited

          origin/bb-10.6-MDEV-31826 94cecbfc68698cd410ee7f8d03218d13cb9b88b5 2023-09-04T18:02:16+05:30 tends serious to DB server hangs and long semaphore waits.
           
          sdp:/data1/results/1693843747/Freeze1/rqg.log  (no crash recovery involved)
          line 894   ERROR: Reporter 'Deadlock': Got no response from server to query '/*!100108 SET @@max_statement_time = 0 */' within 195s.
                            INFO: Reporter 'Deadlock': Killing <Db server> with pid 2100035 with SIGHUP in order to force debug output.                
          line 930  INFO: Reporter 'Deadlock': Killing <Db server> with pid 2100035 with SIGSEGV in order to capture core.
          line 1012 Quite many entries
                             [rr 2100035 177568]2023-09-04  9:11:38 38 [Warning] InnoDB: Trying to delete tablespace './test/#sql-alter-200b43-26.ibd' but there are 19 pending operations
                             over a long timespan.
          line 1034 debug output from the server error log
          line 1508 backtrace
           
          sdp:/data1/results/1693843747/Freeze1$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio
          ----------------------------------------------------
          Scenario:
          1. Several sessions run concurrent some DDL/DML mix.
          2. During 1. is ongoing intentional SIGKILL of the DB server
          3. Restart with success
          4. Only one connection is active and checks tables, runs OPTIMIZE etc.
              Than the server disappears during an "OPTIMIZE TABLE `test`.`t7`"
              Within the server error log are
              several
                   [Warning] InnoDB: Trying to delete tablespace './test/#sql-ib128.ibd' but there are 92 pending operations
              followed by
                   [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch.
           
          sdp:/data1/results/1693843747/semaphore_wait$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio
          

          mleich Matthias Leich added a comment - - edited origin/bb-10.6-MDEV-31826 94cecbfc68698cd410ee7f8d03218d13cb9b88b5 2023-09-04T18:02:16+05:30 tends serious to DB server hangs and long semaphore waits.   sdp:/data1/results/1693843747/Freeze1/rqg.log (no crash recovery involved) line 894 ERROR: Reporter 'Deadlock': Got no response from server to query '/*!100108 SET @@max_statement_time = 0 */' within 195s. INFO: Reporter 'Deadlock': Killing <Db server> with pid 2100035 with SIGHUP in order to force debug output. line 930 INFO: Reporter 'Deadlock': Killing <Db server> with pid 2100035 with SIGSEGV in order to capture core. line 1012 Quite many entries [rr 2100035 177568]2023-09-04 9:11:38 38 [Warning] InnoDB: Trying to delete tablespace './test/#sql-alter-200b43-26.ibd' but there are 19 pending operations over a long timespan. line 1034 debug output from the server error log line 1508 backtrace   sdp:/data1/results/1693843747/Freeze1$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio ---------------------------------------------------- Scenario: 1. Several sessions run concurrent some DDL/DML mix. 2. During 1. is ongoing intentional SIGKILL of the DB server 3. Restart with success 4. Only one connection is active and checks tables, runs OPTIMIZE etc. Than the server disappears during an "OPTIMIZE TABLE `test`.`t7`" Within the server error log are several [Warning] InnoDB: Trying to delete tablespace './test/#sql-ib128.ibd' but there are 92 pending operations followed by [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch.   sdp:/data1/results/1693843747/semaphore_wait$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio

          origin/bb-10.6-MDEV-31826 288d38ef137c55b409c9d0f403686d702cb878d5 2023-09-20T11:59:13+05:30
          performed well in RQG testing. All bad effects met are already known.
          It needs to be mentioned that I do not have some test which replays the bug described here with sufficient high likelihood.
          

          mleich Matthias Leich added a comment - origin/bb-10.6-MDEV-31826 288d38ef137c55b409c9d0f403686d702cb878d5 2023-09-20T11:59:13+05:30 performed well in RQG testing. All bad effects met are already known. It needs to be mentioned that I do not have some test which replays the bug described here with sufficient high likelihood.

          Please consider my suggestion for an alternative solution. I am afraid that this fix could introduce a race condition if two threads happen to attempt to delete the same file concurrently. We discussed that such a race might not be possible, but I would like to play it safe.

          marko Marko Mäkelä added a comment - Please consider my suggestion for an alternative solution . I am afraid that this fix could introduce a race condition if two threads happen to attempt to delete the same file concurrently. We discussed that such a race might not be possible, but I would like to play it safe.

          I have a fix that does not work for encrypted tables yet. The deletion of fil_space_t::crypt_data is tricky.

          marko Marko Mäkelä added a comment - I have a fix that does not work for encrypted tables yet. The deletion of fil_space_t::crypt_data is tricky.

          I think that the cleanest way to fix this is to split the fil_space_t::STOPPING flag into two: STOPPING_READS and STOPPING_WRITES. We must stop reading pages or dirtying them in the fil_crypt_thread() as soon as the STOPPING_READS flag is set, and we must keep writing them until the STOPPING_WRITES flag has been set. That flag would not be set before the FILE_DELETE record has been durably written.

          marko Marko Mäkelä added a comment - I think that the cleanest way to fix this is to split the fil_space_t::STOPPING flag into two: STOPPING_READS and STOPPING_WRITES . We must stop reading pages or dirtying them in the fil_crypt_thread() as soon as the STOPPING_READS flag is set, and we must keep writing them until the STOPPING_WRITES flag has been set. That flag would not be set before the FILE_DELETE record has been durably written.

          The code looks good to me.

          vlad.lesin Vladislav Lesin added a comment - The code looks good to me.

          For the record, the error scenario is as follows:

          1. fil_delete_tablespace() sets the fil_space_t::STOPPING flag for the tablespace of an .ibd file.
          2. The buf_flush_page_cleaner() thread discards some changed pages for the file and advances the log checkpoint a little.
          3. The server process is killed before fil_delete_tablespace() wrote a FILE_DELETE record.
          4. Recovery will try to apply log to pages of the tablespace, because there was no FILE_DELETE record. This will fail, because some pages that had been modified since the latest checkpoint had not been written by the page cleaner.

          The fix is to not discard any page writes before the FILE_DELETE record has been durably written to the log. The permission to discard writes is granted by setting the new STOPPING_WRITES flag.

          marko Marko Mäkelä added a comment - For the record, the error scenario is as follows: fil_delete_tablespace() sets the fil_space_t::STOPPING flag for the tablespace of an .ibd file. The buf_flush_page_cleaner() thread discards some changed pages for the file and advances the log checkpoint a little. The server process is killed before fil_delete_tablespace() wrote a FILE_DELETE record. Recovery will try to apply log to pages of the tablespace, because there was no FILE_DELETE record. This will fail, because some pages that had been modified since the latest checkpoint had not been written by the page cleaner. The fix is to not discard any page writes before the FILE_DELETE record has been durably written to the log. The permission to discard writes is granted by setting the new STOPPING_WRITES flag.

          The refactored fil_space_t::drop() fails to close the file handle in the error handling of ALTER TABLE…IMPORT TABLESPACE as well as in the MDEV-19229 upgrade logic. The latter was caught on Microsoft Windows on the merge to 10.11. A fixup was pushed.

          marko Marko Mäkelä added a comment - The refactored fil_space_t::drop() fails to close the file handle in the error handling of ALTER TABLE…IMPORT TABLESPACE as well as in the MDEV-19229 upgrade logic. The latter was caught on Microsoft Windows on the merge to 10.11. A fixup was pushed.

          People

            marko Marko Mäkelä
            marko Marko Mäkelä
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.