Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33098

The test innodb.doublewrite_debug occasionally fails to start up InnoDB

Details

    Description

      The test innodb.doublewrite_debug that was recently added in MDEV-32242 is occasionally failing on 10.6 and 10.11 like this:

      10.6 98287bd2d6877969514469ebfbf27f9951b99657

      innodb.doublewrite_debug '32k,innodb,strict_crc32' w28 [ fail ]
              Test ended at 2023-12-19 16:08:33
       
      CURRENT_TEST: innodb.doublewrite_debug
      mysqltest: At line 131: query 'select f1, f2 from t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
       
      The result from queries just before the failure was:
      < snip >
      # system tablespace is corrupted and 2nd page as corrupted.
      set global innodb_log_checkpoint_now = 1;
      begin;
      insert into t1 values (6, repeat('%', 400));
      # Make the first page dirty for system tablespace
      set global innodb_saved_page_number_debug = 0;
      set global innodb_fil_make_page_dirty_debug = 0;
      # Make the second page dirty for system tablespace
      set global innodb_saved_page_number_debug = 1;
      set global innodb_fil_make_page_dirty_debug = 0;
      set global innodb_buf_flush_list_now = 1;
      # Corrupt the 1st page (page_no=0) and 2nd page of the system tablespace.
      # restart
      FOUND 2 /InnoDB: Restoring page \[page id: space=0, page number=0\] of datafile/ in mysqld.1.err
      FOUND 1 /InnoDB: Recovered page \[page id: space=0, page number=1\]/ in mysqld.1.err
      check table t1;
      Table	Op	Msg_type	Msg_text
      test.t1	check	Error	Unknown storage engine 'InnoDB'
      test.t1	check	error	Corrupt
      select f1, f2 from t1;
      

      Attachments

        Issue Links

          Activity

            I analyzed one failure on 10.11 in more detail. It seems to me that there unexpectedly was a log checkpoint right at the time when the server was killed. This seems to have revealed a code bug, which I think is already being addressed in MDEV-32968:

            2023-12-20 18:21:02 0 [Note] InnoDB: Completed initialization of buffer pool
            2023-12-20 18:21:02 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
            2023-12-20 18:21:02 0 [ERROR] InnoDB: Data file './ibdata1' uses page size 16384, but the innodb_page_size start-up parameter is 4096
            2023-12-20 18:21:02 0 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata1' from the doublewrite buffer. Writing 4096 bytes into file './ibdata1'
            2023-12-20 18:21:02 0 [Note] InnoDB: End of log at LSN=81124
            2023-12-20 18:21:02 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
            2023-12-20 18:21:02 0 [Note] InnoDB: Trx id counter is 31
            2023-12-20 18:21:02 0 [Note] InnoDB: 128 rollback segments are active.
            2023-12-20 18:21:02 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
            2023-12-20 18:21:02 0 [Note] InnoDB: Rolled back recovered transaction 29
            2023-12-20 18:21:02 0 [Note] InnoDB: Rollback of non-prepared transactions completed
            2023-12-20 18:21:02 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
            2023-12-20 18:21:02 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
            2023-12-20 18:21:02 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
            2023-12-20 18:21:02 0 [Note] InnoDB: log sequence number 81124 (memory-mapped); transaction id 33
            

            The code bug here is that we are attempting to restore a page from the doublewrite buffer, even though there had been no server crash (the ib_logfile0 is logically empty). I would like to see how this would behave with the MDEV-32968 fix applied. In 10.11 thanks to MDEV-14425 we can make use of memory-mapped log file in /dev/shm. That could make the test fail much more easily than on 10.6 where we write the log via regular system calls.

            In the ib_logfile0 right before offset 81124 (0x13ce4) we have the following:

            013cd0 32 ac 58 e8 fa 00 00 00 00 00 00 00 01 3c d4 01
            013ce0 7d 7a 5c 2e b0 01 05 00 2e 2f 74 65 73 74 2f 74
            

            At 0x13cd4 we have the 16-byte FILE_CHECKPOINT record comprising the LSN 0x13cd4, followed by the end marker 0x01 and a CRC-32C checksum. From the recovery point of view, the server that had been killed had apparently been shut down cleanly, and no doublewrite recovery should have been invoked. We should have failed to start up InnoDB due to a corruption.

            The test bug here is that we do not detect that an unexpected log checkpoint occurred, and carry on to corrupt the page. We should either fail or skip the test in that case. thiru has some ideas on how to avoid the log checkpoint harder. That may work, but I would also double check in the test that no checkpoint occurred. Something like this should make the test fail in a more obvious way (instead of being silently skipped, which was supposed to be fixed in MDEV-32242):

            --let CLEANUP_IF_CHECKPOINT=ERROR_IN_TEST;
            --source ../include/no_checkpoint_end.inc
            

            marko Marko Mäkelä added a comment - I analyzed one failure on 10.11 in more detail. It seems to me that there unexpectedly was a log checkpoint right at the time when the server was killed. This seems to have revealed a code bug, which I think is already being addressed in MDEV-32968 : 2023-12-20 18:21:02 0 [Note] InnoDB: Completed initialization of buffer pool 2023-12-20 18:21:02 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed 2023-12-20 18:21:02 0 [ERROR] InnoDB: Data file './ibdata1' uses page size 16384, but the innodb_page_size start-up parameter is 4096 2023-12-20 18:21:02 0 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata1' from the doublewrite buffer. Writing 4096 bytes into file './ibdata1' 2023-12-20 18:21:02 0 [Note] InnoDB: End of log at LSN=81124 2023-12-20 18:21:02 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo 2023-12-20 18:21:02 0 [Note] InnoDB: Trx id counter is 31 2023-12-20 18:21:02 0 [Note] InnoDB: 128 rollback segments are active. 2023-12-20 18:21:02 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2023-12-20 18:21:02 0 [Note] InnoDB: Rolled back recovered transaction 29 2023-12-20 18:21:02 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2023-12-20 18:21:02 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1" 2023-12-20 18:21:02 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ... 2023-12-20 18:21:02 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB. 2023-12-20 18:21:02 0 [Note] InnoDB: log sequence number 81124 (memory-mapped); transaction id 33 The code bug here is that we are attempting to restore a page from the doublewrite buffer, even though there had been no server crash (the ib_logfile0 is logically empty). I would like to see how this would behave with the MDEV-32968 fix applied. In 10.11 thanks to MDEV-14425 we can make use of memory-mapped log file in /dev/shm . That could make the test fail much more easily than on 10.6 where we write the log via regular system calls. In the ib_logfile0 right before offset 81124 (0x13ce4) we have the following: 013cd0 32 ac 58 e8 fa 00 00 00 00 00 00 00 01 3c d4 01 013ce0 7d 7a 5c 2e b0 01 05 00 2e 2f 74 65 73 74 2f 74 At 0x13cd4 we have the 16-byte FILE_CHECKPOINT record comprising the LSN 0x13cd4, followed by the end marker 0x01 and a CRC-32C checksum. From the recovery point of view, the server that had been killed had apparently been shut down cleanly, and no doublewrite recovery should have been invoked. We should have failed to start up InnoDB due to a corruption. The test bug here is that we do not detect that an unexpected log checkpoint occurred, and carry on to corrupt the page. We should either fail or skip the test in that case. thiru has some ideas on how to avoid the log checkpoint harder. That may work, but I would also double check in the test that no checkpoint occurred. Something like this should make the test fail in a more obvious way (instead of being silently skipped, which was supposed to be fixed in MDEV-32242 ): --let CLEANUP_IF_CHECKPOINT=ERROR_IN_TEST; --source ../include/no_checkpoint_end.inc

            OK to push after addressing my review comments.

            On the merge from 10.5 to 10.6, we have to reenable the test. On the merge to 11.0, we should revert previously made changes to doublewrite_debug.combinations.

            marko Marko Mäkelä added a comment - OK to push after addressing my review comments. On the merge from 10.5 to 10.6, we have to reenable the test. On the merge to 11.0, we should revert previously made changes to doublewrite_debug.combinations .

            Unfortunately, on merge to 10.6 (and re-enabling the test) there was a failure. There was no log checkpoint, but one of the two intentionally corrupted pages was not restored from the doublewrite buffer:

            st-10.6-merge fc4492a8e160c2d047e156edfc8eebdbcaeb49e7

            2024-01-03 10:59:06 0 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata1' from the doublewrite buffer. Writing 32768 bytes into file './ibdata1'
            2024-01-03 10:59:06 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=59641,59641
            2024-01-03 10:59:06 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
            2024-01-03 10:59:06 0 [Note] InnoDB: Trx id counter is 31
            2024-01-03 10:59:06 0 [Note] InnoDB: To recover: 2 pages
            2024-01-03 10:59:06 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
            2024-01-03 10:59:06 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=1]
            

            The above is from mysql-test/var/16/log/mysqld.1.err in the attached var.tar.gz.

            marko Marko Mäkelä added a comment - Unfortunately, on merge to 10.6 (and re-enabling the test) there was a failure. There was no log checkpoint, but one of the two intentionally corrupted pages was not restored from the doublewrite buffer: st-10.6-merge fc4492a8e160c2d047e156edfc8eebdbcaeb49e7 2024-01-03 10:59:06 0 [Note] InnoDB: Restoring page [page id: space=0, page number=0] of datafile './ibdata1' from the doublewrite buffer. Writing 32768 bytes into file './ibdata1' 2024-01-03 10:59:06 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=59641,59641 2024-01-03 10:59:06 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo 2024-01-03 10:59:06 0 [Note] InnoDB: Trx id counter is 31 2024-01-03 10:59:06 0 [Note] InnoDB: To recover: 2 pages 2024-01-03 10:59:06 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages. 2024-01-03 10:59:06 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=1] The above is from mysql-test/var/16/log/mysqld.1.err in the attached var.tar.gz .

            In var.tar.gz there are 3 copies of page 0 or 1 (of any tablespace) in the doublewrite buffer:

            od -Ax -t x1 -N 0x400000 -j 0x20000 mysql-test/var/log/innodb.doublewrite_debug-32k,innodb,strict_crc32/mysqld.1/data/ibdata1 count=128|
            grep '[80]000 .. .. .. .. 00 00 00 0[10]'
            

            210000 04 4d e3 9b 00 00 00 00 ff ff ff ff ff ff ff ff
            218000 13 99 b0 29 00 00 00 01 ff ff ff ff ff ff ff ff
            248000 d2 c9 ff a0 00 00 00 00 ff ff ff ff ff ff ff ff
            

            All of these are for the system tablespace, as we can see by the following:

            od -Ax -t x1 -N 0x400000 -j 0x20000 mysql-test/var/log/innodb.doublewrite_debug-32k,innodb,strict_crc32/mysqld.1/data/ibdata1 count=128|
            grep '2480[012]0 \|21[80]0[012]0 '
            

            210000 04 4d e3 9b 00 00 00 00 ff ff ff ff ff ff ff ff
            210010 00 00 00 00 00 00 eb 2d 00 08 00 00 00 00 00 00
            210020 e5 b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            218000 13 99 b0 29 00 00 00 01 ff ff ff ff ff ff ff ff
            218010 00 00 00 00 00 00 e8 8d 00 05 00 00 00 00 00 00
            218020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            248000 d2 c9 ff a0 00 00 00 00 ff ff ff ff ff ff ff ff
            248010 00 00 00 00 00 00 e5 53 00 08 00 00 00 00 00 00
            248020 d2 92 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            

            We can ignore the copy of page 0 at 0x248000 because its FIL_SPACE_LSN 0xe553 is smaller than the one starting at 0x210000.

            It is possible that this failure would be fixed by MDEV-32968.

            marko Marko Mäkelä added a comment - In var.tar.gz there are 3 copies of page 0 or 1 (of any tablespace) in the doublewrite buffer: od -Ax -t x1 -N 0x400000 -j 0x20000 mysql-test/var/log/innodb.doublewrite_debug-32k,innodb,strict_crc32/mysqld.1/data/ibdata1 count=128| grep '[80]000 .. .. .. .. 00 00 00 0[10]' 210000 04 4d e3 9b 00 00 00 00 ff ff ff ff ff ff ff ff 218000 13 99 b0 29 00 00 00 01 ff ff ff ff ff ff ff ff 248000 d2 c9 ff a0 00 00 00 00 ff ff ff ff ff ff ff ff All of these are for the system tablespace, as we can see by the following: od -Ax -t x1 -N 0x400000 -j 0x20000 mysql-test/var/log/innodb.doublewrite_debug-32k,innodb,strict_crc32/mysqld.1/data/ibdata1 count=128| grep '2480[012]0 \|21[80]0[012]0 ' 210000 04 4d e3 9b 00 00 00 00 ff ff ff ff ff ff ff ff 210010 00 00 00 00 00 00 eb 2d 00 08 00 00 00 00 00 00 210020 e5 b4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 218000 13 99 b0 29 00 00 00 01 ff ff ff ff ff ff ff ff 218010 00 00 00 00 00 00 e8 8d 00 05 00 00 00 00 00 00 218020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 248000 d2 c9 ff a0 00 00 00 00 ff ff ff ff ff ff ff ff 248010 00 00 00 00 00 00 e5 53 00 08 00 00 00 00 00 00 248020 d2 92 00 00 00 00 00 00 00 00 00 00 00 00 00 00 We can ignore the copy of page 0 at 0x248000 because its FIL_SPACE_LSN 0xe553 is smaller than the one starting at 0x210000. It is possible that this failure would be fixed by MDEV-32968 .

            It turns out that the page 1 had actually been written out slightly before the latest log checkpoint, and there was nothing to be recovered from the doublewrite buffer.

            The main difference between 10.5 and 10.6 with respect to this test is that the global change that had been made to mtr in MDEV-4750 was reverted by me in 10.6. On the revised merge to 10.6, I added an explicit STATS_PERSISTENT=0 to the CREATE TABLE statement to compensate for this. With that fix, the test passed on multiple builders where it had previously failed.

            I do not know the exact mechanism, but if we look at the relevant part of the test, we can see that the automatic recalculation of InnoDB statistics could interfere with it, essentially causing a race condition with no_checkpoint_start.inc:

            # Slow shutdown and restart to make sure ibuf merge is finished
            SET GLOBAL innodb_fast_shutdown = 0;
            let $shutdown_timeout=;
            let $restart_parameters="--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0";
            --source include/restart_mysqld.inc
            --source ../include/no_checkpoint_start.inc
            

            The test expects that there are absolutely no InnoDB writes between the last two source directives. I assume that the update of statistics as well as a log checkpoint could somehow run concurrently with the no_checkpoint_start.inc and ruin the logic that is supposed to detect unexpected log checkpoints.

            marko Marko Mäkelä added a comment - It turns out that the page 1 had actually been written out slightly before the latest log checkpoint, and there was nothing to be recovered from the doublewrite buffer. The main difference between 10.5 and 10.6 with respect to this test is that the global change that had been made to mtr in MDEV-4750 was reverted by me in 10.6. On the revised merge to 10.6 , I added an explicit STATS_PERSISTENT=0 to the CREATE TABLE statement to compensate for this. With that fix, the test passed on multiple builders where it had previously failed. I do not know the exact mechanism, but if we look at the relevant part of the test, we can see that the automatic recalculation of InnoDB statistics could interfere with it, essentially causing a race condition with no_checkpoint_start.inc : # Slow shutdown and restart to make sure ibuf merge is finished SET GLOBAL innodb_fast_shutdown = 0; let $shutdown_timeout=; let $restart_parameters= "--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0" ; --source include/restart_mysqld.inc --source ../include/no_checkpoint_start.inc The test expects that there are absolutely no InnoDB writes between the last two source directives. I assume that the update of statistics as well as a log checkpoint could somehow run concurrently with the no_checkpoint_start.inc and ruin the logic that is supposed to detect unexpected log checkpoints.

            I noticed that the test fails rather often on a combined ASAN+UBSAN build. I was able to reproduce the failure on 10.6, first without rr (running --parallel=120, or 3 times what nproc reports) and then with rr (running only 9 tests concurrently, to avoid running out of RAM disk).

            The test turned out to fail due to two bugs in debug instrumentation, which I have now fixed in 10.5 and merged to 10.6. We failed to wait for the write of page 1 to be completed, and to actually prevent a log checkpoint from being executed by the page cleaner thread.

            marko Marko Mäkelä added a comment - I noticed that the test fails rather often on a combined ASAN+UBSAN build. I was able to reproduce the failure on 10.6, first without rr (running --parallel=120 , or 3 times what nproc reports) and then with rr (running only 9 tests concurrently, to avoid running out of RAM disk). The test turned out to fail due to two bugs in debug instrumentation, which I have now fixed in 10.5 and merged to 10.6 . We failed to wait for the write of page 1 to be completed, and to actually prevent a log checkpoint from being executed by the page cleaner thread.

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.