Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26631

Race condition in the test innodb.doublewrite

Details

    Description

      MDEV-26626 exposed the following failure on my system, when running a 32-bit build:

      ./mtr --rr innodb.doublewrite,4k,strict_full_crc32
      

      10.6 106b16a5af7a8e29889e0ef8930bd2901a9ad00e

      innodb.doublewrite '4k,innodb,strict_full_crc32' [ fail ]
              Test ended at 2021-09-17 08:28:10
       
      CURRENT_TEST: innodb.doublewrite
      mysqltest: At line 192: query 'select f1, f2 from t1' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
      

      If I revert the MDEV-26626 fix, the failure goes away. The test does not fail in the 10.5 branch for me. The cause of the failure appears to be that a write of a page that the test is intentionally corrupting was missed before recovery:

      10.6 106b16a5af7a8e29889e0ef8930bd2901a9ad00e

      2021-09-17  8:28:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=58220,58446
      2021-09-17  8:28:09 0 [ERROR] InnoDB: Cannot apply log to [page id: space=5, page number=0] of corrupted file './test/t1.ibd'
      

      At first, I was suspecting that the debug variable innodb_buf_flush_list_now is not working reliably, but the function buf_flush_sync() does appear to write out all pages from the buffer pool before the server was killed and restarted.

      The failure has not occured on any CI system yet, and it does not fail reliably for me when not using rr. Initially, it did fail without rr for the strict_full_crc32 combinations, for the 32-bit build.

      Attachments

        Issue Links

          Activity

            During a run of

            ./mtr --big-test --parallel=60 --force
            

            I reliably see all strict_full_crc32 combinations failing on my system (with nproc 40), using a 32-bit build, all in that same test line 192:

            innodb.doublewrite '32k,innodb,strict_full_crc32' w18 [ fail ]
            …
            innodb.doublewrite '64k,innodb,strict_full_crc32' w51 [ fail ]
            …
            innodb.doublewrite '16k,innodb,strict_full_crc32' w21 [ fail ]
            …
            innodb.doublewrite '4k,innodb,strict_full_crc32' w3 [ fail ]
            …
            innodb.doublewrite '8k,innodb,strict_full_crc32' w27 [ fail ]
            

            If I only run ./mtr --parallel=auto innodb.doublewrite (all 2×5 combinations in parallel), they will pass. It definitely is some kind of a race condition.

            marko Marko Mäkelä added a comment - During a run of ./mtr --big-test --parallel=60 --force I reliably see all strict_full_crc32 combinations failing on my system (with nproc 40), using a 32-bit build, all in that same test line 192: innodb.doublewrite '32k,innodb,strict_full_crc32' w18 [ fail ] … innodb.doublewrite '64k,innodb,strict_full_crc32' w51 [ fail ] … innodb.doublewrite '16k,innodb,strict_full_crc32' w21 [ fail ] … innodb.doublewrite '4k,innodb,strict_full_crc32' w3 [ fail ] … innodb.doublewrite '8k,innodb,strict_full_crc32' w27 [ fail ] If I only run ./mtr --parallel=auto innodb.doublewrite (all 2×5 combinations in parallel), they will pass. It definitely is some kind of a race condition.

            The recovery fails here:

            868	  ib::error() << "Cannot apply log to " << first
            (rr) p deferred_spaces
            $1 = {defers = std::map with 1 element = {[5] = {lsn = 58446, file_name = "./test/t1.ibd", deleted = false}}}
            

            In my initial debugging, I failed to notice that the recovered log is not for page number 0 (the page that is being corrupted by the test) but for page number 5. This code and data structure is part of the MDEV-24626 recovery. If we notice a corrupted first page during recovery and the recovered log records will not initialize page 0 of the tablespace, we will fail.

            That log record was written by the following before the server was killed:

            Thread 7 hit Hardware watchpoint 1: -location log_sys.lsn._M_i
             
            Old value = 58431
            New value = 58446
            mtr_t::finish_write (this=<optimized out>, this@entry=0xeebf6744, len=<optimized out>, len@entry=15) at /mariadb/10.6/storage/innobase/include/log0log.ic:290
            290		return lsn;
            (rr) bt
            #0  mtr_t::finish_write (this=<optimized out>, this@entry=0xeebf6744, len=<optimized out>, len@entry=15) at /mariadb/10.6/storage/innobase/include/log0log.ic:290
            #1  0x57425c5e in mtr_t::commit (this=0xeebf6744) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:410
            #2  0x574a5202 in row_purge_reset_trx_id (node=<optimized out>, mtr=0xeebf6744) at /mariadb/10.6/storage/innobase/row/row0purge.cc:816
            

            Before the first kill, the test did try to ensure that purge will not run until that point:

            connect (stop_purge,localhost,root,,);
            START TRANSACTION WITH CONSISTENT SNAPSHOT;
            connection default;
            

            Adding a wait for purge before the second kill did not fix this failure after another kill:

            diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test
            index cedd2c9942b..f03d16bd680 100644
            --- a/mysql-test/suite/innodb/t/doublewrite.test
            +++ b/mysql-test/suite/innodb/t/doublewrite.test
            @@ -144,6 +144,9 @@ EOF
             
             --source include/start_mysqld.inc
             
            +SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
            +--source include/wait_all_purged.inc
            +
             check table t1;
             select f1, f2 from t1;
             
            

            If I omit that change and run the test with -mysqld=-innodb-force-recovery=2 to disable purge, it will get past that restart and fail in a different way later:

            2021-09-17  9:30:20 0 [Note] InnoDB: Starting final batch to recover 1 pages from redo log.
            mariadbd: /mariadb/10.6/storage/innobase/log/log0recv.cc:274: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
            

            That would be an all-zero frame of the page 1 of table t1 that was corrupted in a subsequent test step. The write of that log record originated from debug instrumentation, as expected:

            Thread 9 hit Hardware watchpoint 1: -location log_sys.lsn
             
            Old value = {<std::__atomic_base<unsigned long long>> = {_M_i = 57715}, <No data fields>}
            New value = {<std::__atomic_base<unsigned long long>> = {_M_i = 57721}, <No data fields>}
            mtr_t::finish_write (this=<optimized out>, this@entry=0xecfa90e0, len=<optimized out>, len@entry=6) at /mariadb/10.6/storage/innobase/include/log0log.ic:290
            290		return lsn;
            (rr) bt
            #0  mtr_t::finish_write (this=<optimized out>, this@entry=0xecfa90e0, len=<optimized out>, len@entry=6) at /mariadb/10.6/storage/innobase/include/log0log.ic:290
            #1  0x573e5c5e in mtr_t::commit (this=0xecfa90e0) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:410
            #2  0x57359d73 in innodb_make_page_dirty (save=0xf2116fd8) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:17447
            

            Before that page was written, there was a doublewrite batch that covered this page. A copy of the page was written to offset 0x104000 in the system tablespace as well as the final location. There were no other data file writes before the server was killed. We did recover a copy of the page in buf_dblwr_t::init_or_load_pages(), but recovery will not find the tablespace for t1 (tablespace ID 5) at this point:

            10.6 a8b16e921e6bb73b1fff5486314b2d89b9d286aa

            0x575459d4 in buf_dblwr_t::recover (this=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:364
            364	    fil_space_t *space= fil_space_t::get(space_id);
            Value returned is $11 = (fil_space_t *) 0x0
            (rr) bt
            #0  0x575459d4 in buf_dblwr_t::recover (this=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:364
            #1  0x57413ff2 in recv_recovery_from_checkpoint_start (flush_lsn=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4061
            #2  0x56b5df4a in srv_start (create_new_db=<optimized out>) at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1441
            #3  0x5739bfad in innodb_init (p=0x5a8fdb88) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4196
            

            There was an earlier call to fil_ibd_load() that returned FIL_LOAD_DEFER, but that could be due to an ‘old sin’ that caused us to fail at line 192 already (if I didn’t disable purge): we would apparently fail to restore a copy of page 0 from the doublewrite buffer.

            My following work-around attempt did not avoid an assertion failure for applying log to page 1 (with page number mismatch on the page):

            diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test
            index cedd2c9942b..fa08d0e09fc 100644
            --- a/mysql-test/suite/innodb/t/doublewrite.test
            +++ b/mysql-test/suite/innodb/t/doublewrite.test
            @@ -209,6 +209,8 @@ insert into t1 values (6, repeat('%', 400));
             --source ../include/no_checkpoint_start.inc
             
             --echo # Make the 2nd page dirty for table t1
            +set global innodb_saved_page_number_debug = 0;
            +set global innodb_fil_make_page_dirty_debug = @space_id;
             set global innodb_saved_page_number_debug = 1;
             set global innodb_fil_make_page_dirty_debug = @space_id;
             
            

            These failures are possibly related to MDEV-24626, because I could not get this test to fail on 10.5. It looks like we are not restoring the doublewrite buffer correctly in all cases.

            marko Marko Mäkelä added a comment - The recovery fails here: 868 ib::error() << "Cannot apply log to " << first (rr) p deferred_spaces $1 = {defers = std::map with 1 element = {[5] = {lsn = 58446, file_name = "./test/t1.ibd", deleted = false}}} In my initial debugging, I failed to notice that the recovered log is not for page number 0 (the page that is being corrupted by the test) but for page number 5. This code and data structure is part of the MDEV-24626 recovery. If we notice a corrupted first page during recovery and the recovered log records will not initialize page 0 of the tablespace, we will fail. That log record was written by the following before the server was killed: Thread 7 hit Hardware watchpoint 1: -location log_sys.lsn._M_i   Old value = 58431 New value = 58446 mtr_t::finish_write (this=<optimized out>, this@entry=0xeebf6744, len=<optimized out>, len@entry=15) at /mariadb/10.6/storage/innobase/include/log0log.ic:290 290 return lsn; (rr) bt #0 mtr_t::finish_write (this=<optimized out>, this@entry=0xeebf6744, len=<optimized out>, len@entry=15) at /mariadb/10.6/storage/innobase/include/log0log.ic:290 #1 0x57425c5e in mtr_t::commit (this=0xeebf6744) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:410 #2 0x574a5202 in row_purge_reset_trx_id (node=<optimized out>, mtr=0xeebf6744) at /mariadb/10.6/storage/innobase/row/row0purge.cc:816 Before the first kill, the test did try to ensure that purge will not run until that point: connect (stop_purge,localhost,root,,); START TRANSACTION WITH CONSISTENT SNAPSHOT; connection default ; Adding a wait for purge before the second kill did not fix this failure after another kill: diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test index cedd2c9942b..f03d16bd680 100644 --- a/mysql-test/suite/innodb/t/doublewrite.test +++ b/mysql-test/suite/innodb/t/doublewrite.test @@ -144,6 +144,9 @@ EOF --source include/start_mysqld.inc +SET GLOBAL innodb_purge_rseg_truncate_frequency=1; +--source include/wait_all_purged.inc + check table t1; select f1, f2 from t1; If I omit that change and run the test with - mysqld= -innodb-force-recovery=2 to disable purge, it will get past that restart and fail in a different way later: 2021-09-17 9:30:20 0 [Note] InnoDB: Starting final batch to recover 1 pages from redo log. mariadbd: /mariadb/10.6/storage/innobase/log/log0recv.cc:274: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed. That would be an all-zero frame of the page 1 of table t1 that was corrupted in a subsequent test step. The write of that log record originated from debug instrumentation, as expected: Thread 9 hit Hardware watchpoint 1: -location log_sys.lsn   Old value = {<std::__atomic_base<unsigned long long>> = {_M_i = 57715}, <No data fields>} New value = {<std::__atomic_base<unsigned long long>> = {_M_i = 57721}, <No data fields>} mtr_t::finish_write (this=<optimized out>, this@entry=0xecfa90e0, len=<optimized out>, len@entry=6) at /mariadb/10.6/storage/innobase/include/log0log.ic:290 290 return lsn; (rr) bt #0 mtr_t::finish_write (this=<optimized out>, this@entry=0xecfa90e0, len=<optimized out>, len@entry=6) at /mariadb/10.6/storage/innobase/include/log0log.ic:290 #1 0x573e5c5e in mtr_t::commit (this=0xecfa90e0) at /mariadb/10.6/storage/innobase/mtr/mtr0mtr.cc:410 #2 0x57359d73 in innodb_make_page_dirty (save=0xf2116fd8) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:17447 Before that page was written, there was a doublewrite batch that covered this page. A copy of the page was written to offset 0x104000 in the system tablespace as well as the final location. There were no other data file writes before the server was killed. We did recover a copy of the page in buf_dblwr_t::init_or_load_pages() , but recovery will not find the tablespace for t1 (tablespace ID 5) at this point: 10.6 a8b16e921e6bb73b1fff5486314b2d89b9d286aa 0x575459d4 in buf_dblwr_t::recover (this=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:364 364 fil_space_t *space= fil_space_t::get(space_id); Value returned is $11 = (fil_space_t *) 0x0 (rr) bt #0 0x575459d4 in buf_dblwr_t::recover (this=<optimized out>) at /mariadb/10.6/storage/innobase/buf/buf0dblwr.cc:364 #1 0x57413ff2 in recv_recovery_from_checkpoint_start (flush_lsn=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4061 #2 0x56b5df4a in srv_start (create_new_db=<optimized out>) at /mariadb/10.6/storage/innobase/srv/srv0start.cc:1441 #3 0x5739bfad in innodb_init (p=0x5a8fdb88) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4196 There was an earlier call to fil_ibd_load() that returned FIL_LOAD_DEFER , but that could be due to an ‘old sin’ that caused us to fail at line 192 already (if I didn’t disable purge): we would apparently fail to restore a copy of page 0 from the doublewrite buffer. My following work-around attempt did not avoid an assertion failure for applying log to page 1 (with page number mismatch on the page): diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test index cedd2c9942b..fa08d0e09fc 100644 --- a/mysql-test/suite/innodb/t/doublewrite.test +++ b/mysql-test/suite/innodb/t/doublewrite.test @@ -209,6 +209,8 @@ insert into t1 values (6, repeat('%', 400)); --source ../include/no_checkpoint_start.inc --echo # Make the 2nd page dirty for table t1 +set global innodb_saved_page_number_debug = 0; +set global innodb_fil_make_page_dirty_debug = @space_id; set global innodb_saved_page_number_debug = 1; set global innodb_fil_make_page_dirty_debug = @space_id; These failures are possibly related to MDEV-24626 , because I could not get this test to fail on 10.5. It looks like we are not restoring the doublewrite buffer correctly in all cases.

            Removing the following fault injection makes the test pass, both with ./mtr --rr and when running ./mtr --big-test --parallel=60 --force:

            diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test
            index cedd2c9942b..65499f51c3c 100644
            --- a/mysql-test/suite/innodb/t/doublewrite.test
            +++ b/mysql-test/suite/innodb/t/doublewrite.test
            @@ -182,7 +182,7 @@ open(FILE, "+<", $fname) or die;
             sysread(FILE, $page, $page_size)==$page_size||die "Unable to read $name\n";
             substr($page, 28, 4) = pack("N", 1000);
             sysseek(FILE, 0, 0)||die "Unable to seek $fname\n";
            -die unless syswrite(FILE, $page, $page_size) == $page_size;
            +#die unless syswrite(FILE, $page, $page_size) == $page_size;
             close FILE;
             EOF
             
            

            The root cause of all failures could be that MDEV-24626 has broken the recovery of page 0 from the doublewrite buffer.

            marko Marko Mäkelä added a comment - Removing the following fault injection makes the test pass, both with ./mtr --rr and when running ./mtr --big-test --parallel=60 --force : diff --git a/mysql-test/suite/innodb/t/doublewrite.test b/mysql-test/suite/innodb/t/doublewrite.test index cedd2c9942b..65499f51c3c 100644 --- a/mysql-test/suite/innodb/t/doublewrite.test +++ b/mysql-test/suite/innodb/t/doublewrite.test @@ -182,7 +182,7 @@ open(FILE, "+<", $fname) or die; sysread(FILE, $page, $page_size)==$page_size||die "Unable to read $name\n"; substr($page, 28, 4) = pack("N", 1000); sysseek(FILE, 0, 0)||die "Unable to seek $fname\n"; -die unless syswrite(FILE, $page, $page_size) == $page_size; +#die unless syswrite(FILE, $page, $page_size) == $page_size; close FILE; EOF The root cause of all failures could be that MDEV-24626 has broken the recovery of page 0 from the doublewrite buffer.

            Patch is in bb-10.6-MDEV-26631

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.6- MDEV-26631

            I think that this is a step to the right direction. But, a bit more work is needed in order to ensure that we are not giving up recovery in the case that the first page had been partially written without using the doublewrite buffer. (Doublewrite buffer would be skipped when the page was initialized by redo log records.) I suspect that the proposed fix could break the deferred recovery in this case.

            marko Marko Mäkelä added a comment - I think that this is a step to the right direction. But, a bit more work is needed in order to ensure that we are not giving up recovery in the case that the first page had been partially written without using the doublewrite buffer. (Doublewrite buffer would be skipped when the page was initialized by redo log records.) I suspect that the proposed fix could break the deferred recovery in this case.

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.