Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26933

InnoDB fails to detect page number mismatch

Details

    Description

      InnoDB is not checking that the page number in a buffer pool page is correct. The following would implement such a check:

      diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
      --- a/storage/innobase/mtr/mtr0mtr.cc
      +++ b/storage/innobase/mtr/mtr0mtr.cc
      @@ -1196,6 +1196,8 @@ void mtr_t::page_lock(buf_block_t *block, ulint rw_latch)
       #endif /* BTR_CUR_HASH_ADAPT */
       
       done:
      +  ut_ad(page_id_t(page_get_space_id(block->page.frame),
      +                  page_get_page_no(block->page.frame)) == block->page.id());
         memo_push(block, fix_type);
       }
       
      

      This would be tripped in a number of tests that inject corruption to data files. Those tests will have to be adjusted.

      Furthermore, operations like DROP TABLE need to be fixed so that they will not attempt to load the table definition at all. This would avoids a crash in a test that corrupts the clustered index root page number of a table. The following fix for that test would be needed in any case:

      diff --git a/mysql-test/suite/innodb/t/page_id_innochecksum.test b/mysql-test/suite/innodb/t/page_id_innochecksum.test
      index 2726b3254da..f5166018dd1 100644
      --- a/mysql-test/suite/innodb/t/page_id_innochecksum.test
      +++ b/mysql-test/suite/innodb/t/page_id_innochecksum.test
      @@ -61,5 +61,9 @@ let SEARCH_PATTERN=page id mismatch;
       --source include/search_pattern_in_file.inc
       
       --remove_file $resultlog
      +# prevent purge from crashing on page ID mismatch
      +let $restart_parameters=--innodb-force-recovery=2;
       --source include/start_mysqld.inc
       drop table t1;
      +let $restart_parameters=;
      +--source include/restart_mysqld.inc
      

      Attachments

        Issue Links

          Activity

            I slightly revised this based on a test with a data directory that was archived for MDEV-25683. That test case would still cause a hang of the DDL log recovery if the server is started up with innodb_force_recovery=4. My aim was to get rid of these useless messages, which would be output because rollback of a recovered DDL transaction was disabled by innodb_force_recovery=4:

            2021-05-15  1:39:58 0 [ERROR] InnoDB: Trying to load index `PRIMARY` for table `test`.`#sql-alter-27122f-10`, but the index tree has been freed!
            2021-05-15  1:39:58 0 [Note] InnoDB: Index is corrupt but forcing load into data dictionary
            

            While cleaning up the code, I realized that the InnoDB DDL rewrite in MariaDB Server 10.6 made obsolete the read-only startup parameter innodb_force_load_corrupted. I think that we should deprecate and ignore that parameter, and remove it in later versions.

            marko Marko Mäkelä added a comment - I slightly revised this based on a test with a data directory that was archived for MDEV-25683 . That test case would still cause a hang of the DDL log recovery if the server is started up with innodb_force_recovery=4 . My aim was to get rid of these useless messages, which would be output because rollback of a recovered DDL transaction was disabled by innodb_force_recovery=4 : 2021-05-15 1:39:58 0 [ERROR] InnoDB: Trying to load index `PRIMARY` for table `test`.`#sql-alter-27122f-10`, but the index tree has been freed! 2021-05-15 1:39:58 0 [Note] InnoDB: Index is corrupt but forcing load into data dictionary While cleaning up the code, I realized that the InnoDB DDL rewrite in MariaDB Server 10.6 made obsolete the read-only startup parameter innodb_force_load_corrupted . I think that we should deprecate and ignore that parameter, and remove it in later versions.

            origin/bb-10.6-marko be3425e1b48a6d681a82a5dd039f26f037af16ac 2021-11-03T15:03:19+02:00
            performed quite good with ~ 2000 RQG test for broad range coverage.
            There was no Crashrecovery failure of interest.
            Two new errors were observed
            - 4 times
               mysqld:storage/innobase/btr/btr0sea.cc:415: bool btr_search_update_block_hash_info(btr_search_t*, buf_block_t*): Assertion `buf_pool.is_uncompressed(block)' failed.
               which are per Marko:  Some buffer pool resizing combined with AHI problem.
            - 1 time
               mysqld: storage/innobase/include/srw_lock.h:170: void ssux_lock_impl<spinloop>::init() [with bool spinloop = true]: Assertion `is_vacant()' failed.
               during ANALYZE PARTITION.
               But partitioning is known to be error prone even on non development source trees.
            Both bad effects are quite probably not related to MDEV-26933.
            They are most probably by-catch found because of modifications of the test battery.
               
            

            mleich Matthias Leich added a comment - origin/bb-10.6-marko be3425e1b48a6d681a82a5dd039f26f037af16ac 2021-11-03T15:03:19+02:00 performed quite good with ~ 2000 RQG test for broad range coverage. There was no Crashrecovery failure of interest. Two new errors were observed - 4 times mysqld:storage/innobase/btr/btr0sea.cc:415: bool btr_search_update_block_hash_info(btr_search_t*, buf_block_t*): Assertion `buf_pool.is_uncompressed(block)' failed. which are per Marko: Some buffer pool resizing combined with AHI problem. - 1 time mysqld: storage/innobase/include/srw_lock.h:170: void ssux_lock_impl<spinloop>::init() [with bool spinloop = true]: Assertion `is_vacant()' failed. during ANALYZE PARTITION. But partitioning is known to be error prone even on non development source trees. Both bad effects are quite probably not related to MDEV-26933. They are most probably by-catch found because of modifications of the test battery.

            Thank you. The branch that you tested also included quite a bit of buf_block_t refactoring related to MDEV-26827, and the rarer assertion definitely is related to that (moving buf_block_t::lock to buf_page_t::lock). This change basically added a debug assertion assertion to mtr_t::page_lock(), checking that the page identifier in the buffer matches the one in the block descriptor.

            marko Marko Mäkelä added a comment - Thank you. The branch that you tested also included quite a bit of buf_block_t refactoring related to MDEV-26827 , and the rarer assertion definitely is related to that (moving buf_block_t::lock to buf_page_t::lock ). This change basically added a debug assertion assertion to mtr_t::page_lock() , checking that the page identifier in the buffer matches the one in the block descriptor.

            A motivation for implementing this debug check was the mystery MDEV-12434, which might finally have found its explanation in MDEV-31347.

            marko Marko Mäkelä added a comment - A motivation for implementing this debug check was the mystery MDEV-12434 , which might finally have found its explanation in MDEV-31347 .

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.