[MDEV-26933] InnoDB fails to detect page number mismatch Created: 2021-10-29  Updated: 2023-05-29  Resolved: 2021-11-03

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.6.6

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: recovery

Issue Links:
Relates
relates to MDEV-12434 Database page corruptions Closed
relates to MDEV-26966 The parameter innodb_force_load_corru... Closed
relates to MDEV-31347 fil_ibd_create() may hijack the file ... Closed

 Description   

InnoDB is not checking that the page number in a buffer pool page is correct. The following would implement such a check:

diff --git a/storage/innobase/mtr/mtr0mtr.cc b/storage/innobase/mtr/mtr0mtr.cc
--- a/storage/innobase/mtr/mtr0mtr.cc
+++ b/storage/innobase/mtr/mtr0mtr.cc
@@ -1196,6 +1196,8 @@ void mtr_t::page_lock(buf_block_t *block, ulint rw_latch)
 #endif /* BTR_CUR_HASH_ADAPT */
 
 done:
+  ut_ad(page_id_t(page_get_space_id(block->page.frame),
+                  page_get_page_no(block->page.frame)) == block->page.id());
   memo_push(block, fix_type);
 }
 

This would be tripped in a number of tests that inject corruption to data files. Those tests will have to be adjusted.

Furthermore, operations like DROP TABLE need to be fixed so that they will not attempt to load the table definition at all. This would avoids a crash in a test that corrupts the clustered index root page number of a table. The following fix for that test would be needed in any case:

diff --git a/mysql-test/suite/innodb/t/page_id_innochecksum.test b/mysql-test/suite/innodb/t/page_id_innochecksum.test
index 2726b3254da..f5166018dd1 100644
--- a/mysql-test/suite/innodb/t/page_id_innochecksum.test
+++ b/mysql-test/suite/innodb/t/page_id_innochecksum.test
@@ -61,5 +61,9 @@ let SEARCH_PATTERN=page id mismatch;
 --source include/search_pattern_in_file.inc
 
 --remove_file $resultlog
+# prevent purge from crashing on page ID mismatch
+let $restart_parameters=--innodb-force-recovery=2;
 --source include/start_mysqld.inc
 drop table t1;
+let $restart_parameters=;
+--source include/restart_mysqld.inc



 Comments   
Comment by Marko Mäkelä [ 2021-11-02 ]

I slightly revised this based on a test with a data directory that was archived for MDEV-25683. That test case would still cause a hang of the DDL log recovery if the server is started up with innodb_force_recovery=4. My aim was to get rid of these useless messages, which would be output because rollback of a recovered DDL transaction was disabled by innodb_force_recovery=4:

2021-05-15  1:39:58 0 [ERROR] InnoDB: Trying to load index `PRIMARY` for table `test`.`#sql-alter-27122f-10`, but the index tree has been freed!
2021-05-15  1:39:58 0 [Note] InnoDB: Index is corrupt but forcing load into data dictionary

While cleaning up the code, I realized that the InnoDB DDL rewrite in MariaDB Server 10.6 made obsolete the read-only startup parameter innodb_force_load_corrupted. I think that we should deprecate and ignore that parameter, and remove it in later versions.

Comment by Matthias Leich [ 2021-11-03 ]

origin/bb-10.6-marko be3425e1b48a6d681a82a5dd039f26f037af16ac 2021-11-03T15:03:19+02:00
performed quite good with ~ 2000 RQG test for broad range coverage.
There was no Crashrecovery failure of interest.
Two new errors were observed
- 4 times
   mysqld:storage/innobase/btr/btr0sea.cc:415: bool btr_search_update_block_hash_info(btr_search_t*, buf_block_t*): Assertion `buf_pool.is_uncompressed(block)' failed.
   which are per Marko:  Some buffer pool resizing combined with AHI problem.
- 1 time
   mysqld: storage/innobase/include/srw_lock.h:170: void ssux_lock_impl<spinloop>::init() [with bool spinloop = true]: Assertion `is_vacant()' failed.
   during ANALYZE PARTITION.
   But partitioning is known to be error prone even on non development source trees.
Both bad effects are quite probably not related to MDEV-26933.
They are most probably by-catch found because of modifications of the test battery.
   

Comment by Marko Mäkelä [ 2021-11-03 ]

Thank you. The branch that you tested also included quite a bit of buf_block_t refactoring related to MDEV-26827, and the rarer assertion definitely is related to that (moving buf_block_t::lock to buf_page_t::lock). This change basically added a debug assertion assertion to mtr_t::page_lock(), checking that the page identifier in the buffer matches the one in the block descriptor.

Comment by Marko Mäkelä [ 2023-05-29 ]

A motivation for implementing this debug check was the mystery MDEV-12434, which might finally have found its explanation in MDEV-31347.

Generated at Thu Feb 08 09:49:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.