[MDEV-12699] Improve crash recovery of corrupted data pages Created: 2017-05-05  Updated: 2023-11-27  Resolved: 2019-04-17

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.24, 10.3.15, 10.4.5

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: affects-tests

Issue Links:
Blocks
blocks MDEV-19738 Doublewrite buffer is unnecessarily u... Closed
is blocked by MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD... Closed
Duplicate
is duplicated by MDEV-12898 encryption.innodb-redo-badkey failed ... Closed
is duplicated by MDEV-14079 encryption.innodb-force-corrupt fail... Closed
Problem/Incident
causes MDEV-20688 Recovery crashes after unnecessarily ... Closed
Relates
relates to MDEV-12700 Allow innodb_read_only startup withou... Closed
relates to MDEV-12750 Fix crash recovery of key rotation Closed
relates to MDEV-15528 Avoid writing freed InnoDB pages Closed
relates to MDEV-18976 Implement a CHECKSUM redo log record ... Closed
relates to MDEV-19541 InnoDB crashes when trying to recover... Closed
relates to MDEV-19586 Replace recv_sys_t::addr_hash with a ... Closed
relates to MDEV-19739 Avoid unnecessary page reads on recov... Closed
relates to MDEV-20755 InnoDB: Database page corruption on d... Closed
relates to MDEV-21572 buf_page_get_gen() should apply buffe... Closed
relates to MDEV-23971 add the ability to fix corrupted page... Closed
relates to MDEV-10217 innodb.innodb_bug59641 fails sporadic... Closed
relates to MDEV-11125 Introduce a reduced doublewrite mode,... Closed
relates to MDEV-11808 innodb.innodb_bug14147491 failed in b... Closed
relates to MDEV-12253 Buffer pool blocks are accessed after... Closed
relates to MDEV-12353 Efficient InnoDB redo log record format Closed
relates to MDEV-12627 innodb.innodb_bug14147491 does not do... Closed
relates to MDEV-13823 innodb.log_data_file_size fails on bu... Closed
relates to MDEV-13872 innodb.innodb-64k-crash failed in bui... Closed
relates to MDEV-13893 encryption.innodb-redo-badkey failed ... Closed
relates to MDEV-14425 Change the InnoDB redo log format to ... Closed
relates to MDEV-14481 Execute InnoDB crash recovery in the ... Closed
relates to MDEV-16941 innodb_gis.kill_server failed in buil... Closed
relates to MDEV-19335 Removal of encrypted from buf_page_t Closed
relates to MDEV-19374 innodb.table_flags failed in buildbot... Closed
Sprint: 10.2.11

 Description   

If a page cannot be decrypted (or read) during crash recovery, InnoDB should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages.

If there is a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record for a corrupted page, we can always safely ignore the previous page contents and apply the redo log. There is no need to read a page from the data file when applying MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS.

Currently in 10.2, the test encryption.innodb-redo-badkey randomly fails, possibly because of this. Also, the test innodb.innodb_bug14147491 needs to be extended.



 Comments   
Comment by Marko Mäkelä [ 2017-09-07 ]

On a related note, if there is unencrypted redo log for an encrypted page, and encryption has not been set up, the crash recovery will hang. This can be demonstrated by modifying a test (tested in 10.2 and 10.3):

diff --git a/mysql-test/suite/encryption/t/innodb_encrypt_log.test b/mysql-test/suite/encryption/t/innodb_encrypt_log.test
index 4bc556ee1d2..7a7f3acf07a 100644
--- a/mysql-test/suite/encryption/t/innodb_encrypt_log.test
+++ b/mysql-test/suite/encryption/t/innodb_encrypt_log.test
@@ -88,7 +88,7 @@ INSERT INTO t0 VALUES(NULL, 5, 5, 'public', 'gossip');
 -- let SEARCH_FILE=$MYSQLD_DATADIR/test/t0.ibd
 -- source include/search_pattern_in_file.inc
 
---let $restart_parameters=
+--let $restart_parameters=--skip-innodb-encrypt-log --skip-file-key-management --skip-innodb-encrypt-tables
 --source include/start_mysqld.inc
 
 SELECT COUNT(*) FROM t0;

The startup will get stuck in recv_apply_hashed_log_recs(last_batch=true):

	while (recv_sys->n_addrs != 0) {
		bool abort = recv_sys->found_corrupt_log;
 
		mutex_exit(&(recv_sys->mutex));
 
		if (abort) {
			return;
		}
 
		os_thread_sleep(500000);
 
		mutex_enter(&(recv_sys->mutex));
	}

The flag recv_sys->found_corrupt_fs holds here.
The correct course of action (in the I/O threads) would seem to be to remove the unreadable pages from recv_sys, so that recv_sys->n_addrs would eventually reach 0.
Additionally, if recv_sys->found_corrupt_fs was set and innodb_force_recovery=0 (the default) is in effect, InnoDB startup should be aborted.

Comment by Marko Mäkelä [ 2017-09-18 ]

The test innodb.log_data_file_size can be used to repeat the problem, as noted in my follow-up fix to MDEV-12893

Comment by Marko Mäkelä [ 2017-10-12 ]

Once the recovery has been made robust, we could also disable the doublewrite buffering for pages that were recently (re)initialized. We must guarantee that at all times, the doublewrite buffer will be used during a page flush unless a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record was written for the page after the latest checkpoint. The fuzzy checkpoint mechanism could make this challenging.

Comment by Marko Mäkelä [ 2018-11-20 ]

While fixing MDEV-17680, it occurred to me that in recv_apply_hashed_log_recs() we could easily filter out any log records that precede MLOG_INIT_FILE_PAGE2. If such a record is present for a page, we can simply create the page and apply the log records. There is absolutely no need to read the page, and possibly unnecessarily fail if the to-be-ignored page was corrupted.

Comment by Marko Mäkelä [ 2018-12-04 ]

Testing the fix should include running ./mtr encryption.innodb-redo-badkey, which is currently disabled due to MDEV-13893.

Comment by Thirunarayanan Balathandayuthapani [ 2018-12-13 ]

--source include/have_file_key_management.inc
 
CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes;
insert into t1 select repeat('a',100);
 
let $restart_parameters = --innodb_flush_log_at_trx_commit=1;
--source include/restart_mysqld.inc
 
let $MYSQLD_DATADIR=`select @@datadir`;
let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;
 
insert into t1 select repeat('b', 100);
 
--source include/kill_mysqld.inc
 
--echo # Corrupt the table
 
perl;
use strict;
use warnings;
use Fcntl qw(:DEFAULT :seek);
 
my $ibd_file = $ENV{'t1_IBD'};
 
my $chunk;
my $len;
 
sysopen IBD_FILE, $ibd_file, O_RDWR || die "Unable to open $ibd_file";
sysseek IBD_FILE, 16384 * 3, SEEK_CUR;
$chunk = '\xAA\xAA\xAA\xAA';
syswrite IBD_FILE, $chunk, 4;
 
close IBD_FILE;
EOF
 
#--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd
 
--source include/start_mysqld.inc
 
SELECT * FROM t1;

The above test case will make the server recovery hang because the read page is corrupted and it fails to decrypt.

Comment by Marko Mäkelä [ 2019-02-08 ]

The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.

Comment by Marko Mäkelä [ 2019-04-04 ]

I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

Comment by Marko Mäkelä [ 2019-04-07 ]

I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3.

A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption, which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

Generated at Thu Feb 08 07:59:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.