[MDEV-12699] Improve crash recovery of corrupted data pages - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Fix Version/s: 10.2.24, 10.3.15, 10.4.5
Component/s: Storage Engine - InnoDB
Labels:
- affects-tests

Sprint:
10.2.11

Description

If a page cannot be decrypted (or read) during crash recovery, InnoDB should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages.

If there is a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record for a corrupted page, we can always safely ignore the previous page contents and apply the redo log. There is no need to read a page from the data file when applying MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS.

Currently in 10.2, the test encryption.innodb-redo-badkey randomly fails, possibly because of this. Also, the test innodb.innodb_bug14147491 needs to be extended.

Attachments

Issue Links

blocks

MDEV-19738 Doublewrite buffer is unnecessarily used for newly (re)initialized pages

Closed

causes

MDEV-20688 Recovery crashes after unnecessarily reading a corrupted page

Closed

is blocked by

MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD upon completing ALTER TABLE

Closed

is duplicated by

MDEV-12898 encryption.innodb-redo-badkey failed in buildbot, server hung on startup

Closed

MDEV-14079 encryption.innodb-force-corrupt failed in buildbot

Closed

relates to

MDEV-12700 Allow innodb_read_only startup without prior slow shutdown

Closed

MDEV-12750 Fix crash recovery of key rotation

Closed

MDEV-15528 Avoid writing freed InnoDB pages

Closed

MDEV-18976 Implement a CHECKSUM redo log record for improved validation

Closed

MDEV-19541 InnoDB crashes when trying to recover a corrupted page

Closed

MDEV-19586 Replace recv_sys_t::addr_hash with a std::map

Closed

MDEV-19739 Avoid unnecessary page reads on recovery with innodb_log_optimize_ddl=ON

Closed

MDEV-20755 InnoDB: Database page corruption on disk or a failed file read of tablespace upon prepare of mariabackup incremental backup

Closed

MDEV-21572 buf_page_get_gen() should apply buffered page initialized redo log during recovery

Closed

MDEV-23971 add the ability to fix corrupted pages on --prepare

Closed

MDEV-10217 innodb.innodb_bug59641 fails sporadically in buildbot: InnoDB: Failing assertion: current_rec != insert_rec in file page0cur.c line 1052

Closed

MDEV-11125 Introduce a reduced doublewrite mode, handling error detection only

Closed

MDEV-11808 innodb.innodb_bug14147491 failed in buildbot

Closed

MDEV-12253 Buffer pool blocks are accessed after they have been freed

Closed

MDEV-12353 Efficient InnoDB redo log record format

Closed

MDEV-12627 innodb.innodb_bug14147491 does not do proper cleanup

Closed

MDEV-13823 innodb.log_data_file_size fails on buildbot with warnings intermittently

Closed

MDEV-13872 innodb.innodb-64k-crash failed in buildbot with error on startup and SIGSEGV

Closed

MDEV-13893 encryption.innodb-redo-badkey failed in buildbot with page cannot be decrypted

Closed

MDEV-14425 Change the InnoDB redo log format to reduce write amplification

Closed

MDEV-14481 Execute InnoDB crash recovery in the background

Closed

MDEV-16941 innodb_gis.kill_server failed in buildbot with checksum mismatch in datafile

Closed

MDEV-19335 Removal of encrypted from buf_page_t

Closed

MDEV-19374 innodb.table_flags failed in buildbot with Server failed to restart

Closed

(24 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2017-09-07 06:36

On a related note, if there is unencrypted redo log for an encrypted page, and encryption has not been set up, the crash recovery will hang. This can be demonstrated by modifying a test (tested in 10.2 and 10.3):

diff --git a/mysql-test/suite/encryption/t/innodb_encrypt_log.test b/mysql-test/suite/encryption/t/innodb_encrypt_log.test

index 4bc556ee1d2..7a7f3acf07a 100644

--- a/mysql-test/suite/encryption/t/innodb_encrypt_log.test

+++ b/mysql-test/suite/encryption/t/innodb_encrypt_log.test

@@ -88,7 +88,7 @@ INSERT INTO t0 VALUES(NULL, 5, 5, 'public', 'gossip');

 -- let SEARCH_FILE=$MYSQLD_DATADIR/test/t0.ibd

 -- source include/search_pattern_in_file.inc

---let $restart_parameters=

+--let $restart_parameters=--skip-innodb-encrypt-log --skip-file-key-management --skip-innodb-encrypt-tables

 --source include/start_mysqld.inc

 SELECT COUNT(*) FROM t0;

The startup will get stuck in recv_apply_hashed_log_recs(last_batch=true):

	while (recv_sys->n_addrs != 0) {

		bool abort = recv_sys->found_corrupt_log;

		mutex_exit(&(recv_sys->mutex));

		if (abort) {

			return;

		os_thread_sleep(500000);

		mutex_enter(&(recv_sys->mutex));

The flag recv_sys->found_corrupt_fs holds here.
The correct course of action (in the I/O threads) would seem to be to remove the unreadable pages from recv_sys, so that recv_sys->n_addrs would eventually reach 0.
Additionally, if recv_sys->found_corrupt_fs was set and innodb_force_recovery=0 (the default) is in effect, InnoDB startup should be aborted.

Marko Mäkelä added a comment - 2017-09-07 06:36 On a related note, if there is unencrypted redo log for an encrypted page, and encryption has not been set up, the crash recovery will hang. This can be demonstrated by modifying a test (tested in 10.2 and 10.3): diff --git a/mysql-test/suite/encryption/t/innodb_encrypt_log.test b/mysql-test/suite/encryption/t/innodb_encrypt_log.test index 4bc556ee1d2..7a7f3acf07a 100644 --- a/mysql-test/suite/encryption/t/innodb_encrypt_log.test +++ b/mysql-test/suite/encryption/t/innodb_encrypt_log.test @@ -88,7 +88,7 @@ INSERT INTO t0 VALUES(NULL, 5, 5, 'public', 'gossip'); -- let SEARCH_FILE=$MYSQLD_DATADIR/test/t0.ibd -- source include/search_pattern_in_file.inc ---let $restart_parameters= +--let $restart_parameters=--skip-innodb-encrypt-log --skip-file-key-management --skip-innodb-encrypt-tables --source include/start_mysqld.inc SELECT COUNT(*) FROM t0; The startup will get stuck in recv_apply_hashed_log_recs(last_batch=true): while (recv_sys->n_addrs != 0) { bool abort = recv_sys->found_corrupt_log; mutex_exit(&(recv_sys->mutex)); if (abort) { return; } os_thread_sleep(500000); mutex_enter(&(recv_sys->mutex)); } The flag recv_sys->found_corrupt_fs holds here. The correct course of action (in the I/O threads) would seem to be to remove the unreadable pages from recv_sys, so that recv_sys->n_addrs would eventually reach 0. Additionally, if recv_sys->found_corrupt_fs was set and innodb_force_recovery=0 (the default) is in effect, InnoDB startup should be aborted.

Marko Mäkelä added a comment - 2017-09-18 14:21

The test innodb.log_data_file_size can be used to repeat the problem, as noted in my follow-up fix to ~~MDEV-12893~~

Marko Mäkelä added a comment - 2017-09-18 14:21 The test innodb.log_data_file_size can be used to repeat the problem, as noted in my follow-up fix to MDEV-12893

Marko Mäkelä added a comment - 2017-10-12 10:05

Once the recovery has been made robust, we could also disable the doublewrite buffering for pages that were recently (re)initialized. We must guarantee that at all times, the doublewrite buffer will be used during a page flush unless a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record was written for the page after the latest checkpoint. The fuzzy checkpoint mechanism could make this challenging.

Marko Mäkelä added a comment - 2017-10-12 10:05 Once the recovery has been made robust, we could also disable the doublewrite buffering for pages that were recently (re)initialized. We must guarantee that at all times, the doublewrite buffer will be used during a page flush unless a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record was written for the page after the latest checkpoint. The fuzzy checkpoint mechanism could make this challenging.

Marko Mäkelä added a comment - 2018-11-20 14:15

While fixing ~~MDEV-17680~~, it occurred to me that in recv_apply_hashed_log_recs() we could easily filter out any log records that precede MLOG_INIT_FILE_PAGE2. If such a record is present for a page, we can simply create the page and apply the log records. There is absolutely no need to read the page, and possibly unnecessarily fail if the to-be-ignored page was corrupted.

Marko Mäkelä added a comment - 2018-11-20 14:15 While fixing MDEV-17680 , it occurred to me that in recv_apply_hashed_log_recs() we could easily filter out any log records that precede MLOG_INIT_FILE_PAGE2 . If such a record is present for a page, we can simply create the page and apply the log records. There is absolutely no need to read the page, and possibly unnecessarily fail if the to-be-ignored page was corrupted.

Marko Mäkelä added a comment - 2018-12-04 13:32

Testing the fix should include running ./mtr encryption.innodb-redo-badkey, which is currently disabled due to ~~MDEV-13893~~.

Marko Mäkelä added a comment - 2018-12-04 13:32 Testing the fix should include running ./mtr encryption.innodb-redo-badkey , which is currently disabled due to MDEV-13893 .

Thirunarayanan Balathandayuthapani added a comment - 2018-12-13 10:32

--source include/have_file_key_management.inc

CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes;

insert into t1 select repeat('a',100);

let $restart_parameters = --innodb_flush_log_at_trx_commit=1;

--source include/restart_mysqld.inc

let $MYSQLD_DATADIR=`select @@datadir`;

let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;

insert into t1 select repeat('b', 100);

--source include/kill_mysqld.inc

--echo # Corrupt the table

perl;

use strict;

use warnings;

use Fcntl qw(:DEFAULT :seek);

my $ibd_file = $ENV{'t1_IBD'};

my $chunk;

my $len;

sysopen IBD_FILE, $ibd_file, O_RDWR || die "Unable to open $ibd_file";

sysseek IBD_FILE, 16384 * 3, SEEK_CUR;

$chunk = '\xAA\xAA\xAA\xAA';

syswrite IBD_FILE, $chunk, 4;

close IBD_FILE;

EOF

#--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd

--source include/start_mysqld.inc

SELECT * FROM t1;

The above test case will make the server recovery hang because the read page is corrupted and it fails to decrypt.

Thirunarayanan Balathandayuthapani added a comment - 2018-12-13 10:32 --source include/have_file_key_management.inc CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes; insert into t1 select repeat('a',100); let $restart_parameters = --innodb_flush_log_at_trx_commit=1; --source include/restart_mysqld.inc let $MYSQLD_DATADIR=`select @@datadir`; let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd; insert into t1 select repeat('b', 100); --source include/kill_mysqld.inc --echo # Corrupt the table perl; use strict; use warnings; use Fcntl qw(:DEFAULT :seek); my $ibd_file = $ENV{'t1_IBD'}; my $chunk; my $len; sysopen IBD_FILE, $ibd_file, O_RDWR || die "Unable to open $ibd_file"; sysseek IBD_FILE, 16384 * 3, SEEK_CUR; $chunk = '\xAA\xAA\xAA\xAA'; syswrite IBD_FILE, $chunk, 4; close IBD_FILE; EOF #--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd --source include/start_mysqld.inc SELECT * FROM t1; The above test case will make the server recovery hang because the read page is corrupted and it fails to decrypt.

Marko Mäkelä added a comment - 2019-02-08 17:21

The ~~MDEV-15528~~ record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.

Marko Mäkelä added a comment - 2019-02-08 17:21 The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.

Marko Mäkelä added a comment - 2019-04-04 19:31

I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

Marko Mäkelä added a comment - 2019-04-04 19:31 I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

Marko Mäkelä added a comment - 2019-04-07 09:51

I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3.

A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption, which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

Marko Mäkelä added a comment - 2019-04-07 09:51 I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3. A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption , which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 6 Start watching this issue

Dates

Created:: 2017-05-05 07:55

Updated:: 2024-09-06 10:01

Resolved:: 2019-04-17 14:35

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server