Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12699

Improve crash recovery of corrupted data pages

Details

    • 10.2.11

    Description

      If a page cannot be decrypted (or read) during crash recovery, InnoDB should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages.

      If there is a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record for a corrupted page, we can always safely ignore the previous page contents and apply the redo log. There is no need to read a page from the data file when applying MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS.

      Currently in 10.2, the test encryption.innodb-redo-badkey randomly fails, possibly because of this. Also, the test innodb.innodb_bug14147491 needs to be extended.

      Attachments

        Issue Links

          Activity

            On a related note, if there is unencrypted redo log for an encrypted page, and encryption has not been set up, the crash recovery will hang. This can be demonstrated by modifying a test (tested in 10.2 and 10.3):

            diff --git a/mysql-test/suite/encryption/t/innodb_encrypt_log.test b/mysql-test/suite/encryption/t/innodb_encrypt_log.test
            index 4bc556ee1d2..7a7f3acf07a 100644
            --- a/mysql-test/suite/encryption/t/innodb_encrypt_log.test
            +++ b/mysql-test/suite/encryption/t/innodb_encrypt_log.test
            @@ -88,7 +88,7 @@ INSERT INTO t0 VALUES(NULL, 5, 5, 'public', 'gossip');
             -- let SEARCH_FILE=$MYSQLD_DATADIR/test/t0.ibd
             -- source include/search_pattern_in_file.inc
             
            ---let $restart_parameters=
            +--let $restart_parameters=--skip-innodb-encrypt-log --skip-file-key-management --skip-innodb-encrypt-tables
             --source include/start_mysqld.inc
             
             SELECT COUNT(*) FROM t0;
            

            The startup will get stuck in recv_apply_hashed_log_recs(last_batch=true):

            	while (recv_sys->n_addrs != 0) {
            		bool abort = recv_sys->found_corrupt_log;
             
            		mutex_exit(&(recv_sys->mutex));
             
            		if (abort) {
            			return;
            		}
             
            		os_thread_sleep(500000);
             
            		mutex_enter(&(recv_sys->mutex));
            	}
            

            The flag recv_sys->found_corrupt_fs holds here.
            The correct course of action (in the I/O threads) would seem to be to remove the unreadable pages from recv_sys, so that recv_sys->n_addrs would eventually reach 0.
            Additionally, if recv_sys->found_corrupt_fs was set and innodb_force_recovery=0 (the default) is in effect, InnoDB startup should be aborted.

            marko Marko Mäkelä added a comment - On a related note, if there is unencrypted redo log for an encrypted page, and encryption has not been set up, the crash recovery will hang. This can be demonstrated by modifying a test (tested in 10.2 and 10.3): diff --git a/mysql-test/suite/encryption/t/innodb_encrypt_log.test b/mysql-test/suite/encryption/t/innodb_encrypt_log.test index 4bc556ee1d2..7a7f3acf07a 100644 --- a/mysql-test/suite/encryption/t/innodb_encrypt_log.test +++ b/mysql-test/suite/encryption/t/innodb_encrypt_log.test @@ -88,7 +88,7 @@ INSERT INTO t0 VALUES(NULL, 5, 5, 'public', 'gossip'); -- let SEARCH_FILE=$MYSQLD_DATADIR/test/t0.ibd -- source include/search_pattern_in_file.inc ---let $restart_parameters= +--let $restart_parameters=--skip-innodb-encrypt-log --skip-file-key-management --skip-innodb-encrypt-tables --source include/start_mysqld.inc SELECT COUNT(*) FROM t0; The startup will get stuck in recv_apply_hashed_log_recs(last_batch=true): while (recv_sys->n_addrs != 0) { bool abort = recv_sys->found_corrupt_log;   mutex_exit(&(recv_sys->mutex));   if (abort) { return; }   os_thread_sleep(500000);   mutex_enter(&(recv_sys->mutex)); } The flag recv_sys->found_corrupt_fs holds here. The correct course of action (in the I/O threads) would seem to be to remove the unreadable pages from recv_sys, so that recv_sys->n_addrs would eventually reach 0. Additionally, if recv_sys->found_corrupt_fs was set and innodb_force_recovery=0 (the default) is in effect, InnoDB startup should be aborted.

            The test innodb.log_data_file_size can be used to repeat the problem, as noted in my follow-up fix to MDEV-12893

            marko Marko Mäkelä added a comment - The test innodb.log_data_file_size can be used to repeat the problem, as noted in my follow-up fix to MDEV-12893

            Once the recovery has been made robust, we could also disable the doublewrite buffering for pages that were recently (re)initialized. We must guarantee that at all times, the doublewrite buffer will be used during a page flush unless a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record was written for the page after the latest checkpoint. The fuzzy checkpoint mechanism could make this challenging.

            marko Marko Mäkelä added a comment - Once the recovery has been made robust, we could also disable the doublewrite buffering for pages that were recently (re)initialized. We must guarantee that at all times, the doublewrite buffer will be used during a page flush unless a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record was written for the page after the latest checkpoint. The fuzzy checkpoint mechanism could make this challenging.

            While fixing MDEV-17680, it occurred to me that in recv_apply_hashed_log_recs() we could easily filter out any log records that precede MLOG_INIT_FILE_PAGE2. If such a record is present for a page, we can simply create the page and apply the log records. There is absolutely no need to read the page, and possibly unnecessarily fail if the to-be-ignored page was corrupted.

            marko Marko Mäkelä added a comment - While fixing MDEV-17680 , it occurred to me that in recv_apply_hashed_log_recs() we could easily filter out any log records that precede MLOG_INIT_FILE_PAGE2 . If such a record is present for a page, we can simply create the page and apply the log records. There is absolutely no need to read the page, and possibly unnecessarily fail if the to-be-ignored page was corrupted.

            Testing the fix should include running ./mtr encryption.innodb-redo-badkey, which is currently disabled due to MDEV-13893.

            marko Marko Mäkelä added a comment - Testing the fix should include running ./mtr encryption.innodb-redo-badkey , which is currently disabled due to MDEV-13893 .

            --source include/have_file_key_management.inc
             
            CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes;
            insert into t1 select repeat('a',100);
             
            let $restart_parameters = --innodb_flush_log_at_trx_commit=1;
            --source include/restart_mysqld.inc
             
            let $MYSQLD_DATADIR=`select @@datadir`;
            let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;
             
            insert into t1 select repeat('b', 100);
             
            --source include/kill_mysqld.inc
             
            --echo # Corrupt the table
             
            perl;
            use strict;
            use warnings;
            use Fcntl qw(:DEFAULT :seek);
             
            my $ibd_file = $ENV{'t1_IBD'};
             
            my $chunk;
            my $len;
             
            sysopen IBD_FILE, $ibd_file, O_RDWR || die "Unable to open $ibd_file";
            sysseek IBD_FILE, 16384 * 3, SEEK_CUR;
            $chunk = '\xAA\xAA\xAA\xAA';
            syswrite IBD_FILE, $chunk, 4;
             
            close IBD_FILE;
            EOF
             
            #--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd
             
            --source include/start_mysqld.inc
             
            SELECT * FROM t1;
            

            The above test case will make the server recovery hang because the read page is corrupted and it fails to decrypt.

            thiru Thirunarayanan Balathandayuthapani added a comment - --source include/have_file_key_management.inc   CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes; insert into t1 select repeat('a',100);   let $restart_parameters = --innodb_flush_log_at_trx_commit=1; --source include/restart_mysqld.inc   let $MYSQLD_DATADIR=`select @@datadir`; let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;   insert into t1 select repeat('b', 100);   --source include/kill_mysqld.inc   --echo # Corrupt the table   perl; use strict; use warnings; use Fcntl qw(:DEFAULT :seek);   my $ibd_file = $ENV{'t1_IBD'};   my $chunk; my $len;   sysopen IBD_FILE, $ibd_file, O_RDWR || die "Unable to open $ibd_file"; sysseek IBD_FILE, 16384 * 3, SEEK_CUR; $chunk = '\xAA\xAA\xAA\xAA'; syswrite IBD_FILE, $chunk, 4;   close IBD_FILE; EOF   #--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd   --source include/start_mysqld.inc   SELECT * FROM t1; The above test case will make the server recovery hang because the read page is corrupted and it fails to decrypt.

            The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.

            marko Marko Mäkelä added a comment - The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.

            I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

            marko Marko Mäkelä added a comment - I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

            I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3.

            A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption, which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

            marko Marko Mäkelä added a comment - I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3. A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption , which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.