MariaDB Server / MDEV-12699

Improve crash recovery of corrupted data pages

Details

    • 10.2.11

    Description

      If a page cannot be decrypted (or read) during crash recovery, InnoDB should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply redo log to other pages.

      If there is an MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record for a corrupted page, we can always safely ignore the previous page contents and apply the redo log, because these records re-initialize the page. There is no need to read a page from the data file when applying MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS.
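The behaviour requested above can be summarized as a per-page decision. The following is only a minimal Python sketch of that decision, not InnoDB code: `Action`, `LogRecord`, and `decide` are hypothetical names, and `force_recovery` merely stands in for the `innodb_force_recovery` setting. Only the two record type names are taken from the description.

```python
from dataclasses import dataclass
from enum import Enum, auto

# Record types named in the description; everything else here is hypothetical.
INIT_TYPES = {"MLOG_INIT_FILE_PAGE", "MLOG_ZIP_PAGE_COMPRESS"}


class Action(Enum):
    APPLY_WITHOUT_READ = auto()  # page is re-initialized; old contents are irrelevant
    APPLY_AFTER_READ = auto()    # page was read (and decrypted) successfully
    SKIP_PAGE = auto()           # forced recovery: ignore this page, keep going
    ABORT_STARTUP = auto()       # clean abort of the startup


@dataclass
class LogRecord:
    type: str
    page_no: int


def decide(records, page_readable, force_recovery):
    """Pick a recovery action for one page, given its buffered redo records."""
    if any(r.type in INIT_TYPES for r in records):
        # An initializing record makes the on-disk contents irrelevant.
        return Action.APPLY_WITHOUT_READ
    if page_readable:
        return Action.APPLY_AFTER_READ
    # The page could not be read or decrypted.
    return Action.SKIP_PAGE if force_recovery > 0 else Action.ABORT_STARTUP


# A corrupted page with no initializing record, without forced recovery:
print(decide([LogRecord("MLOG_REC_INSERT", 3)], page_readable=False, force_recovery=0))
# prints "Action.ABORT_STARTUP"
```

The initializing-record check comes first on purpose: in that case the sketch never consults `page_readable`, mirroring the point that the page need not be read from the data file at all.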

      Currently in 10.2, the test encryption.innodb-redo-badkey randomly fails, possibly because of this. Also, the test innodb.innodb_bug14147491 needs to be extended.

          Activity

            marko Marko Mäkelä added a comment - Testing the fix should include running ./mtr encryption.innodb-redo-badkey, which is currently disabled due to MDEV-13893.

            thiru Thirunarayanan Balathandayuthapani added a comment -

            --source include/have_file_key_management.inc

            CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes;
            insert into t1 select repeat('a',100);

            let $restart_parameters = --innodb_flush_log_at_trx_commit=1;
            --source include/restart_mysqld.inc

            let $MYSQLD_DATADIR=`select @@datadir`;
            # No $ prefix: t1_IBD is exported to the environment of the perl block below
            let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;

            insert into t1 select repeat('b', 100);

            --source include/kill_mysqld.inc

            --echo # Corrupt the table

            perl;
            use strict;
            use warnings;
            use Fcntl qw(:DEFAULT :seek);

            my $ibd_file = $ENV{'t1_IBD'};

            # "or" (not "||") so that die applies to sysopen rather than to O_RDWR
            sysopen IBD_FILE, $ibd_file, O_RDWR or die "Unable to open $ibd_file: $!";
            # Seek to the start of page 3 (16384-byte pages)
            sysseek IBD_FILE, 16384 * 3, SEEK_SET;
            # Double quotes so that \xAA is a byte escape, not four literal characters
            my $chunk = "\xAA\xAA\xAA\xAA";
            syswrite IBD_FILE, $chunk, 4;

            close IBD_FILE;
            EOF

            #--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd

            --source include/start_mysqld.inc

            SELECT * FROM t1;

            The above test case makes server recovery hang, because the page that is read is corrupted and cannot be decrypted.
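For readers less familiar with Perl, the corruption step of the test case can be sketched in Python. It overwrites four bytes at byte offset 16384 * 3, which is the start of page 3 if the tablespace uses 16 KiB pages (an assumption based on the constant in the test). In the InnoDB page layout the first four bytes of a page hold the checksum field, so this is the part of the page that is damaged. The helper name `corrupt_page_start` is hypothetical, and the demonstration runs on a zero-filled scratch file rather than a real .ibd file.

```python
import os
import tempfile

PAGE_SIZE = 16384  # assumed page size, matching the 16384 used in the test case


def corrupt_page_start(path, page_no, junk=b"\xAA" * 4, page_size=PAGE_SIZE):
    """Overwrite the first len(junk) bytes of page `page_no` in place."""
    with open(path, "r+b") as f:
        f.seek(page_no * page_size, os.SEEK_SET)
        f.write(junk)


# Demonstration on a scratch file of four zero-filled "pages".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(PAGE_SIZE * 4))
    scratch = tmp.name

corrupt_page_start(scratch, 3)

with open(scratch, "rb") as f:
    data = f.read()
os.remove(scratch)

# Only the first four bytes of page 3 differ from the original contents.
print(data[3 * PAGE_SIZE:3 * PAGE_SIZE + 8].hex())
```

Opening the file in `r+b` mode keeps its length unchanged, so the rest of the tablespace stays intact and only the targeted page fails validation.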

            marko Marko Mäkelä added a comment - The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.
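The rule in this comment (discard preceding records for the page, and do not load the page) can be illustrated with a small Python sketch. It is not the InnoDB implementation: `buffer_records` and its tuple-based data model are hypothetical, and the sketch assumes that every record passed in already belongs to a complete mini-transaction. Only the record type names come from the comment.

```python
from collections import defaultdict

# Record types named in the comment above; the data model is hypothetical.
INIT_TYPES = {"MLOG_INIT_FILE_PAGE2", "MLOG_ZIP_PAGE_COMPRESS", "MLOG_INIT_FREE_PAGE"}


def buffer_records(records):
    """Group (type, page_id) records per page, restarting a page's list at each
    initializing record.

    Returns (per_page, skip_read): the records still to be applied for each page,
    and the set of pages whose old contents need not be read from the data file.
    """
    per_page = defaultdict(list)
    skip_read = set()
    for rec_type, page_id in records:
        if rec_type in INIT_TYPES:
            per_page[page_id].clear()  # earlier changes would be overwritten anyway
            skip_read.add(page_id)
        per_page[page_id].append(rec_type)
    return dict(per_page), skip_read


log = [
    ("MLOG_REC_INSERT", 3),
    ("MLOG_INIT_FREE_PAGE", 3),
    ("MLOG_REC_INSERT", 4),
]
per_page, skip_read = buffer_records(log)
# per_page == {3: ["MLOG_INIT_FREE_PAGE"], 4: ["MLOG_REC_INSERT"]}; skip_read == {3}
```

In this example the insert on page 3 is dropped because a later record re-initializes the page, while page 4 still has to be read before its record can be applied.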

            marko Marko Mäkelä added a comment - I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

            marko Marko Mäkelä added a comment - I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3.

            A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption, which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

            People

              marko Marko Mäkelä