MariaDB Server: MDEV-12699

Improve crash recovery of corrupted data pages

Details

    • 10.2.11

    Description

      If a page cannot be read or decrypted during crash recovery, InnoDB should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply the redo log to the other pages.
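      The startup policy above can be sketched as a small decision function. This is a minimal illustration, not InnoDB code: PageReadStatus, PageId, RecoveryOutcome and decide_recovery() are hypothetical names, and the sketch assumes that any nonzero innodb_force_recovery value means "skip unreadable pages".

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical outcome of reading (and decrypting) one page during recovery.
enum class PageReadStatus { OK, READ_ERROR, DECRYPT_ERROR };

struct PageId {
  std::uint32_t space_id;
  std::uint32_t page_no;
};

enum class RecoveryOutcome { APPLIED_ALL, SKIPPED_CORRUPT_PAGES, ABORT_STARTUP };

struct RecoveryResult {
  RecoveryOutcome outcome;
  std::vector<PageId> skipped;  // pages left without redo applied
};

// force_recovery stands in for innodb_force_recovery; treating any nonzero
// value as "skip bad pages" is an assumption of this sketch.
RecoveryResult decide_recovery(
    const std::vector<std::pair<PageId, PageReadStatus>>& pages,
    unsigned force_recovery) {
  RecoveryResult result{RecoveryOutcome::APPLIED_ALL, {}};
  for (const auto& [id, status] : pages) {
    if (status == PageReadStatus::OK) {
      continue;  // a real implementation would apply the buffered redo here
    }
    if (force_recovery == 0) {
      // Clean abort: report failure to the caller instead of hanging or crashing.
      return {RecoveryOutcome::ABORT_STARTUP, {}};
    }
    // With force recovery, leave this page alone and keep recovering the rest.
    result.skipped.push_back(id);
    result.outcome = RecoveryOutcome::SKIPPED_CORRUPT_PAGES;
  }
  return result;
}
```

      The caller would map ABORT_STARTUP to a startup error; the skipped list is what a force-recovery run would need to report.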

      If there is a MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record for a corrupted page, we can always safely ignore the previous page contents and apply the redo log. There is no need to read a page from the data file when applying MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS.
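      Why no read is needed can be shown with a per-page apply plan: if the buffered records for a page contain a record that initializes the whole page, application can start from the last such record, and the (possibly corrupted) file contents are never read. RecType, RedoRec, ApplyPlan and plan_page_apply() are hypothetical; real InnoDB redo records carry much more than a type and an LSN.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical subset of redo record types; real InnoDB has many more.
enum class RecType { INIT_FILE_PAGE, ZIP_PAGE_COMPRESS, OTHER };

struct RedoRec {
  RecType type;
  std::uint64_t lsn;
};

// Records that fully define the page, so older page contents are irrelevant.
constexpr bool initializes_page(RecType t) {
  return t == RecType::INIT_FILE_PAGE || t == RecType::ZIP_PAGE_COMPRESS;
}

struct ApplyPlan {
  bool read_from_file;    // must the page be read (and decrypted) first?
  std::size_t first_rec;  // index of the first record that must be applied
};

// recs is assumed to be sorted by ascending LSN.
ApplyPlan plan_page_apply(const std::vector<RedoRec>& recs) {
  for (std::size_t i = recs.size(); i-- > 0;) {
    if (initializes_page(recs[i].type)) {
      // Everything before the last initializing record would be overwritten.
      return {false, i};
    }
  }
  return {true, 0};
}
```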

      Currently in 10.2, the test encryption.innodb-redo-badkey fails intermittently, possibly because of this. Also, the test innodb.innodb_bug14147491 needs to be extended.

          Activity

            I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3.

            A merge to 10.4, along with a test of MLOG_INIT_FREE_PAGE, revealed corruption that I think we must fix before pushing the actual fix to the main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.

            Comment by Marko Mäkelä.

            I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.

            Comment by Marko Mäkelä.

            The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in the same way as a MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS record that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page into the buffer pool.

            Comment by Marko Mäkelä.
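            A minimal sketch of this rule at the point where parsed records are buffered per page, assuming the caller only passes records from mini-transactions that were parsed completely. RecType, RecoveryHash, needs_load() and pending() are hypothetical names; the enum values only mirror MLOG_INIT_FREE_PAGE, MLOG_INIT_FILE_PAGE2 and MLOG_ZIP_PAGE_COMPRESS and are not the real InnoDB definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the redo record types named in the comment.
enum class RecType { INIT_FILE_PAGE2, ZIP_PAGE_COMPRESS, INIT_FREE_PAGE, OTHER };

struct RedoRec {
  RecType type;
  std::uint64_t lsn;
};

using PageKey = std::pair<std::uint32_t, std::uint32_t>;  // (space_id, page_no)

struct PageRecs {
  std::vector<RedoRec> recs;
  bool freed = false;  // the latest buffered record freed the page
};

// Hypothetical per-page store of parsed redo records.
class RecoveryHash {
 public:
  void add(const PageKey& page, const RedoRec& rec) {
    PageRecs& p = pages_[page];
    switch (rec.type) {
      case RecType::INIT_FREE_PAGE:
        // Page is free: older records are irrelevant and no load is needed.
        p.recs.clear();
        p.freed = true;
        return;
      case RecType::INIT_FILE_PAGE2:
      case RecType::ZIP_PAGE_COMPRESS:
        // Page is rebuilt from scratch: older records are irrelevant.
        p.recs.clear();
        p.freed = false;
        break;
      case RecType::OTHER:
        // Sketch assumption: a freed page is re-initialized before reuse.
        assert(!p.freed);
        break;
    }
    p.recs.push_back(rec);
  }

  // Must the page be read from the data file before applying its records?
  bool needs_load(const PageKey& page) const {
    auto it = pages_.find(page);
    if (it == pages_.end() || it->second.freed) return false;
    const std::vector<RedoRec>& recs = it->second.recs;
    return !recs.empty() && recs.front().type == RecType::OTHER;
  }

  // Number of buffered records still to be applied to the page.
  std::size_t pending(const PageKey& page) const {
    auto it = pages_.find(page);
    return it == pages_.end() ? 0 : it->second.recs.size();
  }

 private:
  std::map<PageKey, PageRecs> pages_;
};
```

            Discarding at buffering time, rather than at apply time, means a page whose history ends in a free or initializing record never triggers a read, so a corrupted or undecryptable copy in the data file cannot block recovery.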

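            --source include/have_file_key_management.inc

            CREATE TABLE t1(c VARCHAR(128)) ENGINE=InnoDB ENCRYPTED=YES;
            INSERT INTO t1 SELECT REPEAT('a', 100);

            # The clean shutdown writes the first row to t1.ibd. After the
            # restart, every commit is written to the redo log immediately.
            let $restart_parameters = --innodb_flush_log_at_trx_commit=1;
            --source include/restart_mysqld.inc

            let $MYSQLD_DATADIR=`select @@datadir`;
            let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;

            # This row is durable in the redo log, but the page is most likely
            # not yet written to t1.ibd, so recovery must read and modify it.
            INSERT INTO t1 SELECT REPEAT('b', 100);

            --source include/kill_mysqld.inc

            --echo # Corrupt the table

            perl;
            use strict;
            use warnings;
            use Fcntl qw(:DEFAULT :seek);

            my $ibd_file = $ENV{'t1_IBD'};
            # Assumes the default innodb_page_size=16k. Page 3 is the root
            # page of the clustered index, which holds both rows.
            my $page_size = 16384;

            # "or die" applies to the whole call; "O_RDWR || die" never dies.
            sysopen(IBD_FILE, $ibd_file, O_RDWR) or die "Unable to open $ibd_file: $!";
            sysseek(IBD_FILE, $page_size * 3, SEEK_SET) or die "Unable to seek in $ibd_file: $!";
            # Double quotes make \xAA a byte value; single quotes would write
            # the literal characters '\', 'x', 'A', 'A'.
            my $chunk = "\xAA\xAA\xAA\xAA";
            syswrite(IBD_FILE, $chunk, 4) or die "Unable to write to $ibd_file: $!";

            close(IBD_FILE) or die "Unable to close $ibd_file: $!";
            EOF

            #--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd

            --source include/start_mysqld.inc

            SELECT * FROM t1;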
            

            With the above test case, crash recovery hangs, because the page that is read during recovery is corrupted and cannot be decrypted.

            Comment by Thirunarayanan Balathandayuthapani.
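            The comment does not say why recovery hangs. One pattern that would produce this symptom, assumed here only for illustration, is a recovery thread that waits for a page read to succeed and is never woken when the read or decryption fails. The sketch below (PageReadSlot and ReadState are hypothetical names) makes failure a terminal state, so the waiter always returns and the caller can abort startup cleanly.

```cpp
#include <condition_variable>
#include <mutex>

enum class ReadState { PENDING, DONE, FAILED };

// Hypothetical slot that a recovery thread waits on while the page is read.
class PageReadSlot {
 public:
  // Called by the I/O completion handler on success AND on failure.
  void complete(bool ok) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      state_ = ok ? ReadState::DONE : ReadState::FAILED;
    }
    cv_.notify_all();
  }

  // Waits until the read has finished either way. Waiting only for DONE
  // would block forever after a failed read.
  ReadState wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return state_ != ReadState::PENDING; });
    return state_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  ReadState state_ = ReadState::PENDING;
};
```

            In use, complete() would run on an I/O thread while a recovery thread blocks in wait(); on FAILED, the caller would abort startup or, with innodb_force_recovery, skip the page.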

            Testing the fix should include running ./mtr encryption.innodb-redo-badkey, which is currently disabled due to MDEV-13893.

            Comment by Marko Mäkelä.

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä
              Votes: 0
              Watchers: 6
