MariaDB Server / MDEV-34898

Doublewrite recovery of innodb_checksum_algorithm=full_crc32 encrypted pages does not work

Details

    Description

      The test encryption.debug_key_management,undo3 fails sporadically during the test's only server restart, with the following error:

      2024-09-09  4:29:09 0 [ERROR] InnoDB: Encrypted page [page id: space=3, page number=13] in file .//undo003 looks corrupted; key_version=10
      

      As far as I can tell, all InnoDB tablespaces should be encrypted with the key version 10 at that point.

      I found no recent failures of this kind for this test on the 10.5 branch. This type of failure seems to occur starting with 10.6.

          Activity

            According to thiru, there are similar failures also in other tests, such as encryption.innodb-encryption-alter. We have been unable to reproduce the issue so far.

            Comment above added by Marko Mäkelä (marko).

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/not_embedded.inc
            --source include/have_example_key_management_plugin.inc
             
            let INNODB_PAGE_SIZE=`select @@innodb_page_size`;
            let MYSQLD_DATADIR=`select @@datadir`;
             
            create table t1 (f1 int primary key, f2 blob)engine=innodb stats_persistent=0;
             
            start transaction;
            insert into t1 values(1, repeat('#',12));
            insert into t1 values(2, repeat('+',12));
            insert into t1 values(3, repeat('/',12));
            insert into t1 values(4, repeat('-',12));
            insert into t1 values(5, repeat('.',12));
            commit work;
             
            # Slow shutdown and restart to make sure ibuf merge is finished
            SET GLOBAL innodb_fast_shutdown = 0;
            let $shutdown_timeout=;
            let $restart_parameters=--debug_dbug=+d,ib_log_checkpoint_avoid_hard --innodb_flush_sync=0;
            --source include/restart_mysqld.inc
            --source ../../suite/innodb/include/no_checkpoint_start.inc
            select space into @space_id from information_schema.innodb_sys_tablespaces where name="test/t1";
            begin; 
            insert into t1 values (6, repeat('%', 400));
             
             
            set global innodb_saved_page_number_debug = 0;
            set global innodb_fil_make_page_dirty_debug = @space_id;
             
            set global innodb_saved_page_number_debug = 1;
            set global innodb_fil_make_page_dirty_debug = @space_id;
             
            set global innodb_buf_flush_list_now = 1;
            --let CLEANUP_IF_CHECKPOINT=drop table t1, unexpected_checkpoint;
            --source ../../suite/innodb/include/no_checkpoint_end.inc
             
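            perl;
            # Corrupt pages 0 and 1 of t1.ibd by overwriting them with zeros,
            # so that recovery has to restore them from the doublewrite buffer.
            use IO::Handle;
            use Fcntl qw(:seek);  # provides SEEK_SET
            my $fname= "$ENV{'MYSQLD_DATADIR'}test/t1.ibd";
            open(FILE, "+<", $fname) or die;
            FILE->autoflush(1);
            binmode FILE;
            # Page 0
            print FILE chr(0) x ($ENV{'INNODB_PAGE_SIZE'});
            # Page 1
            seek(FILE, $ENV{'INNODB_PAGE_SIZE'}, SEEK_SET);
            print FILE chr(0) x ($ENV{'INNODB_PAGE_SIZE'});
            close FILE;
            EOF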
            let $restart_parameters=;
            --source include/start_mysqld.inc
            check table t1;
            select f1, f2 from t1;
            
            

            The above test case fails when run with the following options:

            --innodb-use-atomic-writes=0
            --innodb-encrypt-tables=FORCE
            --innodb_sys_tablespaces
            

            The test case fails with the following errors:

            2024-12-10 15:16:11 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.
            2024-12-10 15:16:11 0 [ERROR] InnoDB: Unable to apply log to corrupted page 1 in file ./test/t1.ibd
            2024-12-10 15:16:11 0 [ERROR] InnoDB: Recovery failed to read page 1 from ./test/t1.ibd
            2024-12-10 15:16:11 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1562] with error Data structure corruption
            2024-12-10 15:16:11 0 [Note] InnoDB: Starting shutdown...
            2024-12-10 15:16:11 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            

            Doublewrite recovery of full_crc32 encrypted pages does not work.
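The corruption step of the test case above can be illustrated outside mysqltest. The following is a minimal Python sketch of what the Perl block does: it overwrites pages 0 and 1 of a file with zeros and then checks which pages are blank. The helper names (zero_pages, is_all_zero, demo), the 16384-byte page size and the stand-in file are assumptions made for this illustration; the real test reads @@innodb_page_size and operates on test/t1.ibd of a killed server.

```python
import os
import tempfile


def zero_pages(path, page_size, page_numbers):
    """Overwrite the given pages of a file with NUL bytes, in place."""
    with open(path, "r+b") as f:
        for page_no in page_numbers:
            f.seek(page_no * page_size)
            f.write(b"\0" * page_size)


def is_all_zero(path, page_size, page_no):
    """Return True if the page exists in full and holds only NUL bytes."""
    with open(path, "rb") as f:
        f.seek(page_no * page_size)
        page = f.read(page_size)
    return len(page) == page_size and page.count(0) == page_size


def demo():
    page_size = 16384  # assumption: the test uses @@innodb_page_size instead
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for test/t1.ibd; this is not a real InnoDB tablespace.
        path = os.path.join(tmp, "t1.ibd")
        with open(path, "wb") as f:
            f.write(b"\xab" * (4 * page_size))
        zero_pages(path, page_size, [0, 1])
        return [is_all_zero(path, page_size, n) for n in range(4)]
```

In the test, pages 0 and 1 were flushed before the server was killed, so a working recovery would be expected to restore both from the doublewrite buffer rather than report page 1 as corrupted.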

            Comment above added by Thirunarayanan Balathandayuthapani (thiru).

            Thank you for the great analysis. My idea for fixing this would be as follows:

            1. In buf_dblwr_t::recover(), preserve any pages whose space_id does not match a known tablespace, or a tablespace with the same parameters. These could be encrypted pages of tablespaces that had been created with innodb_checksum_algorithm=full_crc32.
            2. In buf_page_t::read_complete(), if the page looks corrupted, the tablespace is full_crc32 and encrypted, and crash recovery is active, attempt to restore the page from the above-mentioned special portion of the doublewrite buffer. (No need to access the file system; the page should be written back from the buffer pool by recovery.)

            We should probably also remove any zeroing-out of the doublewrite buffer in the system tablespace. After all, we do validate the FIL_PAGE_LSN in each page that we found in the doublewrite buffer. Therefore, any old pages found there should not cause any trouble.
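The two steps proposed above can be sketched as a toy model. This is not InnoDB code: the classes PageId, DblwrPage and Recovery, the checkpoint_lsn comparison used as the FIL_PAGE_LSN check, and the plain set of known space IDs are all assumptions made for illustration. The "tablespace with the same parameters" condition is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageId:
    space_id: int
    page_no: int


@dataclass
class DblwrPage:
    page_id: PageId
    lsn: int      # stands in for FIL_PAGE_LSN
    data: bytes


@dataclass
class Recovery:
    known_spaces: set
    checkpoint_lsn: int          # hypothetical lower bound for a valid LSN
    recovery_active: bool = True
    restored: dict = field(default_factory=dict)
    deferred: dict = field(default_factory=dict)

    def recover(self, dblwr_pages):
        """Step 1: preserve copies that match no known tablespace."""
        for page in dblwr_pages:
            if page.lsn < self.checkpoint_lsn:
                continue  # old copy: rejected by the LSN check, so harmless
            if page.page_id.space_id in self.known_spaces:
                self.restored[page.page_id] = page
            else:
                self.deferred[page.page_id] = page

    def read_complete(self, page_id, data, looks_corrupted,
                      full_crc32_encrypted):
        """Step 2: fall back to a preserved copy for a corrupted page."""
        if not looks_corrupted:
            return data
        if self.recovery_active and full_crc32_encrypted:
            copy = self.deferred.get(page_id)
            if copy is not None:
                return copy.data
        return None  # no usable copy: the page stays corrupted


def demo():
    rec = Recovery(known_spaces={1}, checkpoint_lsn=100)
    rec.recover([
        DblwrPage(PageId(1, 3), 120, b"known"),
        DblwrPage(PageId(7, 1), 130, b"preserved"),
        DblwrPage(PageId(7, 2), 50, b"stale"),
    ])
    return (
        sorted(p.page_no for p in rec.restored),
        rec.read_complete(PageId(7, 1), b"\0", True, True),
        rec.read_complete(PageId(7, 2), b"\0", True, True),
        rec.read_complete(PageId(1, 3), b"fine", False, True),
    )
```

In this model, the stale copy of page 2 is dropped by the LSN check alone, which mirrors the argument that old pages left in the doublewrite buffer should not cause trouble even if the buffer is never zeroed out.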

            Comment above added by Marko Mäkelä (marko).

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä