If a page cannot be decrypted (or read) during crash recovery, InnoDB should cleanly abort the startup. If innodb_force_recovery is specified, we should ignore the problematic page and apply the redo log to the other pages.
If there is an MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS record for a corrupted page, we can always safely ignore the previous page contents and apply the redo log. There is no need to read a page from the data file when applying MLOG_INIT_FILE_PAGE or MLOG_ZIP_PAGE_COMPRESS.
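The intended behaviour described above can be sketched as a toy model. This is not InnoDB code: the function recover(), the read_page callback, the PageReadError exception and the force_recovery flag are hypothetical names chosen for illustration, and the model only decides which pages get their redo applied, without modelling page contents.

```python
# Toy model of the proposed recovery policy; all names are illustrative.
INIT_TYPES = {"MLOG_INIT_FILE_PAGE", "MLOG_ZIP_PAGE_COMPRESS"}


class PageReadError(Exception):
    """Raised by the toy reader when a page is corrupted or cannot be decrypted."""


def recover(redo_by_page, read_page, force_recovery=False):
    """Return (applied, skipped) lists of page ids.

    redo_by_page maps a page id to the list of redo record types for it.
    read_page(page_id) stands in for reading the page from the data file.
    """
    applied, skipped = [], []
    for page_id, records in redo_by_page.items():
        # A page that is (re)initialized by the log never needs to be read:
        # its previous contents are irrelevant.
        if not any(rec in INIT_TYPES for rec in records):
            try:
                read_page(page_id)
            except PageReadError:
                if not force_recovery:
                    # Cleanly refuse to start up instead of hanging.
                    raise RuntimeError(f"page {page_id} is unreadable; aborting startup")
                # Under forced recovery, ignore this page and carry on.
                skipped.append(page_id)
                continue
        applied.append(page_id)
    return applied, skipped
```

With a reader that fails for pages 3 and 4, a log that initializes page 3 still lets page 3 be recovered, while page 4 either aborts the startup or is skipped, depending on the flag.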
Currently in 10.2, the test encryption.innodb-redo-badkey randomly fails, possibly because of this. Also, the test innodb.innodb_bug14147491 needs to be extended.
Issue Links
blocks
MDEV-19738 Doublewrite buffer is unnecessarily used for newly (re)initialized pages (Closed)
causes
MDEV-20688 Recovery crashes after unnecessarily reading a corrupted page (Closed)
is blocked by
MDEV-19241 InnoDB fails to write MLOG_INDEX_LOAD upon completing ALTER TABLE (Closed)
is duplicated by
MDEV-12898 encryption.innodb-redo-badkey failed in buildbot, server hung on startup (Closed)
MDEV-14079 encryption.innodb-force-corrupt failed in buildbot (Closed)
relates to
MDEV-12700 Allow innodb_read_only startup without prior slow shutdown
Marko Mäkelä
added a comment - I have pushed preparatory refactoring and cleanup to 10.2 and merged to 10.3.
A merge to 10.4 along with a test of MLOG_INIT_FREE_PAGE revealed corruption, which I think we must fix before pushing the actual fix to main trees. Also, some corruption became more frequent in mariabackup tests. Hopefully these corruptions are related.
Marko Mäkelä
added a comment - I fixed some concurrency issues that caused intermittent recovery failures, and cleaned up both the existing and new code. I pushed to buildbot for final testing, and expect to push to 10.2 tomorrow.
Marko Mäkelä
added a comment - The MDEV-15528 record MLOG_INIT_FREE_PAGE should be handled in a similar way as MLOG_INIT_FILE_PAGE2 or MLOG_ZIP_PAGE_COMPRESS that is encountered in a complete mini-transaction: discard any preceding records for the page, and do not load the page to the buffer pool.
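A minimal sketch of that bookkeeping, as a toy model rather than the actual InnoDB data structures. The class RedoStore, its method add_mini_transaction and the attribute skip_read are hypothetical names; a mini-transaction is modelled as a list of (page id, record type) pairs plus a flag saying whether it was parsed completely.

```python
# Toy model of parse-time bookkeeping for initializing records; names are illustrative.
INIT_TYPES = {"MLOG_INIT_FILE_PAGE2", "MLOG_ZIP_PAGE_COMPRESS", "MLOG_INIT_FREE_PAGE"}


class RedoStore:
    def __init__(self):
        self.records = {}       # page id -> record types still to be applied
        self.skip_read = set()  # pages that need not be loaded from the data file

    def add_mini_transaction(self, mtr, complete):
        """Store the records of one mini-transaction.

        mtr is a list of (page_id, record_type) pairs. Records of an
        incomplete mini-transaction (end of log reached) are ignored.
        """
        if not complete:
            return
        for page_id, rec_type in mtr:
            if rec_type in INIT_TYPES:
                # Everything logged earlier for this page is superseded.
                self.records[page_id] = []
                self.skip_read.add(page_id)
            self.records.setdefault(page_id, []).append(rec_type)
```

In this model an initializing record only takes effect once its mini-transaction is known to be complete, which mirrors the condition stated in the comment.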
Thirunarayanan Balathandayuthapani
added a comment -
--source include/have_file_key_management.inc
CREATE TABLE t1(c VARCHAR(128)) ENGINE INNODB, encrypted=yes;
insert into t1 select repeat('a',100);
let $restart_parameters = --innodb_flush_log_at_trx_commit=1;
--source include/restart_mysqld.inc
let $MYSQLD_DATADIR=`select @@datadir`;
let t1_IBD = $MYSQLD_DATADIR/test/t1.ibd;
insert into t1 select repeat('b', 100);
--source include/kill_mysqld.inc
--echo # Corrupt the table
perl;
use strict;
use warnings;
use Fcntl qw(:DEFAULT :seek);
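my $ibd_file = $ENV{'t1_IBD'};
# Overwrite the first 4 bytes of page 3 (assuming 16384-byte pages) with 0xAA.
# Double quotes are needed so that \xAA is interpreted as a byte.
my $chunk = "\xAA\xAA\xAA\xAA";
# Use 'or' rather than '||', which would bind to O_RDWR and never die.
sysopen(IBD_FILE, $ibd_file, O_RDWR) or die "Unable to open $ibd_file: $!";
sysseek(IBD_FILE, 16384 * 3, SEEK_SET) or die "Unable to seek in $ibd_file: $!";
syswrite(IBD_FILE, $chunk, 4) or die "Unable to write to $ibd_file: $!";
close(IBD_FILE);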
EOF
#--exec $INNOCHECKSUM $MYSQLD_DATADIR/test/t1.ibd
--source include/start_mysqld.inc
SELECT * FROM t1;
The above test case makes crash recovery hang: the page that is read from the data file is corrupted, so it cannot be decrypted.
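For readers less familiar with the Perl block, the corruption step amounts to overwriting the first four bytes of page 3, which hold the page checksum field. The sketch below reproduces only that step on a zero-filled temporary file, not on a real .ibd file; the helper names corrupt_page_start() and demo() are hypothetical, and the 16384-byte page size is the same assumption the test makes.

```python
import os
import tempfile

PAGE_SIZE = 16384  # page size assumed by the test case above


def corrupt_page_start(path, page_no, payload=b"\xAA" * 4):
    """Overwrite the first bytes of the given page with the payload."""
    with open(path, "r+b") as f:
        f.seek(PAGE_SIZE * page_no)
        f.write(payload)


def demo():
    """Corrupt page 3 of a zero-filled 4-page scratch file and return its bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(bytes(PAGE_SIZE * 4))
        corrupt_page_start(path, 3)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```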
Marko Mäkelä
added a comment - Testing the fix should include running ./mtr encryption.innodb-redo-badkey, which is currently disabled due to MDEV-13893.