[MDEV-27817] InnoDB recovery of recently created files is not crash-safe Created: 2022-02-11  Updated: 2022-02-22  Resolved: 2022-02-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.6, 10.7.2, 10.8.1, 10.6.7, 10.7.3, 10.8.2
Fix Version/s: 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, recovery, regression-10.6

Issue Links:
Blocks
Problem/Incident
is caused by MDEV-24626 Remove synchronous write of page0 and... Closed
Relates
relates to MDEV-27022 Buffer pool is being flushed during r... Closed

 Description   

The test encryption.innodb-redo-badkey fails on our CI systems quite often like this:

2022-02-10 21:39:45 0 [Warning] Found 3 prepared XA transactions
2022-02-10 21:39:46 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 85
2022-02-10 21:39:46 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1445] with error Data structure corruption

I reduced the test to the minimum so that it would still occasionally fail on my system when being run 64 times concurrently with each other:

create table t1(a int primary key, c char(250), b blob) engine=innodb;
create table t2(a int primary key, c char(250), b blob) engine=innodb;
create table t3(a int primary key, c char(250), b blob) engine=innodb;
create table t4(a int primary key, c char(250), b blob) engine=innodb;
set unique_checks=0,foreign_key_checks=0;
set @x=repeat('x',140),@y=repeat('y',140),@b=repeat('b',42000);
begin;
insert into t1 select seq, @x, @b from seq_1_to_20;
insert into t2 select seq, @x, @b from seq_1_to_20;
insert into t3 select seq, @x, @b from seq_1_to_20;
insert into t4 select seq, @x, @b from seq_1_to_20;
update t1 set c = @y;
update t2 set c = @y;
update t3 set c = @y;
update t4 set c = @y;
commit;
# restart
# restart
# restart
drop table t1,t2,t3,t4;

There are 2 failure patterns:

2022-02-11 17:01:59 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 5

and

2022-02-11 16:52:35 0 [ERROR] InnoDB: Page [page id: space=0, page number=243] log sequence number 10492220 is in the future! Current system log sequence number 10485711.

On 10.5, 64×100 runs of the test completed fine. I tried defining SUX_LOCK_GENERIC on 10.6, but also with that, the test fails.



 Comments   
Comment by Marko Mäkelä [ 2022-02-11 ]

This regression was introduced between MariaDB Server 10.6.5 and 10.6.6. The failure is very sporadic for some commits. Sometimes, there is no failure within 64×100 runs.

The first broken 10.6 commit is a merge of MDEV-27022 from 10.5.14. That change aimed to optimize recovery so that recovered pages will not be immediately written back from the buffer pool to the data files. That extra write had been introduced in MDEV-23399 (10.5.7).

If I revert that MDEV-27022 change from that 10.6 revision, the test case no longer fails. I executed the test at least 4×64×100 times (maybe 5×64×100) to be sure.

I do not think that older versions are really unaffected by this bug; the bug should merely be harder to reproduce. I think that this bug can be fixed by invoking fil_names_dirty(), to guarantee that FILE_MODIFY (or pre-MDEV-12353 MLOG_FILE_NAME) records for the recovered files will be written on a subsequent checkpoint.

This problem was also observed on Microsoft Windows:

bb-10.6-release e3894f5d397722dd50fb91dbbdbad3aaf017a4b0

innodb.innodb_bug59641 'innodb'          w4 [ fail ]
        Test ended at 2022-02-10 21:39:47
 
CURRENT_TEST: innodb.innodb_bug59641
mysqltest: At line 45: query 'SELECT * FROM t' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'
***Warnings generated in error logs during shutdown after running tests: innodb.innodb_bug59641
 
2022-02-10 21:39:45 0 [Warning] Found 3 prepared XA transactions
2022-02-10 21:39:46 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 85

Right before the failure, the test had killed and restarted the server twice in a row:

--let $shutdown_timeout=0
--source include/restart_mysqld.inc
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM t;
COMMIT;
 
--source include/restart_mysqld.inc
 
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM t;

Comment by Marko Mäkelä [ 2022-02-13 ]

The following patch fixes the test for me (tested on 10.8):

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 5b2a006859d..3c54c584244 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -907,6 +907,7 @@ bool recv_sys_t::recover_deferred(recv_sys_t::map::iterator &p,
       node->deferred= true;
       if (!space->acquire())
         goto fail;
+      fil_names_dirty(space);
       const bool is_compressed= fil_space_t::is_compressed(flags);
 #ifdef _WIN32
       const bool is_sparse= is_compressed;

This code was added in MDEV-24626. Before that change, also the tablespace identifiers of recently created data files would have been guaranteed to be known already in recv_init_crash_recovery_spaces().

Because of the omission, the first checkpoint that is written after the recovery could be incorrect, unless that checkpoint happened to ‘consume’ more of the log. In the failing case, the checkpoint LSN would be somewhere before the last LSN that was recovered from the log.

Comment by Marko Mäkelä [ 2022-02-13 ]

There was no issue until MDEV-27022 removed the write batch at the end of the last recovery batch, because thanks to that extra write batch, the very next log checkpoint after recovery was guaranteed to never be earlier than the last recovered LSN.

Generated at Thu Feb 08 09:55:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.