[MDEV-32939] If tables are frequently created, renamed, dropped, a backup cannot be restored

Created: 2023-12-04
Updated: 2023-12-14
Resolved: 2023-12-14

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2

Type: Bug
Priority: Major
Reporter: Marko Mäkelä
Assignee: Marko Mäkelä
Resolution: Fixed
Votes: 0
Labels: None

Attachments: data-fts.tar.xz, data.tar.xz, encryption_keys.txt, fbackup.tar.xz
Issue Links: relates to MDEV-24626 "Remove synchronous write of page0 and..." (Closed)

 Description   

mleich produced rr replay traces of a server and a backup session that lead to a situation where restoring the backup fails:

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

2023-12-04 15:18:52 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22001723,22042169
2023-12-04 15:18:52 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 13
2023-12-04 15:18:52 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1467] with error Data structure corruption
[00] FATAL ERROR: 2023-12-04 15:18:52 mariabackup: innodb_init() returned 37 (Data structure corruption).

This is easily reproducible with the attached files. I am able to recover the backup if I rename some files before starting the restore:

for i in 1 3 4 5 6; do mv data/test/t$i.new data/cool_down/t$i.ibd; done

Some encryption-related paths in data/backup-my.cnf may also need to be adjusted. With the files renamed and the encryption parameters adjusted, the data set will recover:

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

2023-12-04 16:24:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22001723,22001723
2023-12-04 16:24:39 0 [Note] InnoDB: To recover: 250 pages
[00] 2023-12-04 16:24:40 Last binlog file , position 0
[00] 2023-12-04 16:24:40 completed OK!

The logging and recovery of DDL operations were rewritten in 10.6. Before MDEV-24626 and other changes, it could have been very hard to reproduce this type of failure.

I do not think that crash recovery is affected by this. This problem should be unique to backup, in code like the following:

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

Thread 1 hit Breakpoint 5, 0x00005653c65d0f90 in unlink@plt ()
1: (char*)$rdi = 0x7fffb6a972d0 "./test/t1.ibd"
(rr) bt
#0  0x00005653c65d0f90 in unlink@plt ()
#1  0x00005653c65448aa in my_delete (name=0x7fffb6a972d0 "./test/t1.ibd", MyFlags=16) at /mariadb/10.6/mysys/my_delete.c:43
#2  0x00005653c5c88562 in rename_force (from=0x7fffb6a972b0 "./test/t1.new", to=0x7fffb6a972d0 "./test/t1.ibd") at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:5689
#3  0x00005653c5c8c191 in prepare_handle_new_files (data_home_dir=<optimized out>, db_name=0x7fffb6a98370 "test", file_name=0x7fffb6a99350 "t1.new", arg=0x0) at /usr/include/c++/13/bits/basic_string.h:222
#4  0x00005653c5c8cade in xb_process_datadir (path=path@entry=0x5653c569c798 ".", suffix=suffix@entry=0x5653c567705e ".new", 
    func=func@entry=0x5653c5c8bdf5 <prepare_handle_new_files(char const*, char const*, char const*, void*)>, func_arg=func_arg@entry=0x0) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:5911
#5  0x00005653c5c90c85 in xtrabackup_prepare_func (argv=argv@entry=0x5653c8a9da38) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:6097
#6  0x00005653c5c94ae5 in main_low (argv=0x5653c8a9da38) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:7135
#7  0x00005653c5c94d33 in main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:6919
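
The backtrace shows rename_force() unlinking ./test/t1.ibd before ./test/t1.new takes its place. The following is a minimal sketch of that pattern using std::filesystem, not the mariabackup code: the names rename_force_sketch, handle_new_files_sketch, and demo_overwrite are hypothetical, and the real functions take different arguments. It only illustrates that a .new file is moved to the .ibd name in the same directory, replacing whatever is already there.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical model of a "forced" rename: whatever exists at `to` is
// unlinked first, then `from` is moved there. NOT the mariabackup code.
inline bool rename_force_sketch(const fs::path &from, const fs::path &to)
{
  std::error_code ec;
  fs::remove(to, ec);        // a missing destination is not an error
  if (ec)
    return false;
  fs::rename(from, to, ec);  // fails if `from` does not exist
  return !ec;
}

// Turn every NAME.new in one schema directory into NAME.ibd in the SAME
// directory, overwriting an existing NAME.ibd. Returns the number of renames.
inline int handle_new_files_sketch(const fs::path &schema_dir)
{
  assert(fs::is_directory(schema_dir));
  // Collect first, so that the directory is not modified while iterating.
  std::vector<fs::path> pending;
  for (const auto &entry : fs::directory_iterator(schema_dir))
    if (entry.is_regular_file() && entry.path().extension() == ".new")
      pending.push_back(entry.path());

  int renamed= 0;
  for (const auto &from : pending)
  {
    fs::path to= from;
    to.replace_extension(".ibd");
    if (rename_force_sketch(from, to))
      ++renamed;
  }
  return renamed;
}

// Builds a scratch directory holding t1.ibd ("old") and t1.new ("new"),
// runs the sketch, and returns what t1.ibd contains afterwards.
inline std::string demo_overwrite()
{
  const fs::path root= fs::temp_directory_path() / "mdev32939_sketch";
  const fs::path dir= root / "test";
  fs::remove_all(root);
  fs::create_directories(dir);
  { std::ofstream f(dir / "t1.ibd"); f << "old"; }
  { std::ofstream f(dir / "t1.new"); f << "new"; }
  handle_new_files_sketch(dir);
  std::string content;
  { std::ifstream f(dir / "t1.ibd"); f >> content; }
  fs::remove_all(root);
  return content;
}
```

In this model the previous t1.ibd is lost and the .new file always stays in its original schema directory. The manual workaround above instead moves the .new files to data/cool_down, which suggests that for this data set the same-directory target was not the name that recovery expected.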



 Comments   
Comment by Marko Mäkelä [ 2023-12-04 ]

I finally found out where the information on the file name is lost. It is in deferred_spaces.add():

    /* The file name must be unique. Keep the one with the latest LSN. */
    auto d= defers.begin();
 
    while (d != defers.end())
    {
      if (d->second.file_name != defer.file_name)
        ++d;
// …
        /* Reset the old tablespace name in recovered spaces list */
        recv_spaces_t::iterator it{recv_spaces.find(d->first)};
        if (it != recv_spaces.end() &&
            it->second.name == d->second.file_name)
          it->second.name = "";
        defers.erase(d++);

The assumption that file names are unique will obviously be violated when preparing a backup during which tables with the same name were created, renamed, and dropped. The input that produced data.tar.xz involved several CREATE TABLE statements in one schema, RENAME TABLE to another schema, and DROP SCHEMA cool_down, in a loop.
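
To make the effect of the uniqueness rule concrete, here is a minimal, self-contained sketch. It is a simplified model, not the InnoDB implementation: name_tracker, deferred_item, recv_names, and replay_reused_name are hypothetical stand-ins, the branches that the excerpt above elides are filled in by assumption, and the tablespace IDs and LSNs are made up. It shows how registering a second tablespace under a reused file name blanks the name recorded for the earlier one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for illustration only; these are NOT
// the actual InnoDB types or the real deferred_spaces implementation.
using lsn_t = std::uint64_t;

struct deferred_item
{
  lsn_t lsn;
  std::string file_name;
};

struct name_tracker
{
  std::map<std::uint32_t, deferred_item> defers;   // tablespace ID -> deferred file
  std::map<std::uint32_t, std::string> recv_names; // tablespace ID -> recovered name

  // Register a file name under the rule "the file name must be unique;
  // keep the one with the latest LSN". Returns false if another tablespace
  // with the same name and a later LSN is already registered.
  bool add(std::uint32_t space, const std::string &file_name, lsn_t lsn)
  {
    for (auto d= defers.begin(); d != defers.end(); )
    {
      if (d->second.file_name != file_name)
        ++d;
      else if (d->first == space)
      {
        // Same tablespace and same name: only advance the LSN.
        if (d->second.lsn < lsn)
          d->second.lsn= lsn;
        return true;
      }
      else if (d->second.lsn < lsn)
      {
        // An older tablespace used this name: its name is forgotten.
        auto it= recv_names.find(d->first);
        if (it != recv_names.end() && it->second == d->second.file_name)
          it->second.clear();
        d= defers.erase(d);
      }
      else
        return false; // a later tablespace already owns this name
    }
    defers[space]= deferred_item{lsn, file_name};
    recv_names[space]= file_name;
    return true;
  }
};

// Replays a reuse of the name ./test/t1.ibd by two tablespaces (made-up IDs
// 5 and 13) and returns the name still remembered for the first one.
inline std::string replay_reused_name()
{
  name_tracker t;
  bool ok= t.add(5, "./test/t1.ibd", 100);  // first CREATE TABLE test.t1
  assert(ok);
  ok= t.add(13, "./test/t1.ibd", 200);      // name reused by a later table
  assert(ok);
  (void) ok;
  return t.recv_names.at(5);
}
```

In this model replay_reused_name() returns an empty string: the first tablespace no longer has a file name, even though its data file may still exist in the backup under a different path after a RENAME TABLE.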

Comment by Marko Mäkelä [ 2023-12-07 ]

fbackup.tar.xz is a data set that will require a more extensive fix: renaming the expected file names when fil_name_process() is invoked on FILE_RENAME and the file is not found.

Comment by Marko Mäkelä [ 2023-12-07 ]

data-fts.tar.xz is one more data set for which some backed-up files are not recovered correctly. At least one file (tablespace 15, test/t2.ibd, created during the backup) would incorrectly be recovered as containing only NUL bytes.

Comment by Marko Mäkelä [ 2023-12-08 ]

I revised the logic so that most tables in data-fts.tar.xz are recovered, but the tables t5 and t7 are reported as corrupted. I do not know whether this is related to incorrect encryption parameters.

Comment by Marko Mäkelä [ 2023-12-14 ]

After a further revision, all tables of data-fts.tar.xz recover; that is, CHECK TABLE t1,t2,t3,t4,t5,t6,t7; reports them as OK.

Comment by Matthias Leich [ 2023-12-14 ]

origin/10.6-MDEV-32939 f21a6cbf6ee720b35cf3be011dbc4725ad99a5bb 2023-12-14T13:16:28+02:00
performed well in RQG testing. No new problems.

Generated at Thu Feb 08 10:35:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.