[MDEV-32939] If tables are frequently created, renamed, dropped, a backup cannot be restored - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Backup, Storage Engine - InnoDB
Labels: None

Description

mleich produced rr replay traces of a server and a backup session that lead to a situation where restoring the backup fails:

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301
2023-12-04 15:18:52 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22001723,22042169
2023-12-04 15:18:52 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 13
2023-12-04 15:18:52 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1467] with error Data structure corruption
[00] FATAL ERROR: 2023-12-04 15:18:52 mariabackup: innodb_init() returned 37 (Data structure corruption).

This is easily reproducible with the attached files. I am able to recover the backup if I rename some files before starting the restore:

for i in 1 3 4 5 6; do mv data/test/t$i.new data/cool_down/t$i.ibd; done

Also, some encryption-related paths in data/backup-my.cnf may need to be adjusted. With the files renamed and the encryption parameters adjusted, the data set will recover:

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301
2023-12-04 16:24:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22001723,22001723
2023-12-04 16:24:39 0 [Note] InnoDB: To recover: 250 pages
[00] 2023-12-04 16:24:40 Last binlog file , position 0
[00] 2023-12-04 16:24:40 completed OK!

The logging and recovery of DDL operations was rewritten in 10.6. Before MDEV-24626 and other changes, it could be very hard to reproduce this type of failure.

I do not think that crash recovery is affected by this. This problem should be unique to backup and code like the following:

10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301
Thread 1 hit Breakpoint 5, 0x00005653c65d0f90 in unlink@plt ()
1: (char*)$rdi = 0x7fffb6a972d0 "./test/t1.ibd"
(rr) bt
#0 0x00005653c65d0f90 in unlink@plt ()
#1 0x00005653c65448aa in my_delete (name=0x7fffb6a972d0 "./test/t1.ibd", MyFlags=16) at /mariadb/10.6/mysys/my_delete.c:43
#2 0x00005653c5c88562 in rename_force (from=0x7fffb6a972b0 "./test/t1.new", to=0x7fffb6a972d0 "./test/t1.ibd") at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:5689
#3 0x00005653c5c8c191 in prepare_handle_new_files (data_home_dir=<optimized out>, db_name=0x7fffb6a98370 "test", file_name=0x7fffb6a99350 "t1.new", arg=0x0) at /usr/include/c++/13/bits/basic_string.h:222
#4 0x00005653c5c8cade in xb_process_datadir (path=path@entry=0x5653c569c798 ".", suffix=suffix@entry=0x5653c567705e ".new",
func=func@entry=0x5653c5c8bdf5 <prepare_handle_new_files(char const*, char const*, char const*, void*)>, func_arg=func_arg@entry=0x0) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:5911
#5 0x00005653c5c90c85 in xtrabackup_prepare_func (argv=argv@entry=0x5653c8a9da38) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:6097
#6 0x00005653c5c94ae5 in main_low (argv=0x5653c8a9da38) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:7135
#7 0x00005653c5c94d33 in main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:6919

Attachments

data.tar.xz, 719 kB, 2023-12-04 14:32
data-fts.tar.xz, 3.97 MB, 2023-12-07 14:34
encryption_keys.txt, 0.4 kB, 2023-12-04 14:32
fbackup.tar.xz, 946 kB, 2023-12-07 10:03

Issue Links

relates to

MDEV-24626 Remove synchronous write of page0 and flushing file during file creation (Closed)

Activity

Marko Mäkelä added a comment - 2023-12-04 15:29

I finally found out where the information on the file name is lost. It is in deferred_spaces.add():

    /* The file name must be unique. Keep the one with the latest LSN. */
    auto d= defers.begin();
    while (d != defers.end())
    {
      if (d->second.file_name != defer.file_name)
        ++d;
      // …
        /* Reset the old tablespace name in recovered spaces list */
        recv_spaces_t::iterator it{recv_spaces.find(d->first)};
        if (it != recv_spaces.end() &&
            it->second.name == d->second.file_name)
          it->second.name = "";
        defers.erase(d++);

This assumption will obviously be violated when preparing a backup where tables by the same name have been created, renamed, and dropped. The input that produced data.tar.xz involved several CREATE TABLE in one schema, RENAME TABLE to another schema, and DROP SCHEMA cool_down, in a loop.
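To illustrate how a "file name must be unique" rule can discard information, here is a minimal stand-alone model. It is not the InnoDB code: the types Deferred and DeferredSpaces are hypothetical, and the LSN comparison is an assumption inferred only from the comment "Keep the one with the latest LSN", since the excerpt above elides that part.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the recovery bookkeeping;
// these are not the actual InnoDB data structures.
struct Deferred {
  std::uint64_t lsn;      // LSN of the log record that named the file
  std::string file_name;  // file name seen in the redo log
};

struct DeferredSpaces {
  std::map<std::uint32_t, Deferred> defers;  // tablespace id -> entry

  // Models the uniqueness rule: an entry for a different tablespace id
  // with the same file name is erased if its LSN is not newer.
  // Returns false if the new entry is the stale one and is ignored.
  bool add(std::uint32_t space_id, const Deferred& defer) {
    for (auto d = defers.begin(); d != defers.end();) {
      if (d->first == space_id || d->second.file_name != defer.file_name) {
        ++d;
        continue;
      }
      if (d->second.lsn > defer.lsn)
        return false;       // the existing entry is newer (assumed rule)
      d = defers.erase(d);  // the older tablespace loses its file name
    }
    defers[space_id] = defer;
    return true;
  }
};
```

In this model, adding tablespace 13 as "./test/t1.ibd" and later tablespace 14 under the same name leaves no entry for tablespace 13, even though a backup taken during a CREATE, RENAME, DROP loop may legitimately contain log records for both.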

Marko Mäkelä added a comment - 2023-12-07 10:05

fbackup.tar.xz is a data set that will require a more extensive fix: renaming the expected file names when fil_name_process() will be invoked on FILE_RENAME and the file is not found.
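The idea of renaming the expected file name can be sketched with a toy model. Everything here is hypothetical (ExpectedNames, apply_rename, and the on_disk set standing in for the file system); it only illustrates following a rename in the bookkeeping when the old file is absent, and says nothing about how fil_name_process() is actually implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical model: the name that recovery expects each tablespace to have.
using ExpectedNames = std::map<std::uint32_t, std::string>;

// Apply one rename record. `on_disk` stands in for the file system.
// Returns true if the expected name was changed.
bool apply_rename(ExpectedNames& expected,
                  const std::set<std::string>& on_disk,
                  std::uint32_t space_id,
                  const std::string& from,
                  const std::string& to) {
  auto it = expected.find(space_id);
  if (it == expected.end() || it->second != from)
    return false;    // nothing is tracked under the old name
  if (on_disk.count(from))
    return false;    // the old file still exists; leave the entry alone
  it->second = to;   // follow the rename in the bookkeeping
  return true;
}
```

With tablespace 13 expected at "./test/t1.ibd" and only "./cool_down/t1.ibd" present in on_disk, a rename record from the first name to the second updates the expected name instead of leaving recovery looking for a file that no longer exists.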

Marko Mäkelä added a comment - 2023-12-07 14:35

data-fts.tar.xz is one more data set that fails to recover some backed-up files correctly. At least one file (tablespace 15, test/t2.ibd, created during the backup) would incorrectly be recovered as containing only NUL bytes.
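A file that was recovered as nothing but NUL bytes can be detected with a small check like the following. This is a generic sketch, not part of mariabackup; the helper name only_nul_bytes is made up for illustration.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns true if the stream is non-empty and every byte in it is NUL.
bool only_nul_bytes(std::istream& in) {
  char buf[4096];
  bool saw_data = false;
  // read() fails on the final short chunk, so also check gcount().
  while (in.read(buf, sizeof buf) || in.gcount() > 0) {
    const std::streamsize n = in.gcount();
    for (std::streamsize i = 0; i < n; ++i)
      if (buf[i] != '\0')
        return false;
    saw_data = true;
  }
  return saw_data;
}
```

To check a restored tablespace, open it in binary mode, for example std::ifstream f(path, std::ios::binary), where path is wherever test/t2.ibd ends up in the prepared backup directory, and pass f to only_nul_bytes().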

Marko Mäkelä added a comment - 2023-12-08 14:36

I revised the logic so that most tables in data-fts.tar.xz would be recovered, but the tables t5 and t7 would be reported as corrupted. I do not know if it is related to incorrect encryption parameters.

Marko Mäkelä added a comment - 2023-12-14 08:11

After a further revision, all tables of data-fts.tar.xz recover, that is, check table t1,t2,t3,t4,t5,t6,t7; reports them OK.

Matthias Leich added a comment - 2023-12-14 14:52

origin/10.6-MDEV-32939 f21a6cbf6ee720b35cf3be011dbc4725ad99a5bb 2023-12-14T13:16:28+02:00
performed well in RQG testing. No new problems.

MariaDB Server