MariaDB Server / MDEV-32939

If tables are frequently created, renamed, dropped, a backup cannot be restored

Details

    Description

      mleich produced rr replay traces of a server and a backup session that lead to a situation where restoring the backup fails:

      10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

      2023-12-04 15:18:52 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22001723,22042169
      2023-12-04 15:18:52 0 [ERROR] InnoDB: Missing FILE_CREATE, FILE_DELETE or FILE_MODIFY before FILE_CHECKPOINT for tablespace 13
      2023-12-04 15:18:52 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1467] with error Data structure corruption
      [00] FATAL ERROR: 2023-12-04 15:18:52 mariabackup: innodb_init() returned 37 (Data structure corruption).
      

      This is easily reproducible with the attached files. I am able to recover the backup if I rename some files before starting the restore:

      for i in 1 3 4 5 6; do mv data/test/t$i.new data/cool_down/t$i.ibd; done
      

      Some encryption-related paths in data/backup-my.cnf may also need to be adjusted. With the files renamed and the encryption parameters adjusted, the data set will recover:

      10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

      2023-12-04 16:24:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22001723,22001723
      2023-12-04 16:24:39 0 [Note] InnoDB: To recover: 250 pages
      [00] 2023-12-04 16:24:40 Last binlog file , position 0
      [00] 2023-12-04 16:24:40 completed OK!
      

      The logging and recovery of DDL operations was rewritten in 10.6. Before MDEV-24626 and other changes, it could be very hard to reproduce this type of failure.

      I do not think that crash recovery is affected by this. This problem should be unique to backup and code like the following:

      10.6 768a736174d6caf09df43e84b0c1b9ec52f1a301

      Thread 1 hit Breakpoint 5, 0x00005653c65d0f90 in unlink@plt ()
      1: (char*)$rdi = 0x7fffb6a972d0 "./test/t1.ibd"
      (rr) bt
      #0  0x00005653c65d0f90 in unlink@plt ()
      #1  0x00005653c65448aa in my_delete (name=0x7fffb6a972d0 "./test/t1.ibd", MyFlags=16) at /mariadb/10.6/mysys/my_delete.c:43
      #2  0x00005653c5c88562 in rename_force (from=0x7fffb6a972b0 "./test/t1.new", to=0x7fffb6a972d0 "./test/t1.ibd") at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:5689
      #3  0x00005653c5c8c191 in prepare_handle_new_files (data_home_dir=<optimized out>, db_name=0x7fffb6a98370 "test", file_name=0x7fffb6a99350 "t1.new", arg=0x0) at /usr/include/c++/13/bits/basic_string.h:222
      #4  0x00005653c5c8cade in xb_process_datadir (path=path@entry=0x5653c569c798 ".", suffix=suffix@entry=0x5653c567705e ".new", 
          func=func@entry=0x5653c5c8bdf5 <prepare_handle_new_files(char const*, char const*, char const*, void*)>, func_arg=func_arg@entry=0x0) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:5911
      #5  0x00005653c5c90c85 in xtrabackup_prepare_func (argv=argv@entry=0x5653c8a9da38) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:6097
      #6  0x00005653c5c94ae5 in main_low (argv=0x5653c8a9da38) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:7135
      #7  0x00005653c5c94d33 in main (argc=<optimized out>, argv=<optimized out>) at /mariadb/10.6/extra/mariabackup/xtrabackup.cc:6919
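
The stack above suggests that, while preparing the backup, each `*.new` file replaces the like-named `.ibd` file by first unlinking the target. The following is a minimal sketch of that pattern, assuming POSIX `unlink()` and a hypothetical helper `rename_force_sketch()`; it is reconstructed only from the call stack and is not the actual code in xtrabackup.cc. The file names in the demonstration are invented for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fstream>
#include <string>
#include <unistd.h>

// Hypothetical stand-in for rename_force(): remove the target if it exists,
// then rename the source onto the target name.
bool rename_force_sketch(const std::string &from, const std::string &to)
{
  if (unlink(to.c_str()) != 0 && errno != ENOENT)
    return false;                       // target exists but cannot be removed
  return std::rename(from.c_str(), to.c_str()) == 0;
}

// Demonstration: whatever was stored under the target name is discarded.
std::string demo_rename_force()
{
  const std::string from= "sketch_t1.new", to= "sketch_t1.ibd";
  { std::ofstream f(to); f << "old tablespace"; }
  { std::ofstream f(from); f << "new tablespace"; }

  const bool ok= rename_force_sketch(from, to);
  assert(ok);
  (void) ok;

  std::string contents;
  { std::ifstream f(to); std::getline(f, contents); }
  unlink(to.c_str());                   // clean up the scratch file
  return contents;
}
```

If the file that previously carried the target name belonged to a different tablespace that the redo log still refers to, a replacement of this kind would leave recovery without that file, which is consistent with the workaround of moving the `.new` files elsewhere before the restore.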
      

      Attachments

        1. data.tar.xz
          719 kB
        2. data-fts.tar.xz
          3.97 MB
        3. encryption_keys.txt
          0.4 kB
        4. fbackup.tar.xz
          946 kB

          Activity

            I finally found out where the information on the file name is lost. It is in deferred_spaces.add():

                /* The file name must be unique. Keep the one with the latest LSN. */
                auto d= defers.begin();
             
                while (d != defers.end())
                {
                  if (d->second.file_name != defer.file_name)
                    ++d;
            // …
                    /* Reset the old tablespace name in recovered spaces list */
                    recv_spaces_t::iterator it{recv_spaces.find(d->first)};
                    if (it != recv_spaces.end() &&
                        it->second.name == d->second.file_name)
                      it->second.name = "";
                    defers.erase(d++);
            

            This assumption will obviously be violated when preparing a backup where tables by the same name have been created, renamed, and dropped. The input that produced data.tar.xz involved several CREATE TABLE in one schema, RENAME TABLE to another schema, and DROP SCHEMA cool_down, in a loop.
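
To make the violated assumption concrete, here is a simplified model of the "file name must be unique, keep the latest LSN" rule. The types `Defer` and `Defers` and the function `add_defer()` are hypothetical names for illustration; the branches elided in the excerpt above are replaced by assumptions, so this is a sketch of the rule and not the server code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified deferred-tablespace entry.
struct Defer { uint64_t lsn; std::string file_name; };

// Keyed by tablespace ID.
using Defers= std::map<uint32_t, Defer>;

// Models the stated rule: a file name may belong to only one entry,
// and the entry with the latest LSN wins.
bool add_defer(Defers &defers, uint32_t space_id, const Defer &defer)
{
  for (auto d= defers.begin(); d != defers.end(); )
  {
    if (d->second.file_name != defer.file_name)
      ++d;                              // different name: no conflict
    else if (d->second.lsn < defer.lsn)
      d= defers.erase(d);               // same name, older LSN: forgotten
    else
      return false;                     // a newer entry already owns the name
  }
  defers[space_id]= defer;
  return true;
}

// Two tablespaces that used the same name at different times:
// is the older one (ID 10) still tracked afterwards?
bool space_10_still_tracked_after_name_reuse()
{
  Defers defers;
  const bool first= add_defer(defers, 10, Defer{100, "./test/t1.ibd"});
  const bool second= add_defer(defers, 13, Defer{200, "./test/t1.ibd"});
  assert(first && second);
  (void) first; (void) second;
  return defers.count(10) != 0;
}
```

In this model the entry for tablespace 10 disappears as soon as tablespace 13 reuses the name, even though tablespace 10 may have been renamed rather than dropped in the meantime. That is the kind of name loss described above for a CREATE, RENAME, DROP loop.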

            (comment by marko, Marko Mäkelä)

            fbackup.tar.xz is a data set that will require a more extensive fix: renaming the expected file names when fil_name_process() will be invoked on FILE_RENAME and the file is not found.
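
One way to picture the more extensive fix is to track the expected file name per tablespace ID while replaying file records, so that a rename updates the expectation even when no file by the old name is present. The sketch below assumes a hypothetical record type `FileRecord` and function `replay_file_records()`; it does not reflect the real signature or behaviour of fil_name_process().

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified file-level log records.
enum class FileOp { Create, Rename, Delete };

struct FileRecord
{
  FileOp op;
  uint32_t space_id;
  std::string name;      // current name (the old name for Rename)
  std::string new_name;  // only used for Rename
};

using ExpectedNames= std::map<uint32_t, std::string>;

// Replays the records in log order and returns the name under which
// each surviving tablespace is expected to be found.
ExpectedNames replay_file_records(const std::vector<FileRecord> &log)
{
  ExpectedNames expected;
  for (const FileRecord &r : log)
  {
    switch (r.op) {
    case FileOp::Create:
      expected[r.space_id]= r.name;
      break;
    case FileOp::Rename:
      assert(!r.new_name.empty());
      // Update the expectation whether or not the old file exists.
      expected[r.space_id]= r.new_name;
      break;
    case FileOp::Delete:
      expected.erase(r.space_id);
      break;
    }
  }
  return expected;
}

// A create, rename, re-create, drop sequence with invented IDs and names.
ExpectedNames demo_replay()
{
  const std::vector<FileRecord> log{
    {FileOp::Create, 13, "./test/t1.ibd", ""},
    {FileOp::Rename, 13, "./test/t1.ibd", "./cool_down/t1.ibd"},
    {FileOp::Create, 14, "./test/t1.ibd", ""},
    {FileOp::Delete, 13, "./cool_down/t1.ibd", ""},
  };
  return replay_file_records(log);
}
```

Keying the map by tablespace ID rather than by file name lets two tablespaces use the same name at different points in the log without one overwriting the other's bookkeeping.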

            (comment by marko, Marko Mäkelä)

            data-fts.tar.xz is one more data set that fails to recover some backed-up files correctly. At least one file (tablespace 15, test/t2.ibd, created during the backup) would incorrectly be recovered as containing only NUL bytes.

            (comment by marko, Marko Mäkelä)

            I revised the logic so that most tables in data-fts.tar.xz would be recovered, but the tables t5 and t7 would be reported corrupted. I do not know if it is related to incorrect encryption parameters.

            (comment by marko, Marko Mäkelä)

            After a further revision, all tables of data-fts.tar.xz recover; that is, CHECK TABLE t1,t2,t3,t4,t5,t6,t7; reports them all as OK.

            (comment by marko, Marko Mäkelä)

            origin/10.6-MDEV-32939 f21a6cbf6ee720b35cf3be011dbc4725ad99a5bb 2023-12-14T13:16:28+02:00
            performed well in RQG testing. No new problems were found.

            (comment by mleich, Matthias Leich)

            People

              marko Marko Mäkelä
              marko Marko Mäkelä