Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24184

InnoDB RENAME TABLE recovery failure if names are reused

Details

    Description

      This came up in MDEV-23842. The scenario involves logging multiple RENAME operations without any log checkpoint in between.

      First of all, at least starting with MariaDB 10.5 (and the MDEV-12353 changes), we are writing a duplicate record:

      diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
      index 5e6bd9575b1..d279e77a105 100644
      --- a/storage/innobase/fil/fil0fil.cc
      +++ b/storage/innobase/fil/fil0fil.cc
      @@ -2530,7 +2530,6 @@ fil_rename_tablespace(
       	ut_ad(strchr(new_file_name, OS_PATH_SEPARATOR) != NULL);
       
       	if (!recv_recovery_is_on()) {
      -		fil_name_write_rename(id, old_file_name, new_file_name);
       		log_mutex_enter();
       	}
       
      

      With that fix applied, the recovery would fail on the third rename starting from checkpoint_lsn=77739:

      Thread 1 hit Breakpoint 2, fil_op_replay_rename (space_id=6, name=name@entry=0x7f51bd09f1f1 "./test/t2.ibd", new_name=new_name@entry=0x7f51bd09f1ff "./test/t1.ibd")
          at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1873
      1873	{
      Current event: 6248Thread 1 hit Breakpoint 2, fil_op_replay_rename (space_id=5, name=name@entry=0x7f51bd09f41d "./test/t5.ibd", new_name=new_name@entry=0x7f51bd09f42b "./test/t2.ibd")
          at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1873
      1873	{
      Current event: 6308Thread 1 hit Breakpoint 2, fil_op_replay_rename (space_id=5, name=name@entry=0x7f51bd09f60e "./test/t2.ibd", new_name=new_name@entry=0x7f51bd09f61c "./test/t5.ibd")
          at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1873
      

      We are renaming tablespace 5 from t2 to t1, then renaming tablespace 6 from t5 to t2 and back. The tablespace 5 already was named t1, and the tablespace 6 was named t2. So, only the third operation results in an attempted rename. But, there already was a t5 at tablespace 8. It had been successfully renamed before the server was killed:

      (rr) break mtr_t::log_file_op
      Breakpoint 1 at 0x55b02cabd18a: file /mariadb/10.6/storage/innobase/fil/fil0fil.cc, line 1760.
      (rr) cond 1 type==FILE_RENAME
      (rr) display log_sys.lsn._M_i
      1: log_sys.lsn._M_i = 0
      (rr) command
      Type commands for breakpoint(s) 1, one per line.
      End with a line saying just "end".
      >when
      >c
      >end
      (rr) c
      Continuing.
      2020-11-13 14:32:44 0 [Note] /dev/shm/10.6/sql/mariadbd (mysqld 10.6.0-MariaDB-debug-log) starting as process 523956 ...
      2020-11-13 14:32:44 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32183)
      2020-11-13 14:32:44 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      [New Thread 523956.523961]
      [New Thread 523956.523957]
      [New Thread 523956.523958]
      [New Thread 523956.523960]
      [New Thread 523956.523962]
      [New Thread 523956.523963]
      [New Thread 523956.523964]
      [New Thread 523956.523966]
      [New Thread 523956.523967]
      Thread 1 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fff47c42950, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x55b02f10c5a8 "./test/t2.ibd", 
          new_path=new_path@entry=0x55b02f500de8 "./test/t5.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 79333
      Current event: 47152
      Thread 1 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fff47c42950, type=type@entry=FILE_RENAME, space_id=space_id@entry=6, path=path@entry=0x55b02f0f8568 "./test/t1.ibd", 
          new_path=new_path@entry=0x55b02f5c24f8 "./test/t2.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 79786
      Current event: 49203
      Thread 1 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fff47c42950, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x55b02f4b6248 "./test/t5.ibd", 
          new_path=new_path@entry=0x55b02f5c2528 "./test/t1.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 80239
      Current event: 52863
      [New Thread 523956.523988]
      [New Thread 523956.523986]
      [New Thread 523956.523987]
      [Switching to Thread 523956.523988]
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x55b02f11ebd8 "./test/t1.ibd", 
          new_path=new_path@entry=0x55b02f495d18 "./test/t5.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 82059
      Current event: 86144
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=6, path=path@entry=0x55b02f10c5a8 "./test/t2.ibd", 
          new_path=new_path@entry=0x7fdc5003b648 "./test/t1.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 82530
      Current event: 87797
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x7fdc5005a168 "./test/t5.ibd", 
          new_path=new_path@entry=0x7fdc50041ee8 "./test/t2.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 83111
      Current event: 89390
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=8, path=path@entry=0x55b02f0e4558 "./test/t4.ibd", 
          new_path=new_path@entry=0x7fdc50041f18 "./test/t5.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 83688
      Current event: 90881
      Thread 13 received signal SIGKILL, Killed.
      [Switching to Thread 523956.523987]
      0x0000000070000002 in ?? ()
      1: log_sys.lsn._M_i = 83764
      

      I think that we should fix the redo log apply as follows:

      1. Merge all recovered FILE_RENAME or MLOG_FILE_RENAME2 records, preserving the tablespace identifier and the final desired name.
      2. Implement a debug check that there are no duplicates of the final desired name.
      3. Remove those records where the fil_space_t::chain.start.name already is correct.
      4. Implement a debug check that at most one rename needs to be performed.
      5. Finally, perform any outstanding renames.

      I believe that by design, we will have to perform at most one rename operation. This should be guaranteed by the log_write_up_to() call in fil_name_write_rename().

      Attachments

        Issue Links

          Activity

            Fixing this could fix the root cause of MDEV-16519 and make the workarounds in mariabackup redundant.

            marko Marko Mäkelä added a comment - Fixing this could fix the root cause of MDEV-16519 and make the workarounds in mariabackup redundant.

            I now have a fix that I tested in the bb-10.6-monty branch.
            It will take a little additional effort to port that to 10.2, but I think that it is doable today.

            marko Marko Mäkelä added a comment - I now have a fix that I tested in the bb-10.6-monty branch. It will take a little additional effort to port that to 10.2, but I think that it is doable today.

            I removed the extra log record write.

            vlad.lesin, I pushed one commit to bb-10.6-monty-MDEV-24184 where this is mostly fixed. It turns out that mariabackup --prepare implements its own RENAME logging, which looks like a work-around of this bug.

            I think that in 10.6, we should remove the .ren files and rename_table_in_prepare() altogether. In GA versions, we should ensure that rename_table_in_prepare() will update fil_space_t. Possibly my above mentioned commit is fine as is for 10.5, and there was no bug in Mariabackup after all (thanks to the rename_table_in_prepare(), which strongly feels like a work-around of this recovery bug).

            marko Marko Mäkelä added a comment - I removed the extra log record write . vlad.lesin , I pushed one commit to bb-10.6-monty-MDEV-24184 where this is mostly fixed. It turns out that mariabackup --prepare implements its own RENAME logging, which looks like a work-around of this bug. I think that in 10.6, we should remove the .ren files and rename_table_in_prepare() altogether. In GA versions, we should ensure that rename_table_in_prepare() will update fil_space_t . Possibly my above mentioned commit is fine as is for 10.5, and there was no bug in Mariabackup after all (thanks to the rename_table_in_prepare() , which strongly feels like a work-around of this recovery bug).
            vlad.lesin Vladislav Lesin added a comment - - edited

            marko, mariabackup processes renames with almost the same way you implemented in this task. I.e. it fills some map (space id)->(space name) during log reading, and then creates special files in backup directory with new names, and does renaming on --prepare basing on those files. So, yes, it contains some kind of workaround for this bug.

            vlad.lesin Vladislav Lesin added a comment - - edited marko , mariabackup processes renames with almost the same way you implemented in this task. I.e. it fills some map (space id)->(space name) during log reading, and then creates special files in backup directory with new names, and does renaming on --prepare basing on those files. So, yes, it contains some kind of workaround for this bug.

            One more reason why mariabackup is not affected is that server can't change backup directory. According to the description the following scenario was possible before Marko's fix:

            1) Rename t1->t2,
            2) Rename t2->t1,
            3) Rename t3->t2,

            So if Server had already executed (3) before crash, recovery would fail trying to execute (1).

            In the case of mariabackup, it enumerates data files before starting log copying, i.e. it will copy:

            t1,t3, and start recovery since (1) or
            t2,t3 and start recovery since (2),

            but, anyway, there will not be a situation when backup directory contains t3 renamed to t2 and recovery fails trying to execute (1).

            vlad.lesin Vladislav Lesin added a comment - One more reason why mariabackup is not affected is that server can't change backup directory. According to the description the following scenario was possible before Marko's fix: 1) Rename t1->t2, 2) Rename t2->t1, 3) Rename t3->t2, So if Server had already executed (3) before crash, recovery would fail trying to execute (1). In the case of mariabackup, it enumerates data files before starting log copying, i.e. it will copy: t1,t3, and start recovery since (1) or t2,t3 and start recovery since (2), but, anyway, there will not be a situation when backup directory contains t3 renamed to t2 and recovery fails trying to execute (1).

            When I tried to remove only "rename" processing from mariabackup, I found that we can't remove it separately from "create" and "drop" processing.

            I would start with the claim that ddl recovery logic itself differs from backup ddl restoring logic.

            For example, if some tablespace was created and the server was crashed, the tablespace will not be created during redo log replaying, it will be loaded instead (see mfile_type_t::FILE_CREATE, recv_sys_t::parse(), fil_name_process(), recv_init_crash_recovery_spaces()). But for backup it must be created. Current ddl fixing logic just copies tablespaces created during backup to backup directory, if we remove it, we need to create tablespace during recovery.

            The same is true for "drop" operation. Redo log records are just removed for such tablespaces (see recv_validate_tablespace()) during recovery, but the files itself are not removed, but they must be removed on the case of backup restoring.

            If we return to "rename" ddl's, then at least one issue exists for backup restoring and does not exist for crash recovery - it's circular renames, when there is rename sequence like the following: a->tmp, b->a, tmp->b (see also rename_during_backup.test). The algorithm, proposed here: https://github.com/MariaDB/server/commit/fb219c7cb05e0e486658de05a11cb775e39ecc3d will not manage such renames in the case of backup restoring.

            So we can't just remove ddl fixing code from mariabackup, we also need to fix recovery code so, that it would also suite for backup restoring. One way to do this is to move backup_fix_ddl() logic from mariabackup to InnoDB recovery code, for example in recv_init_crash_recovery_spaces(). I will describe this idea in details in MDEV-18336 comments. The other idea was proposed by marko - it's replaying DDL records at the sequence of their parsing.

            And if we return to the initial idea within the current issue, i.e. to remove only "rename" ddl processing from mariabackup, we will have the bugs when "renames" are mixed with other ddl's.

            Example. Suppose we have "rename t1 to t2; create t1;" sequence in log file, and t1 was copied to backup directory before "rename" was executed by the server, and backup was finished after "create" is executed by the server. Currently backup directory will contain two files for ddl processing during backup prepare: t1.ibd.ren and t1.ibd.new. Before starting recovery t1.ibd.ren will be processed so that t1 will be renamed to t2, then t1.ibd.new will be renamed to t1.ibd. If we just remove rename_table_in_prepare() call from mariabackup, there will not be t1.ibd.ren file, and mariabackup will just overwrites t1.ibd with t1.ibd.new, and then start recovery, what will cause error.

            So if we need complex approach to get rid of separate ddl's processing in mariabackup, and this task should be separated from the current issue. We already created the task for this earlier: MDEV-18336.

            As for the current fix, i.e. https://github.com/MariaDB/server/commit/fb219c7cb05e0e486658de05a11cb775e39ecc3d commit, it works well for recovery and does not break backup preparing, so it's good enough for fixing this issue.

            vlad.lesin Vladislav Lesin added a comment - When I tried to remove only "rename" processing from mariabackup, I found that we can't remove it separately from "create" and "drop" processing. I would start with the claim that ddl recovery logic itself differs from backup ddl restoring logic. For example, if some tablespace was created and the server was crashed, the tablespace will not be created during redo log replaying, it will be loaded instead (see mfile_type_t::FILE_CREATE, recv_sys_t::parse(), fil_name_process(), recv_init_crash_recovery_spaces()). But for backup it must be created. Current ddl fixing logic just copies tablespaces created during backup to backup directory, if we remove it, we need to create tablespace during recovery. The same is true for "drop" operation. Redo log records are just removed for such tablespaces (see recv_validate_tablespace()) during recovery, but the files itself are not removed, but they must be removed on the case of backup restoring. If we return to "rename" ddl's, then at least one issue exists for backup restoring and does not exist for crash recovery - it's circular renames, when there is rename sequence like the following: a->tmp, b->a, tmp->b (see also rename_during_backup.test). The algorithm, proposed here: https://github.com/MariaDB/server/commit/fb219c7cb05e0e486658de05a11cb775e39ecc3d will not manage such renames in the case of backup restoring. So we can't just remove ddl fixing code from mariabackup, we also need to fix recovery code so, that it would also suite for backup restoring. One way to do this is to move backup_fix_ddl() logic from mariabackup to InnoDB recovery code, for example in recv_init_crash_recovery_spaces(). I will describe this idea in details in MDEV-18336 comments. The other idea was proposed by marko - it's replaying DDL records at the sequence of their parsing. And if we return to the initial idea within the current issue, i.e. to remove only "rename" ddl processing from mariabackup, we will have the bugs when "renames" are mixed with other ddl's. Example. Suppose we have "rename t1 to t2; create t1;" sequence in log file, and t1 was copied to backup directory before "rename" was executed by the server, and backup was finished after "create" is executed by the server. Currently backup directory will contain two files for ddl processing during backup prepare: t1.ibd.ren and t1.ibd.new. Before starting recovery t1.ibd.ren will be processed so that t1 will be renamed to t2, then t1.ibd.new will be renamed to t1.ibd. If we just remove rename_table_in_prepare() call from mariabackup, there will not be t1.ibd.ren file, and mariabackup will just overwrites t1.ibd with t1.ibd.new, and then start recovery, what will cause error. So if we need complex approach to get rid of separate ddl's processing in mariabackup, and this task should be separated from the current issue. We already created the task for this earlier: MDEV-18336 . As for the current fix, i.e. https://github.com/MariaDB/server/commit/fb219c7cb05e0e486658de05a11cb775e39ecc3d commit, it works well for recovery and does not break backup preparing, so it's good enough for fixing this issue.

            Thank you vlad.lesin. MDEV-23842 demonstrates that there is a genuine recovery failure for RENAME TABLE operations. The patch that I wrote fixes it. But, there is no need to change Mariabackup, because it uses a different mechanism for correctly replaying rename, create, and delete operations.

            Can you please analyze and fix the failure of the test innodb.log_file_name and push a revised commit to a 10.5-based branch? I think that it suffices to fix this failure in 10.5 and later versions for now. Before MDEV-17567, DDL operations are not fully crash-safe anyway.

            marko Marko Mäkelä added a comment - Thank you vlad.lesin . MDEV-23842 demonstrates that there is a genuine recovery failure for RENAME TABLE operations. The patch that I wrote fixes it. But, there is no need to change Mariabackup, because it uses a different mechanism for correctly replaying rename, create, and delete operations. Can you please analyze and fix the failure of the test innodb.log_file_name and push a revised commit to a 10.5-based branch? I think that it suffices to fix this failure in 10.5 and later versions for now. Before MDEV-17567 , DDL operations are not fully crash-safe anyway.

            People

              vlad.lesin Vladislav Lesin
              kevg Eugene Kosov (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.