Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24184

InnoDB RENAME TABLE recovery failure if names are reused

    XMLWordPrintable

Details

    Description

      This came up in MDEV-23842. The scenario involves logging multiple RENAME operations without any log checkpoint in between.

      First of all, at least starting with MariaDB 10.5 (and the MDEV-12353 changes), we are writing a duplicate record:

      diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
      index 5e6bd9575b1..d279e77a105 100644
      --- a/storage/innobase/fil/fil0fil.cc
      +++ b/storage/innobase/fil/fil0fil.cc
      @@ -2530,7 +2530,6 @@ fil_rename_tablespace(
       	ut_ad(strchr(new_file_name, OS_PATH_SEPARATOR) != NULL);
       
       	if (!recv_recovery_is_on()) {
      -		fil_name_write_rename(id, old_file_name, new_file_name);
       		log_mutex_enter();
       	}
       
      

      With that fix applied, the recovery would fail on the third rename starting from checkpoint_lsn=77739:

      Thread 1 hit Breakpoint 2, fil_op_replay_rename (space_id=6, name=name@entry=0x7f51bd09f1f1 "./test/t2.ibd", new_name=new_name@entry=0x7f51bd09f1ff "./test/t1.ibd")
          at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1873
      1873	{
      Current event: 6248Thread 1 hit Breakpoint 2, fil_op_replay_rename (space_id=5, name=name@entry=0x7f51bd09f41d "./test/t5.ibd", new_name=new_name@entry=0x7f51bd09f42b "./test/t2.ibd")
          at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1873
      1873	{
      Current event: 6308Thread 1 hit Breakpoint 2, fil_op_replay_rename (space_id=5, name=name@entry=0x7f51bd09f60e "./test/t2.ibd", new_name=new_name@entry=0x7f51bd09f61c "./test/t5.ibd")
          at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1873
      

      We are renaming tablespace 5 from t2 to t1, then renaming tablespace 6 from t5 to t2 and back. The tablespace 5 already was named t1, and the tablespace 6 was named t2. So, only the third operation results in an attempted rename. But, there already was a t5 at tablespace 8. It had been successfully renamed before the server was killed:

      (rr) break mtr_t::log_file_op
      Breakpoint 1 at 0x55b02cabd18a: file /mariadb/10.6/storage/innobase/fil/fil0fil.cc, line 1760.
      (rr) cond 1 type==FILE_RENAME
      (rr) display log_sys.lsn._M_i
      1: log_sys.lsn._M_i = 0
      (rr) command
      Type commands for breakpoint(s) 1, one per line.
      End with a line saying just "end".
      >when
      >c
      >end
      (rr) c
      Continuing.
      2020-11-13 14:32:44 0 [Note] /dev/shm/10.6/sql/mariadbd (mysqld 10.6.0-MariaDB-debug-log) starting as process 523956 ...
      2020-11-13 14:32:44 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32183)
      2020-11-13 14:32:44 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      [New Thread 523956.523961]
      [New Thread 523956.523957]
      [New Thread 523956.523958]
      [New Thread 523956.523960]
      [New Thread 523956.523962]
      [New Thread 523956.523963]
      [New Thread 523956.523964]
      [New Thread 523956.523966]
      [New Thread 523956.523967]
      Thread 1 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fff47c42950, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x55b02f10c5a8 "./test/t2.ibd", 
          new_path=new_path@entry=0x55b02f500de8 "./test/t5.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 79333
      Current event: 47152
      Thread 1 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fff47c42950, type=type@entry=FILE_RENAME, space_id=space_id@entry=6, path=path@entry=0x55b02f0f8568 "./test/t1.ibd", 
          new_path=new_path@entry=0x55b02f5c24f8 "./test/t2.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 79786
      Current event: 49203
      Thread 1 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fff47c42950, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x55b02f4b6248 "./test/t5.ibd", 
          new_path=new_path@entry=0x55b02f5c2528 "./test/t1.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 80239
      Current event: 52863
      [New Thread 523956.523988]
      [New Thread 523956.523986]
      [New Thread 523956.523987]
      [Switching to Thread 523956.523988]
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x55b02f11ebd8 "./test/t1.ibd", 
          new_path=new_path@entry=0x55b02f495d18 "./test/t5.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 82059
      Current event: 86144
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=6, path=path@entry=0x55b02f10c5a8 "./test/t2.ibd", 
          new_path=new_path@entry=0x7fdc5003b648 "./test/t1.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 82530
      Current event: 87797
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=5, path=path@entry=0x7fdc5005a168 "./test/t5.ibd", 
          new_path=new_path@entry=0x7fdc50041ee8 "./test/t2.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 83111
      Current event: 89390
      Thread 11 hit Breakpoint 1, mtr_t::log_file_op (this=this@entry=0x7fdc6c0a61e0, type=type@entry=FILE_RENAME, space_id=space_id@entry=8, path=path@entry=0x55b02f0e4558 "./test/t4.ibd", 
          new_path=new_path@entry=0x7fdc50041f18 "./test/t5.ibd") at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1760
      1760    {
      1: log_sys.lsn._M_i = 83688
      Current event: 90881
      Thread 13 received signal SIGKILL, Killed.
      [Switching to Thread 523956.523987]
      0x0000000070000002 in ?? ()
      1: log_sys.lsn._M_i = 83764
      

      I think that we should fix the redo log apply as follows:

      1. Merge all recovered FILE_RENAME or MLOG_FILE_RENAME2 records, preserving the tablespace identifier and the final desired name.
      2. Implement a debug check that there are no duplicates of the final desired name.
      3. Remove those records where the fil_space_t::chain.start.name already is correct.
      4. Implement a debug check that at most one rename needs to be performed.
      5. Finally, perform any outstanding renames.

      I believe that by design, we will have to perform at most one rename operation. This should be guaranteed by the log_write_up_to() call in fil_name_write_rename().

      Attachments

        Issue Links

          Activity

            People

              vlad.lesin Vladislav Lesin
              kevg Eugene Kosov (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.