Details

    Description

      Atomic rename table

      The purpose of this task is to ensure that RENAME TABLE is atomic.
      Either all renames and logging to binary log will succeed or all
      renames will roll back, even if the server would crash at any point in
      rename process.

      This task also includes atomic RENAME VIEW.

      How rename works in 10.5:

      mysql_rename_tables() checks if tables are log tables and if yes and
      not renamed correctly aborts with an error
      - rename_tables() is are called
         - Loop over all tables in table list, two at a time
           If 'from' table is temporary table call do_rename_temporary()
            else call do_rename()
            - First handler rename is called, then .frm is renamed
       
      - If failure, rename changes tables back to their original names
      

      Notes

      • If a table is discovered during rename, the .frm file will be created.
        We can use this fact to discover if a rename was completely done or not.

      Special cases to consider:

      • renaming temporary tables and normal tables in same query
      • Swapping tables:
        RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2;
        RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2, t3 TO t4;
      • renaming table over old table (should fail):
        RENAME TABLE t1 TO t2, t3 TO existing_table;
        RENAME TABLE t1 TO t2, t3 TO t2;
      • Renaming not existing table:
        RENAME TABLE t1 TO t2, not_existing_table TO t3;
      • Re-renaming a table:
        RENAME TABLE t1 TO t2, t2 TO t3;

      The code also needs to take care of that a rename can fail in the engine for various reasons
      and we have to be able to back out from any stage in the rename. Some of the reasons for
      failures are:

      • Internal foreign key constraints in the engine, not noticed until rename
      • One of the tables table is on a read-only storage or protected by the engine
      • Some files for an engine is missing (like in MyISAM where there should be an .MYI and MYD file but one of the files has are missing or corrupted
      • Engine doesn't support rename

      How atomic rename should work:

      Normal operation

      - Loop over all tables and remember if they are temporary or not.
      - Loop over all tables and check that there are no renames of not existing
        tables over over existing tables and give errors early (this is a change
        from how things works now, where we try to do all renames and rename back
        things on failure).
        This code also have to take into account all renames that would happen
        before. The above examples should cover those cases.
      - Log to ddl log:
        - Query
         - Information of which tables where temporary
         - Short uuid 
         - Position of binary log (if open)
       
      - Loop over all tables
         - Add rename to be done in ddl log
         - Rename table
         - Mark in the ddl log that the rename was done.
         - Update triggers and table rows in status tables
      - Update xid of query to ddl_log
      - Log query to binary log together with xid.
      - Mark ddl log entry done
      

      Recovery of active atomic rename entry

      In following code, all temporary tables can be ignored as these has
      already been deleted by the server.

      - If last table is marked as 'completed', loop over all entries in
        the binary log, starting from recorded binary log position and check if
        there is an entry with the ddl log uuid.
        - If yes
           - The renames succeed. Mark log entry done end exit
       
      - Start from table list at last table that was marked renamed
      - If not last table and next table exist and is not complete
        (.frm doesn't exist or handler open doesn't work)
         - The crash happened in the middle of the rename or just after
           Do a 'handler force rename' back from 'to' to 'from'.
           Rename 'to'.frm to 'from.frm' if to.frm exists.
      - In reverse order from current table to start of table list
        - Mark in log that we are at start of current table.
        - Rename table 'to' to 'from'
      - Mark log entry done
      

      New handler calls needed

      • We need a new handler call 'force rename'. For engines that can't do atomic
        ddl internally, this should do the rename of all 'from.*' files to 'to.*'
        files relevant for the engine.

      Attachments

        Issue Links

          Activity

            For the record, MDEV-14717 introduced an InnoDB undo log record to guarantee that RENAME operations are transactional within the storage engine. That did not cover .frm files.

            marko Marko Mäkelä added a comment - For the record, MDEV-14717 introduced an InnoDB undo log record to guarantee that RENAME operations are transactional within the storage engine. That did not cover .frm files.

            In atomic.rename_table InnoDB redo log fails on renaming t5 -> t1. t1 already exists.

            On a previous run the following renames have happened: t1 -> t5 and t2 -> t1.

            But that t5 -> t1 probably comes from here (this is a previous daemon run before the crashed run):

            #0  fil_name_write_rename (space_id=5, old_name=0x3b6c320 "./test/t5.ibd", new_name=0x3c99c40 "./test/t1.ibd") at fil0fil.cc:1839
            #1  0x00000000014b634b in fil_rename_tablespace (id=5, old_path=0x3b6c320 "./test/t5.ibd", new_name=0x7ffde7c16af0 "test/t1", new_path_in=0x3b35600 "./test/t1.ibd") at fil0fil.cc:2533
            #2  0x00000000014b7d34 in fil_space_t::rename (this=0x3b6c110, name=0x7ffde7c16af0 "test/t1", path=0x3b35600 "./test/t1.ibd", log=true, replace=false) at fil0fil.cc:2475
            #3  0x0000000001537044 in dict_table_rename_in_cache (table=0x7fb2e80117f0, new_name=0x7ffde7c16af0 "test/t1", rename_also_foreigns=true, replace_new_file=false) at dict0dict.cc:1599
            #4  0x000000000182268a in row_rename_table_for_mysql (old_name=0x7ffde7c168f0 "test/t5", new_name=0x7ffde7c16af0 "test/t1", trx=0x7fb3055751b0, commit=true, use_fk=true) at row0mysql.cc:4493
            #5  0x0000000001499b7c in innobase_rename_table (trx=0x7fb3055751b0, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1", commit=true, use_fk=true) at ha_innodb.cc:13261
            #6  0x000000000147dcac in ha_innobase::rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at ha_innodb.cc:13452
            #7  0x0000000000bfeb57 in handler::ha_rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at handler.cc:4967
            #8  0x000000000112d6eb in ddl_log_execute_action (thd=0x3c92fb8, mem_root=0x7ffde7c17e98, ddl_log_entry=0x7ffde7c17ee0) at ddl_log.cc:1042
            #9  0x000000000112b319 in ddl_log_execute_entry_no_lock (thd=0x3c92fb8, first_entry=1) at ddl_log.cc:1197
            #10 0x000000000112b756 in ddl_log_execute_recovery () at ddl_log.cc:1521
            #11 0x0000000000bb8f3d in mysqld_main (argc=164, argv=0x39945f0) at mysqld.cc:5602
            #12 0x0000000000bb6092 in main (argc=23, argv=0x7ffde7c182d8) at main.cc:25
            

            So, recovery from a binlog renames InnoDB data files too! And it writes a redo log for such renamings. Should that code rename InnoDB files at all and generate a redo log?

            kevg Eugene Kosov (Inactive) added a comment - In atomic.rename_table InnoDB redo log fails on renaming t5 -> t1 . t1 already exists. On a previous run the following renames have happened: t1 -> t5 and t2 -> t1 . But that t5 -> t1 probably comes from here (this is a previous daemon run before the crashed run): #0 fil_name_write_rename (space_id=5, old_name=0x3b6c320 "./test/t5.ibd", new_name=0x3c99c40 "./test/t1.ibd") at fil0fil.cc:1839 #1 0x00000000014b634b in fil_rename_tablespace (id=5, old_path=0x3b6c320 "./test/t5.ibd", new_name=0x7ffde7c16af0 "test/t1", new_path_in=0x3b35600 "./test/t1.ibd") at fil0fil.cc:2533 #2 0x00000000014b7d34 in fil_space_t::rename (this=0x3b6c110, name=0x7ffde7c16af0 "test/t1", path=0x3b35600 "./test/t1.ibd", log=true, replace=false) at fil0fil.cc:2475 #3 0x0000000001537044 in dict_table_rename_in_cache (table=0x7fb2e80117f0, new_name=0x7ffde7c16af0 "test/t1", rename_also_foreigns=true, replace_new_file=false) at dict0dict.cc:1599 #4 0x000000000182268a in row_rename_table_for_mysql (old_name=0x7ffde7c168f0 "test/t5", new_name=0x7ffde7c16af0 "test/t1", trx=0x7fb3055751b0, commit=true, use_fk=true) at row0mysql.cc:4493 #5 0x0000000001499b7c in innobase_rename_table (trx=0x7fb3055751b0, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1", commit=true, use_fk=true) at ha_innodb.cc:13261 #6 0x000000000147dcac in ha_innobase::rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at ha_innodb.cc:13452 #7 0x0000000000bfeb57 in handler::ha_rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at handler.cc:4967 #8 0x000000000112d6eb in ddl_log_execute_action (thd=0x3c92fb8, mem_root=0x7ffde7c17e98, ddl_log_entry=0x7ffde7c17ee0) at ddl_log.cc:1042 #9 0x000000000112b319 in ddl_log_execute_entry_no_lock (thd=0x3c92fb8, first_entry=1) at ddl_log.cc:1197 #10 0x000000000112b756 in ddl_log_execute_recovery () at ddl_log.cc:1521 #11 0x0000000000bb8f3d in mysqld_main (argc=164, argv=0x39945f0) at mysqld.cc:5602 #12 0x0000000000bb6092 in main (argc=23, argv=0x7ffde7c182d8) at main.cc:25 So, recovery from a binlog renames InnoDB data files too! And it writes a redo log for such renamings. Should that code rename InnoDB files at all and generate a redo log?

            According to my analysis, the InnoDB recovery of RENAME may fail if multiple tables were renamed since the latest log checkpoint, such that the same names are being reused. We will fix that in MDEV-24184.

            marko Marko Mäkelä added a comment - According to my analysis, the InnoDB recovery of RENAME may fail if multiple tables were renamed since the latest log checkpoint, such that the same names are being reused. We will fix that in MDEV-24184 .
            midenok Aleksey Midenkov added a comment - - edited

            next_entry, execute_entry_pos are redundant (they are stored inside list, execute_entry):

            diff --git a/sql/ddl_log.h b/sql/ddl_log.h
            index fa5ef648956..764622aea0c 100644
            --- a/sql/ddl_log.h
            +++ b/sql/ddl_log.h
            @@ -187,7 +187,6 @@ typedef struct st_ddl_log_memory_entry
             
             typedef struct st_ddl_log_state
             {
            -  uint next_entry, execute_entry_pos;
               /* List of ddl log entries */
               DDL_LOG_MEMORY_ENTRY *list;
               /* One execute entry per list */
            @@ -206,9 +208,6 @@ int ddl_log_execute_recovery();
             bool ddl_log_write_entry(DDL_LOG_ENTRY *ddl_log_entry,
                                        DDL_LOG_MEMORY_ENTRY **active_entry);
             
            -bool ddl_log_write_execute_entry_internal(uint first_entry,
            -                                          bool complete,
            -                                          DDL_LOG_MEMORY_ENTRY **active_entry);
             bool ddl_log_write_execute_entry(uint first_entry,
                                              DDL_LOG_MEMORY_ENTRY **active_entry);
             bool ddl_log_disable_execute_entry(DDL_LOG_MEMORY_ENTRY **active_entry);
            

            ddl_log_write_execute_entry_internal() is not used as well.

            midenok Aleksey Midenkov added a comment - - edited next_entry , execute_entry_pos are redundant (they are stored inside list , execute_entry ): diff --git a/sql/ddl_log.h b/sql/ddl_log.h index fa5ef648956..764622aea0c 100644 --- a/sql/ddl_log.h +++ b/sql/ddl_log.h @@ -187,7 +187,6 @@ typedef struct st_ddl_log_memory_entry   typedef struct st_ddl_log_state { - uint next_entry, execute_entry_pos; /* List of ddl log entries */ DDL_LOG_MEMORY_ENTRY *list; /* One execute entry per list */ @@ -206,9 +208,6 @@ int ddl_log_execute_recovery(); bool ddl_log_write_entry(DDL_LOG_ENTRY *ddl_log_entry, DDL_LOG_MEMORY_ENTRY **active_entry);   -bool ddl_log_write_execute_entry_internal(uint first_entry, - bool complete, - DDL_LOG_MEMORY_ENTRY **active_entry); bool ddl_log_write_execute_entry(uint first_entry, DDL_LOG_MEMORY_ENTRY **active_entry); bool ddl_log_disable_execute_entry(DDL_LOG_MEMORY_ENTRY **active_entry); ddl_log_write_execute_entry_internal() is not used as well.

            People

              monty Michael Widenius
              monty Michael Widenius
              Votes:
              1 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.