The purpose of this task is to ensure that RENAME TABLE is atomic.
Either all renames and logging to binary log will succeed or all
renames will roll back, even if the server would crash at any point in
rename process.
This task also includes atomic RENAME VIEW.
How rename works in 10.5:
mysql_rename_tables() checks if tables are log tables and if yes and
not renamed correctly aborts with an error
- rename_tables() is are called
- Loop over all tables in table list, two at a time
If 'from' table is temporary table call do_rename_temporary()
else call do_rename()
- First handler rename is called, then .frm is renamed
- If failure, rename changes tables back to their original names
Notes
If a table is discovered during rename, the .frm file will be created.
We can use this fact to discover if a rename was completely done or not.
Special cases to consider:
renaming temporary tables and normal tables in same query
Swapping tables:
RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2;
RENAME TABLE t1 TO tmp, t2 TO t1, tmp TO t2, t3 TO t4;
renaming table over old table (should fail):
RENAME TABLE t1 TO t2, t3 TO existing_table;
RENAME TABLE t1 TO t2, t3 TO t2;
Renaming not existing table:
RENAME TABLE t1 TO t2, not_existing_table TO t3;
Re-renaming a table:
RENAME TABLE t1 TO t2, t2 TO t3;
The code also needs to take care of that a rename can fail in the engine for various reasons
and we have to be able to back out from any stage in the rename. Some of the reasons for
failures are:
Internal foreign key constraints in the engine, not noticed until rename
One of the tables table is on a read-only storage or protected by the engine
Some files for an engine is missing (like in MyISAM where there should be an .MYI and MYD file but one of the files has are missing or corrupted
Engine doesn't support rename
How atomic rename should work:
Normal operation
- Loop over all tables and remember if they are temporary or not.
- Loop over all tables and check that there are no renames of not existing
tables over over existing tables and give errors early (this is a change
from how things works now, where we try to do all renames and rename back
things on failure).
This code also have to take into account all renames that would happen
before. The above examples should cover those cases.
- Log to ddl log:
- Query
- Information of which tables where temporary
- Short uuid
- Position of binary log (if open)
- Loop over all tables
- Add rename to be done in ddl log
- Rename table
- Mark in the ddl log that the rename was done.
- Update triggers and table rows in status tables
- Update xid of query to ddl_log
- Log query to binary log together with xid.
- Mark ddl log entry done
Recovery of active atomic rename entry
In following code, all temporary tables can be ignored as these has
already been deleted by the server.
- If last table is marked as 'completed', loop over all entries in
the binary log, starting from recorded binary log position and check if
there is an entry with the ddl log uuid.
- If yes
- The renames succeed. Mark log entry done end exit
- Start from table list at last table that was marked renamed
- If not last table and next table exist and is not complete
(.frm doesn't exist or handler open doesn't work)
- The crash happened in the middle of the rename or just after
Do a 'handler force rename' back from 'to' to 'from'.
Rename 'to'.frm to 'from.frm' if to.frm exists.
- In reverse order from current table to start of table list
- Mark in log that we are at start of current table.
- Rename table 'to' to 'from'
- Mark log entry done
New handler calls needed
We need a new handler call 'force rename'. For engines that can't do atomic
ddl internally, this should do the rename of all 'from.*' files to 'to.*'
files relevant for the engine.
Attachments
Issue Links
is blocked by
MDEV-24184InnoDB RENAME TABLE recovery failure if names are reused
For the record, MDEV-14717 introduced an InnoDB undo log record to guarantee that RENAME operations are transactional within the storage engine. That did not cover .frm files.
Marko Mäkelä
added a comment - For the record, MDEV-14717 introduced an InnoDB undo log record to guarantee that RENAME operations are transactional within the storage engine. That did not cover .frm files.
In atomic.rename_table InnoDB redo log fails on renaming t5 -> t1. t1 already exists.
On a previous run the following renames have happened: t1 -> t5 and t2 -> t1.
But that t5 -> t1 probably comes from here (this is a previous daemon run before the crashed run):
#0 fil_name_write_rename (space_id=5, old_name=0x3b6c320 "./test/t5.ibd", new_name=0x3c99c40 "./test/t1.ibd") at fil0fil.cc:1839
#1 0x00000000014b634b in fil_rename_tablespace (id=5, old_path=0x3b6c320 "./test/t5.ibd", new_name=0x7ffde7c16af0 "test/t1", new_path_in=0x3b35600 "./test/t1.ibd") at fil0fil.cc:2533
#2 0x00000000014b7d34 in fil_space_t::rename (this=0x3b6c110, name=0x7ffde7c16af0 "test/t1", path=0x3b35600 "./test/t1.ibd", log=true, replace=false) at fil0fil.cc:2475
#3 0x0000000001537044 in dict_table_rename_in_cache (table=0x7fb2e80117f0, new_name=0x7ffde7c16af0 "test/t1", rename_also_foreigns=true, replace_new_file=false) at dict0dict.cc:1599
#4 0x000000000182268a in row_rename_table_for_mysql (old_name=0x7ffde7c168f0 "test/t5", new_name=0x7ffde7c16af0 "test/t1", trx=0x7fb3055751b0, commit=true, use_fk=true) at row0mysql.cc:4493
#5 0x0000000001499b7c in innobase_rename_table (trx=0x7fb3055751b0, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1", commit=true, use_fk=true) at ha_innodb.cc:13261
#6 0x000000000147dcac in ha_innobase::rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at ha_innodb.cc:13452
#7 0x0000000000bfeb57 in handler::ha_rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at handler.cc:4967
#8 0x000000000112d6eb in ddl_log_execute_action (thd=0x3c92fb8, mem_root=0x7ffde7c17e98, ddl_log_entry=0x7ffde7c17ee0) at ddl_log.cc:1042
#9 0x000000000112b319 in ddl_log_execute_entry_no_lock (thd=0x3c92fb8, first_entry=1) at ddl_log.cc:1197
#10 0x000000000112b756 in ddl_log_execute_recovery () at ddl_log.cc:1521
#11 0x0000000000bb8f3d in mysqld_main (argc=164, argv=0x39945f0) at mysqld.cc:5602
#12 0x0000000000bb6092 in main (argc=23, argv=0x7ffde7c182d8) at main.cc:25
So, recovery from a binlog renames InnoDB data files too! And it writes a redo log for such renamings. Should that code rename InnoDB files at all and generate a redo log?
Eugene Kosov (Inactive)
added a comment - In atomic.rename_table InnoDB redo log fails on renaming t5 -> t1 . t1 already exists.
On a previous run the following renames have happened: t1 -> t5 and t2 -> t1 .
But that t5 -> t1 probably comes from here (this is a previous daemon run before the crashed run):
#0 fil_name_write_rename (space_id=5, old_name=0x3b6c320 "./test/t5.ibd", new_name=0x3c99c40 "./test/t1.ibd") at fil0fil.cc:1839
#1 0x00000000014b634b in fil_rename_tablespace (id=5, old_path=0x3b6c320 "./test/t5.ibd", new_name=0x7ffde7c16af0 "test/t1", new_path_in=0x3b35600 "./test/t1.ibd") at fil0fil.cc:2533
#2 0x00000000014b7d34 in fil_space_t::rename (this=0x3b6c110, name=0x7ffde7c16af0 "test/t1", path=0x3b35600 "./test/t1.ibd", log=true, replace=false) at fil0fil.cc:2475
#3 0x0000000001537044 in dict_table_rename_in_cache (table=0x7fb2e80117f0, new_name=0x7ffde7c16af0 "test/t1", rename_also_foreigns=true, replace_new_file=false) at dict0dict.cc:1599
#4 0x000000000182268a in row_rename_table_for_mysql (old_name=0x7ffde7c168f0 "test/t5", new_name=0x7ffde7c16af0 "test/t1", trx=0x7fb3055751b0, commit=true, use_fk=true) at row0mysql.cc:4493
#5 0x0000000001499b7c in innobase_rename_table (trx=0x7fb3055751b0, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1", commit=true, use_fk=true) at ha_innodb.cc:13261
#6 0x000000000147dcac in ha_innobase::rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at ha_innodb.cc:13452
#7 0x0000000000bfeb57 in handler::ha_rename_table (this=0x3d5e930, from=0x7ffde7c17c40 "./test/t5", to=0x7ffde7c17a30 "./test/t1") at handler.cc:4967
#8 0x000000000112d6eb in ddl_log_execute_action (thd=0x3c92fb8, mem_root=0x7ffde7c17e98, ddl_log_entry=0x7ffde7c17ee0) at ddl_log.cc:1042
#9 0x000000000112b319 in ddl_log_execute_entry_no_lock (thd=0x3c92fb8, first_entry=1) at ddl_log.cc:1197
#10 0x000000000112b756 in ddl_log_execute_recovery () at ddl_log.cc:1521
#11 0x0000000000bb8f3d in mysqld_main (argc=164, argv=0x39945f0) at mysqld.cc:5602
#12 0x0000000000bb6092 in main (argc=23, argv=0x7ffde7c182d8) at main.cc:25
So, recovery from a binlog renames InnoDB data files too! And it writes a redo log for such renamings. Should that code rename InnoDB files at all and generate a redo log?
According to my analysis, the InnoDB recovery of RENAME may fail if multiple tables were renamed since the latest log checkpoint, such that the same names are being reused. We will fix that in MDEV-24184.
Marko Mäkelä
added a comment - According to my analysis, the InnoDB recovery of RENAME may fail if multiple tables were renamed since the latest log checkpoint, such that the same names are being reused. We will fix that in MDEV-24184 .
For the record,
MDEV-14717introduced an InnoDB undo log record to guarantee that RENAME operations are transactional within the storage engine. That did not cover .frm files.