Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Versions: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Description
The RENAME TABLE operation, which is also internally part of ALTER TABLE when ALGORITHM=COPY is in effect, is not crash-safe within InnoDB.
Starting with MySQL 5.7.5, where I implemented WL#7142 to speed up InnoDB crash recovery and to avoid silently losing redo log entries for InnoDB data files, InnoDB may refuse to start up because of a missing file, since no MLOG_FILE_RENAME2 record is written during RENAME TABLE:
2017-12-20 11:42:44 140737353856896 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1627873
2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Tablespace 4 was not found at ./test/t1.ibd.
2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2251] with error Tablespace not found
2017-12-20 11:42:45 140737353856896 [Note] InnoDB: Starting shutdown...
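When triaging an error log for this failure, the tablespace ID and the path that InnoDB expected can be extracted from the "was not found" message. The following is a minimal sketch that assumes the message format shown in the log above; the function name `find_missing_tablespaces` is hypothetical and not part of any MariaDB tool.

```python
import re

# Matches the message format seen in the log excerpt above, e.g.
# "[ERROR] InnoDB: Tablespace 4 was not found at ./test/t1.ibd."
# The trailing period belongs to the message, not to the path.
MISSING_TABLESPACE = re.compile(
    r"InnoDB: Tablespace (\d+) was not found at (.+?)\.\s*$"
)


def find_missing_tablespaces(log_lines):
    """Return (space_id, path) pairs for every 'was not found' message."""
    found = []
    for line in log_lines:
        match = MISSING_TABLESPACE.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found
```

The reported path is the name that the recovered redo log refers to, so it indicates which .ibd file is missing under its expected name.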
Before MariaDB 10.2.2 or MySQL 5.7.5, the server should always start up, but it could fail to find the table:
Version: '10.2.11-MariaDB-debug-log' socket: '/mariadb/10.2/build/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Failed to find tablespace for table `test`.`t1` in the cache. Attempting to load the tablespace with space id 4
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: The error means the system cannot find the path specified.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Cannot open datafile for read-only: './test/t1.ibd' OS error: 71
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: The error means the system cannot find the path specified.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Could not find a valid tablespace file for `test/t1`. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
Both problems can be reproduced with the following instrumentation:
diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 4f944fd5c0d..b8ad3a297a2 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -4915,6 +4915,7 @@ row_rename_table_for_mysql(
 	}

 	if (commit) {
+		DEBUG_SYNC(trx->mysql_thd, "before_rename_table_commit");
 		trx_commit_for_mysql(trx);
 	}

and the following test:
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/not_embedded.inc

CREATE TABLE t1 (a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES(42);
--source include/restart_mysqld.inc

--connect (con1,localhost,root,,test)
SET DEBUG_SYNC='before_rename_table_commit SIGNAL renamed WAIT_FOR ever';
--send
RENAME TABLE t1 TO t2;
--connection default
SET DEBUG_SYNC='now WAIT_FOR renamed';
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
--disconnect con1
SELECT * FROM t1;
DROP TABLE t1;
Remove the first invocation of restart_mysqld.inc to reproduce the startup failure.
(In the unlikely event that a log checkpoint occurs between the INSERT and the next restart_mysqld.inc, InnoDB would still be able to start up.)
How to fix this?
- Always write MLOG_FILE_RENAME2 records before renaming any .ibd files. Currently it may be the case that these records are only written during ALTER TABLE…ALGORITHM=INPLACE when the rebuilt table is being swapped.
- Before writing the MLOG_FILE_RENAME2 record, write a new type of an undo log record, so that in case the data dictionary transaction is rolled back, the file will be renamed back too.
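The idea behind these two steps (log the rename durably before touching the file, and keep enough information to rename it back on rollback) can be illustrated with a toy model. This is only a sketch under stated assumptions: it collapses the redo and undo records into one text log, the names `logged_rename`, `commit`, and `recover` are hypothetical, and it does not reflect InnoDB's actual record formats or recovery code.

```python
import os


def _append(log_path, record):
    """Append one record to the log and force it to disk before returning."""
    with open(log_path, "a") as log:
        log.write(record + "\n")
        log.flush()
        os.fsync(log.fileno())


def logged_rename(log_path, old, new):
    """Record the intended rename durably, then rename the file."""
    _append(log_path, "RENAME\t%s\t%s" % (old, new))
    os.rename(old, new)


def commit(log_path):
    """Mark all renames logged so far as committed."""
    _append(log_path, "COMMIT")


def recover(log_path):
    """After a crash, rename back every file whose rename was not committed."""
    if not os.path.exists(log_path):
        return []
    with open(log_path) as log:
        records = [line.rstrip("\n").split("\t") for line in log]
    pending = []
    for record in records:
        if record[0] == "RENAME":
            pending.append((record[1], record[2]))
        elif record[0] == "COMMIT":
            pending.clear()
    undone = []
    # Undo in reverse order, and only if the rename actually took place.
    for old, new in reversed(pending):
        if os.path.exists(new) and not os.path.exists(old):
            os.rename(new, old)
            undone.append((new, old))
    os.remove(log_path)
    return undone
```

Because the record is written before the rename, recovery always knows both names and can tell from the file system which of the two states it is in.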
How to work around the bug? While the server is offline, manually rename the .ibd files back so that they match the data dictionary (and in this case, the .frm files).
If we introduced a new undo log record type in a GA version of MariaDB, this could prevent a downgrade to an earlier GA version and violate our compatibility rules.
If we started writing MLOG_FILE_RENAME2 redo log records in MariaDB 10.2, then users should not see InnoDB startup failures related to this, but instead they would encounter missing tables. If there was any incomplete transaction that operated on the table, the rollback of that recovered transaction would skip the table and thus corrupt it. The status quo would seem better: a startup after manually renaming the .ibd file back should succeed.
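The manual workaround described above could be scripted roughly as follows. This is a sketch, not a supported tool: the function name `rename_ibd_back` and its parameters are hypothetical, it assumes file-per-table tablespaces stored as `<datadir>/<schema>/<table>.ibd` with table names that need no file name encoding, and it must only be run while the server is shut down.

```python
import os


def rename_ibd_back(datadir, schema, old_table, new_table):
    """Rename <new_table>.ibd back to <old_table>.ibd.

    Returns True if a file was renamed, False if the old name already
    exists (nothing to do). Run only while the server is offline.
    """
    schema_dir = os.path.join(datadir, schema)
    old_path = os.path.join(schema_dir, old_table + ".ibd")
    new_path = os.path.join(schema_dir, new_table + ".ibd")
    if os.path.exists(old_path):
        # The file already matches the name in the data dictionary.
        return False
    if not os.path.exists(new_path):
        raise FileNotFoundError(new_path)
    os.rename(new_path, old_path)
    return True
```

For the test case in this report, the call would be `rename_ibd_back(datadir, "test", "t1", "t2")`, where `datadir` is the server's data directory.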
Issue Links
- blocks
  - MDEV-372 Table gets fatally corrupted if server crashes during ALTER TABLE, "table doesn't exist" is reported (Closed)
  - MDEV-13564 TRUNCATE TABLE and undo tablespace truncation are not compatible with Mariabackup (Closed)
- causes
  - MDEV-17939 Assertion `++loop_count < 2' failed in trx_undo_report_rename (Closed)
  - MDEV-24184 InnoDB RENAME TABLE recovery failure if names are reused (Closed)
- is part of
  - MDEV-14585 Automatically remove #sql- tables in innodb dictionary during recovery (Closed)
- relates to
  - MDEV-17158 TRUNCATE is not atomic after MDEV-13564 (Closed)
  - MDEV-18733 MariaDB slow start after crash recovery (Closed)
  - MDEV-20677 Renaming files may not be filesystem-crash-safe (Open)
  - MDEV-10667 InnoDB: Failed to find tablespace for table (Closed)
  - MDEV-11657 Cross-engine transaction metadata (Open)
  - MDEV-11742 [Draft] InnoDB: Failing assertion: mysql_table (Closed)
  - MDEV-14418 Failing assertion: table->data_dir_path in row0mysql.cc line 4038 (Confirmed)
  - MDEV-23842 Atomic RENAME TABLE (Closed)