[MDEV-14717] RENAME TABLE in InnoDB is not crash-safe - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Fix Version/s: 10.3.3, 10.2.19
Component/s: Storage Engine - InnoDB
Labels:
- crash
- ddl

Description

The RENAME TABLE operation, which is also internally part of ALTER TABLE when ALGORITHM=COPY is in effect, is not crash-safe within InnoDB.

Starting with MySQL 5.7.5, where I implemented WL#7142 to speed up InnoDB crash recovery and to avoid silently losing redo log entries for InnoDB data files, InnoDB may refuse to start up after a crash during RENAME TABLE. No MLOG_FILE_RENAME2 record is written during RENAME TABLE, so recovery looks for the data file under its old name and reports it as missing:

2017-12-20 11:42:44 140737353856896 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1627873
2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Tablespace 4 was not found at ./test/t1.ibd.
2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2251] with error Tablespace not found
2017-12-20 11:42:45 140737353856896 [Note] InnoDB: Starting shutdown...
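
The refusal to start can be pictured with a small model. This is only a sketch, not InnoDB code: `NameMap`, `missing_tablespaces` and `may_start` are hypothetical names, and the model assumes that recovery learns data file names solely from the redo log written since the latest checkpoint, so a file that was renamed without a logged rename looks missing.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model, not InnoDB code: tablespace id -> file name that the
// redo log (assumed: since the latest checkpoint) associates with it.
using NameMap = std::map<unsigned, std::string>;

// Return the ids of tablespaces whose logged file name is absent on disk.
// With no rename record in the log, a renamed file is reported as missing.
std::vector<unsigned> missing_tablespaces(const NameMap& logged_names,
                                          const std::set<std::string>& files_on_disk)
{
    std::vector<unsigned> missing;
    for (const auto& entry : logged_names) {
        if (files_on_disk.count(entry.second) == 0) {
            missing.push_back(entry.first);
        }
    }
    return missing;
}

// Startup proceeds only if nothing is missing, or if the administrator has
// accepted data loss (modelled on innodb_force_recovery=1 in the log above).
bool may_start(const std::vector<unsigned>& missing, int force_recovery)
{
    return missing.empty() || force_recovery >= 1;
}
```

In this model, a log that names `./test/t1.ibd` for tablespace 4 while only `./test/t2.ibd` exists on disk yields one missing tablespace, and `may_start` returns false unless the force-recovery level is at least 1.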

Before MariaDB 10.2.2 or MySQL 5.7.5, the server should always start up, but it could fail to find the table:

Version: '10.2.11-MariaDB-debug-log'  socket: '/mariadb/10.2/build/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Failed to find tablespace for table `test`.`t1` in the cache. Attempting to load the tablespace with space id 4
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: The error means the system cannot find the path specified.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Cannot open datafile for read-only: './test/t1.ibd' OS error: 71
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Operating system error number 2 in a file operation.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: The error means the system cannot find the path specified.
2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Could not find a valid tablespace file for `test/t1`. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.

Both problems can be reproduced with the following instrumentation:

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index 4f944fd5c0d..b8ad3a297a2 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -4915,6 +4915,7 @@ row_rename_table_for_mysql(
 	if (commit) {
+		DEBUG_SYNC(trx->mysql_thd, "before_rename_table_commit");
 		trx_commit_for_mysql(trx);

and the following test:

--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/not_embedded.inc

CREATE TABLE t1 (a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES(42);
--source include/restart_mysqld.inc

--connect (con1,localhost,root,,test)
SET DEBUG_SYNC='before_rename_table_commit SIGNAL renamed WAIT_FOR ever';
--send
RENAME TABLE t1 TO t2;

--connection default
SET DEBUG_SYNC='now WAIT_FOR renamed';
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
--disconnect con1

SELECT * FROM t1;
DROP TABLE t1;

Remove the first invocation of restart_mysqld.inc to reproduce the startup failure.
(In the unlikely event that a log checkpoint occurs between the INSERT and the next restart_mysqld.inc, InnoDB would still be able to start up.)

How to fix this?

Always write MLOG_FILE_RENAME2 records before renaming any .ibd files. Currently, these records may be written only during ALTER TABLE…ALGORITHM=INPLACE, when the rebuilt table is being swapped in.
Before writing the MLOG_FILE_RENAME2 record, write a new type of undo log record, so that if the data dictionary transaction is rolled back, the file is renamed back too.
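
The ordering proposed above (undo record first, then the redo record, then the actual file rename) can be illustrated with an in-memory model. This is a sketch under stated assumptions and not the actual fix: `Rename`, `Model` and all member names are hypothetical, the "disk" is a set of file names, and `redo_log` merely stands in for MLOG_FILE_RENAME2 records.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory model of the proposal; none of these names are
// InnoDB identifiers.
struct Rename {
    std::string from;
    std::string to;
};

struct Model {
    std::set<std::string> files;   // data files currently on disk
    std::vector<Rename> undo_log;  // undo records of the open dictionary transaction
    std::vector<Rename> redo_log;  // stands in for MLOG_FILE_RENAME2 records

    // Order proposed in the text: undo record, then redo record, and only
    // then the file system rename.
    void rename_table_file(const std::string& from, const std::string& to) {
        undo_log.push_back({from, to});
        redo_log.push_back({from, to});
        move_file(from, to);
    }

    // The dictionary transaction committed: the new names are final.
    void commit() { undo_log.clear(); }

    // Crash recovery: replay logged renames, then roll back whatever the
    // uncommitted dictionary transaction had renamed.
    void recover() {
        // Redo pass: complete any logged rename that did not reach the disk.
        for (const Rename& r : redo_log) {
            move_file(r.from, r.to);
        }
        // Rollback pass: undo the renames, newest first.
        for (auto it = undo_log.rbegin(); it != undo_log.rend(); ++it) {
            move_file(it->to, it->from);
        }
        undo_log.clear();
        redo_log.clear();
    }

private:
    // Rename a file if the source exists; otherwise do nothing.
    void move_file(const std::string& from, const std::string& to) {
        if (files.erase(from) == 1) {
            files.insert(to);
        }
    }
};
```

In this model, calling `recover()` after `rename_table_file()` without `commit()` leaves the file under its old name, matching the data dictionary, while calling it after `commit()` keeps the new name. The model does not attempt to cover file names that are reused within one recovery.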

How to work around the bug? While the server is offline, manually rename the .ibd files back so that they match the data dictionary (and in this case, the .frm files).

If we introduced a new undo log record type in a GA version of MariaDB, this could prevent a downgrade to an earlier GA version and violate our compatibility rules.

If we started writing only the MLOG_FILE_RENAME2 redo log records in MariaDB 10.2, then users should not see InnoDB startup failures related to this, but they would encounter missing tables instead. If there were any incomplete transaction that operated on the table, the rollback of that recovered transaction would skip the table and thus leave it corrupted. The status quo would seem better: a startup after manually renaming the .ibd file back should succeed.

Issue Links

blocks
- MDEV-372 Table gets fatally corrupted if server crashes during ALTER TABLE, "table doesn't exist" is reported (Closed)
- MDEV-13564 TRUNCATE TABLE and undo tablespace truncation are not compatible with Mariabackup (Closed)

causes
- MDEV-17939 Assertion `++loop_count < 2' failed in trx_undo_report_rename (Closed)
- MDEV-24184 InnoDB RENAME TABLE recovery failure if names are reused (Closed)

is part of
- MDEV-14585 Automatically remove #sql- tables in innodb dictionary during recovery (Closed)

relates to
- MDEV-17158 TRUNCATE is not atomic after MDEV-13564 (Closed)
- MDEV-18733 MariaDB slow start after crash recovery (Closed)
- MDEV-20677 Renaming files may not be filesystem-crash-safe (Open)
- MDEV-10667 InnoDB: Failed to find tablespace for table (Closed)
- MDEV-11657 Cross-engine transaction metadata (Open)
- MDEV-11742 [Draft] InnoDB: Failing assertion: mysql_table (Closed)
- MDEV-14418 Failing assertion: table->data_dir_path in row0mysql.cc line 4038 (Confirmed)
- MDEV-23842 Atomic RENAME TABLE (Closed)

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 2

Dates

Created: 2017-12-20 10:59

Updated: 2021-04-28 08:51

Resolved: 2017-12-20 20:43

MariaDB Server