Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14717

RENAME TABLE in InnoDB is not crash-safe

Details

    Description

      The RENAME TABLE operation, which is also internally part of ALTER TABLE when ALGORITHM=COPY is in effect, is not crash-safe within InnoDB.

      Starting with MySQL 5.7.5, where I implemented WL#7142 to speed up InnoDB crash recovery and to avoid silently losing redo log entries for InnoDB data files, InnoDB startup may be refused because of a missing file, because no MLOG_FILE_RENAME2 record will be written during RENAME TABLE:

      2017-12-20 11:42:44 140737353856896 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1627873
      2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Tablespace 4 was not found at ./test/t1.ibd.
      2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
      2017-12-20 11:42:44 140737353856896 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[2251] with error Tablespace not found
      2017-12-20 11:42:45 140737353856896 [Note] InnoDB: Starting shutdown...
      

      Before MariaDB 10.2.2 or MySQL 5.7.5, the server should always start up, but it could fail to find the table:

      Version: '10.2.11-MariaDB-debug-log'  socket: '/mariadb/10.2/build/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Failed to find tablespace for table `test`.`t1` in the cache. Attempting to load the tablespace with space id 4
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Operating system error number 2 in a file operation.
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: The error means the system cannot find the path specified.
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Cannot open datafile for read-only: './test/t1.ibd' OS error: 71
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Operating system error number 2 in a file operation.
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: The error means the system cannot find the path specified.
      2017-12-20 12:37:45 140491764602624 [ERROR] InnoDB: Could not find a valid tablespace file for `test/t1`. Please refer to http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting-datadict.html for how to resolve the issue.
      

      Both problems can be reproduced with the following instrumentation:

      diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
      index 4f944fd5c0d..b8ad3a297a2 100644
      --- a/storage/innobase/row/row0mysql.cc
      +++ b/storage/innobase/row/row0mysql.cc
      @@ -4915,6 +4915,7 @@ row_rename_table_for_mysql(
       	}
       
       	if (commit) {
      +		DEBUG_SYNC(trx->mysql_thd, "before_rename_table_commit");
       		trx_commit_for_mysql(trx);
       	}
       
      

      and the following test:

      --source include/have_innodb.inc
      --source include/have_debug.inc
      --source include/have_debug_sync.inc
      --source include/not_embedded.inc
       
      CREATE TABLE t1 (a INT UNSIGNED PRIMARY KEY) ENGINE=InnoDB;
      INSERT INTO t1 VALUES(42);
      --source include/restart_mysqld.inc
       
      --connect (con1,localhost,root,,test)
      SET DEBUG_SYNC='before_rename_table_commit SIGNAL renamed WAIT_FOR ever';
      --send
      RENAME TABLE t1 TO t2;
      --connection default
      SET DEBUG_SYNC='now WAIT_FOR renamed';
      --let $shutdown_timeout=0
      --source include/restart_mysqld.inc
      --disconnect con1
      SELECT * FROM t1;
      DROP TABLE t1;
      

      Remove the first invocation of restart_mysqld.inc to reproduce the failure to startup.
      (In the unlikely event that a log checkpoint occurs between the INSERT and the next restart_mysqld.inc, InnoDB would still be able to start up.)

      How to fix this?

      1. Always write MLOG_FILE_RENAME2 records before renaming any .ibd files. Currently it may be the case that these records are only written during ALTER TABLE…ALGORITHM=INPLACE when the rebuilt table is being swapped.
      2. Before writing the MLOG_FILE_RENAME2 record, write a new type of an undo log record, so that in case the data dictionary transaction is rolled back, the file will be renamed back too.

      How to work around the bug? While the server is offline, manually rename the .ibd files back so that they match the data dictionary (and in this case, the .frm files).

      If we introduced a new undo log record type in a GA version of MariaDB, this could prevent a downgrade to an earlier GA version and violate our compatibility rules.

      If we started writing MLOG_FILE_RENAME2 redo log records in MariaDB 10.2, then users should not see InnoDB startup failures related to this, but instead they would encounter missing tables. If there was any incomplete transaction that operated on the table, the rollback of that recovered transaction would skip and thus corrupt the table. The status quo would seem better: a startup after manually renaming the .ibd file back should succeed.

      Attachments

        Issue Links

          Activity

            After I implemented MLOG_FILE_RENAME2 redo logging of all .ibd file rename operations, I ran into a different problem on ha_innobase::open():

            2017-12-20 14:20:25 0x7fa6269d5700  InnoDB: Assertion failure in file /mariadb/10.2-ext/storage/innobase/dict/dict0dict.cc line 1318
            InnoDB: Failing assertion: table2 == NULL
            

            I then proceeded to revert what looks like a work-around: removing the parameters adjust_space, table_id from fil_space_for_table_exists_in_mem(). It did not make any difference.

            The cause of this assertion failure is that at startup, before rolling back any transactions, the table was loaded into the data dictionary with the new name. That is, InnoDB essentially performed READ UNCOMMITTED of SYS_TABLES here:

            #0  dict_table_add_to_cache (table=0x61700000a588, can_be_evicted=true, 
                heap=0x62c000000200)
                at /mariadb/10.2-ext/storage/innobase/dict/dict0dict.cc:1285
            #1  0x0000555557859b7a in dict_load_table_one (name=..., cached=true, 
                ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK, 
                fk_tables=std::deque with 0 elements)
                at /mariadb/10.2-ext/storage/innobase/dict/dict0load.cc:3007
            #2  0x0000555557858386 in dict_load_table (name=0x614000003728 "test/t2", 
                cached=true, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK)
                at /mariadb/10.2-ext/storage/innobase/dict/dict0load.cc:2805
            #3  0x000055555785b29a in dict_load_table_on_id (table_id=19, 
                ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK)
                at /mariadb/10.2-ext/storage/innobase/dict/dict0load.cc:3214
            #4  0x000055555780822c in dict_table_open_on_id_low (table_id=19, 
                ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK, open_only_if_in_cache=0)
                at /mariadb/10.2-ext/storage/innobase/include/dict0priv.ic:90
            #5  0x000055555780e780 in dict_table_open_on_id (table_id=19, dict_locked=0, 
                table_op=DICT_TABLE_OP_LOAD_TABLESPACE)
                at /mariadb/10.2-ext/storage/innobase/dict/dict0dict.cc:1006
            #6  0x000055555763ae19 in trx_resurrect_table_locks (trx=0x7fffec522078, 
                undo_ptr=0x7fffec5223a0, undo=0x612000022540)
                at /mariadb/10.2-ext/storage/innobase/trx/trx0trx.cc:780
            #7  0x000055555763d1eb in trx_lists_init_at_db_start ()
                at /mariadb/10.2-ext/storage/innobase/trx/trx0trx.cc:1022
            #8  0x000055555762e3df in trx_sys_init_at_db_start ()
                at /mariadb/10.2-ext/storage/innobase/trx/trx0sys.cc:507
            

            Later, after the incomplete data dictionary transaction was rolled back, on ha_innobase::open() for executing the SELECT, the table would be loaded with the correct name. But, a table with the same ID already existed in the cache, and the assertion fails.
            It looks like we should always write an undo log record about RENAME TABLE, so that the cache can be adjusted on rollback.

            marko Marko Mäkelä added a comment - After I implemented MLOG_FILE_RENAME2 redo logging of all .ibd file rename operations, I ran into a different problem on ha_innobase::open() : 2017-12-20 14:20:25 0x7fa6269d5700 InnoDB: Assertion failure in file /mariadb/10.2-ext/storage/innobase/dict/dict0dict.cc line 1318 InnoDB: Failing assertion: table2 == NULL I then proceeded to revert what looks like a work-around: removing the parameters adjust_space, table_id from fil_space_for_table_exists_in_mem() . It did not make any difference. The cause of this assertion failure is that at startup, before rolling back any transactions, the table was loaded into the data dictionary with the new name. That is, InnoDB essentially performed READ UNCOMMITTED of SYS_TABLES  here: #0 dict_table_add_to_cache (table=0x61700000a588, can_be_evicted=true, heap=0x62c000000200) at /mariadb/10.2-ext/storage/innobase/dict/dict0dict.cc:1285 #1 0x0000555557859b7a in dict_load_table_one (name=..., cached=true, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK, fk_tables=std::deque with 0 elements) at /mariadb/10.2-ext/storage/innobase/dict/dict0load.cc:3007 #2 0x0000555557858386 in dict_load_table (name=0x614000003728 "test/t2", cached=true, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK) at /mariadb/10.2-ext/storage/innobase/dict/dict0load.cc:2805 #3 0x000055555785b29a in dict_load_table_on_id (table_id=19, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK) at /mariadb/10.2-ext/storage/innobase/dict/dict0load.cc:3214 #4 0x000055555780822c in dict_table_open_on_id_low (table_id=19, ignore_err=DICT_ERR_IGNORE_RECOVER_LOCK, open_only_if_in_cache=0) at /mariadb/10.2-ext/storage/innobase/include/dict0priv.ic:90 #5 0x000055555780e780 in dict_table_open_on_id (table_id=19, dict_locked=0, table_op=DICT_TABLE_OP_LOAD_TABLESPACE) at /mariadb/10.2-ext/storage/innobase/dict/dict0dict.cc:1006 #6 0x000055555763ae19 in trx_resurrect_table_locks (trx=0x7fffec522078, undo_ptr=0x7fffec5223a0, undo=0x612000022540) at /mariadb/10.2-ext/storage/innobase/trx/trx0trx.cc:780 #7 0x000055555763d1eb in trx_lists_init_at_db_start () at /mariadb/10.2-ext/storage/innobase/trx/trx0trx.cc:1022 #8 0x000055555762e3df in trx_sys_init_at_db_start () at /mariadb/10.2-ext/storage/innobase/trx/trx0sys.cc:507 Later, after the incomplete data dictionary transaction was rolled back, on ha_innobase::open() for executing the SELECT , the table would be loaded with the correct name. But, a table with the same ID already existed in the cache, and the assertion fails. It looks like we should always write an undo log record about RENAME TABLE , so that the cache can be adjusted on rollback.
            marko Marko Mäkelä added a comment - Fixed in bb-10.2-ext and merged to 10.3 .
            marko Marko Mäkelä added a comment - - edited

            While testing MDEV-13564, which implements TRUNCATE TABLE on top of crash-safe InnoDB-internal RENAME TABLE, CREATE TABLE, DROP TABLE, I suspected that I forgot a redo log flush in trx_undo_report_rename(). This would sometimes have the effect that if the server is killed during an operation that renames .ibd files, recovery would not find anything in the undo log, and would fail to roll back the rename operation in the file system.

            Edit: There is nothing wrong with trx_undo_report_rename(). The redo log flush is initiated in each caller of fil_name_write_rename_low().

            There is also nothing wrong with the redo log flushing:

            log_write_up_to(mtr.commit_lsn(), true);
            

            The problem that I observed in MDEV-13564 was due to unwanted COMMIT within the TRUNCATE operation.

            I filed MDEV-17158 for this problem.

            marko Marko Mäkelä added a comment - - edited While testing MDEV-13564 , which implements TRUNCATE TABLE on top of crash-safe InnoDB-internal RENAME TABLE , CREATE TABLE , DROP TABLE , I suspected that I forgot a redo log flush in trx_undo_report_rename() . This would sometimes have the effect that if the server is killed during an operation that renames .ibd files, recovery would not find anything in the undo log, and would fail to roll back the rename operation in the file system. Edit: There is nothing wrong with trx_undo_report_rename() . The redo log flush is initiated in each caller of fil_name_write_rename_low() . There is also nothing wrong with the redo log flushing: log_write_up_to(mtr.commit_lsn(), true ); The problem that I observed in MDEV-13564 was due to unwanted COMMIT within the TRUNCATE operation. I filed MDEV-17158 for this problem.
            marko Marko Mäkelä added a comment - - edited

            In MariaDB 10.2, this fix will remain disabled by default, in order to preserve compatibility with third-party tools.
            To enable, set

            loose_innodb_safe_truncate=ON
            

            in the server configuration or add

            --loose-innodb-safe-truncate
            

            to the mysqld invocation. This option will only be available in the MariaDB Server 10.2 series.

            The reason for this is that the crash-safe rename will require an undo log format change, and we can only change the undo log format by making sure that older versions of MariaDB 10.2 will refuse to start after a crash. This redo log format identifier change could cause some third-party tools to fail.

            marko Marko Mäkelä added a comment - - edited In MariaDB 10.2, this fix will remain disabled by default, in order to preserve compatibility with third-party tools. To enable, set loose_innodb_safe_truncate=ON in the server configuration or add --loose-innodb-safe-truncate to the mysqld invocation. This option will only be available in the MariaDB Server 10.2 series. The reason for this is that the crash-safe rename will require an undo log format change, and we can only change the undo log format by making sure that older versions of MariaDB 10.2 will refuse to start after a crash. This redo log format identifier change could cause some third-party tools to fail.

            While renaming tables inside InnoDB should be safe in the event the server process crashes, renaming the files might not be safe when file system recovery is involved. MDEV-20677 was filed to cover those scenarios.

            marko Marko Mäkelä added a comment - While renaming tables inside InnoDB should be safe in the event the server process crashes, renaming the files might not be safe when file system recovery is involved. MDEV-20677 was filed to cover those scenarios.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.