[MDEV-14585] Automatically remove #sql- tables in innodb dictionary during recovery Created: 2017-12-05 Updated: 2020-08-25 Due: 2017-12-22 Resolved: 2017-12-20 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.3.3, 10.2.19 |
| Type: | Task | Priority: | Major |
| Reporter: | Michael Widenius | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | ddl, recovery | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
During ALTER TABLE MariaDB is creating some temporary #sql- prefixed table to hold the content of the new table. If MariaDB crashes during the ALTER TABLE, the #sql- tables will resist on disk and in the Innodb dictionary. In 10.3 one can delete #sql- tables with DROP TABLE, but in earlier version of MariaDB this is not possible. This will be properly fixed in 10.3 or 10.4 when we make ALTER TABLE, DROP TABLE and CREATE TABLE atomic. This will also ensure that .frm tables and the table definition in the storage engine is always up to date. In the mean time, as part of this task, we will add a purge of #sql- table's during innodb recovery, both from the disk and from the dictionary. This will stop one from getting errors like this in the error log: |
| Comments |
| Comment by Marko Mäkelä [ 2017-12-05 ] | |||||||||||||
|
I think that we must be careful here. These particular names seem to originate from ALTER TABLE…ALGORITHM=COPY. The #sql tables whose names do not start with #sql-ib or #sql2 are probably safe to remove. The #sql2 prefix is what ALGORITHM=COPY is renaming the original table to, right before renaming the altered table from #sql name to the user-specified name. We do not want to risk losing both copies of the table. I will have to carefully review the code to see if it is possible that InnoDB can recover two #sql tables if ALGORITHM=COPY crashes at the right moment. Similarly, I would be careful with tables whose names start with #sql-ib (used by table-rebuilding `ALTER TABLE…ALGORITHM=INPLACE`). Note that after | |||||||||||||
| Comment by Marko Mäkelä [ 2017-12-08 ] | |||||||||||||
|
After some more consideration, I do not think that it is safe to fix this in GA versions (before MariaDB 10.3). There are a few problems, and there exists a solution that can avoid orphan #sql tables in most cases. By implementing transactional logging inside InnoDB, we will be able to guarantee that the InnoDB data dictionary and file system will be in sync. Some mechanism outside InnoDB is needed to remove mismatch between .frm files and InnoDB, especially in CREATE, DROP, and RENAME operations. The current state of Data Definition Language Operations in InnoDBThe highlights of DDL operations in InnoDB are as follows:
Transactional logging of Data Definition Language OperationsTo implement transactional Data Manipulation Language operations, InnoDB implements transactional logging. Each transaction has a log, called the undo log. Actually, the name 'undo log' is slightly misleading, because the log not only facilitates ROLLBACK, but it also covers actions that have to be performed after COMMIT (the purge of no-longer-needed history). There is a clear analogy to DDL operations. Similar to DELETE, DROP TABLE or TRUNCATE TABLE should not actually remove any data before the transaction is committed. Only after transaction commit, the data can be removed. (Because the data dictionary essentially follows the READ COMMITTED isolation, that is, there is no multi-versioning, the data can be removed immediately after commit.) The goal is to allow arbitrary transactional multi-table DDL operations inside InnoDB. This goal can be achieved by introducing and writing undo log records to facilitate the following:
This kind of logging allows any combination of DDL operations within an InnoDB ACID transaction. It also contains a provision to implement For now, we may keep the 'index stubs' for ADD INDEX/DROP INDEX. They must be removed when we remove the InnoDB data dictionary tables (MDEV-11655). ALTER TABLE…ALGORITHM=INPLACEInside InnoDB, we make a distinction whether this operation is rebuilding the table or not. If yes, then after
If the server is killed before all the above steps complete, there are two possibilities:
ALTER TABLE…ALGORITHM=COPYThis is divided into the following steps:
If we (possibly naïvely) assume that the file system operations are crash-safe (with appropriate fsync() or similar in place, also for all metadata), then it seems that we can have the following crash points:
So, it is not safe for InnoDB to drop any #sql2… tables, but #sql and #sql-ib are safe to drop, provided that the renaming inside InnoDB works safely. We should try to implement the RENAMEs as a single transaction. In that way, after InnoDB recovery we would always have a tablename.ibd. For automatic recovery, we would still need an appropriate logging and recovery mechanism for renaming the .frm files. RENAME TABLEInside InnoDB, we will implement RENAME as follows:
On transaction rollback (which can be before or after InnoDB restart), we will do the following provided that the renaming was not already rolled back:
Note: If executing the RENAME operations fails at rollback, we are pretty much out of luck. Hopefully it is likely that if the RENAME originally was possible, it should also be possible to execute the operation in the reverse direction. An I/O error or a change of file system permissions could break this assumption. CREATE TABLE and DROP TABLEFor these, a recovery mechanism (drop an incompletely created table) must be provided outside InnoDB. Currently, it is possible to have a failure where a .frm file exists but the table does not exist inside the storage engine. InnoDB will delete the files on rollback of CREATE TABLE, or after the commit of DROP TABLE. If the deletion fails for some reason (such as missing file system permissions), orphan files will be left behind. | |||||||||||||
| Comment by Marko Mäkelä [ 2017-12-14 ] | |||||||||||||
|
Until we remove the dict_operation_lock and replace it with meta-data lock (MDL) acquisition on the user-visible table name, we must use separate transactions for updating the InnoDB data dictionary tables. For example, if ha_innobase::prepare_inplace_alter_table() modifies some InnoDB dictionary records, currently it would have to hold an exclusive lock on dict_operation_lock until the transaction is committed. We clearly cannot allow a long-running ALTER TABLE to block any concurrent access to the InnoDB data dictionary. So, we must keep using some separate DDL transactions for now. This means that the rollback of CREATE TABLE…SELECT or table-rebuilding ALTER TABLE will probably have to involve renaming the table to an internal name that will be cause the InnoDB data dictionary entries of the table to be dropped on server restart. | |||||||||||||
| Comment by Marko Mäkelä [ 2017-12-15 ] | |||||||||||||
|
Another source of orphan tables in InnoDB is the 'background DROP TABLE queue'. If InnoDB is killed before the queue is emptied, then the tables will never be dropped. As far as I understand, the background DROP TABLE is needed in two scenarios:
I was trying to implement a solution where ha_innobase::delete_table() would recognize the conflicting locks and then mark the table as ‘drop on commit’. But, then I realized that if the user-level transaction does not end in COMMIT but in XA PREPARE, then we would have to delete the records from the InnoDB dictionary tables right before that state change, and consequently hold locks on the dictionary records until the final XA COMMIT or XA ROLLBACK. This is impossible without eliminating the dict_operation_lock. What we can realistically do in the short term is:
This should of course be augmented with transactional logging of the operations, so that (for example) DROP TABLE will not discard any user data before the changes have been committed to the data dictionary tables. | |||||||||||||
| Comment by Marko Mäkelä [ 2017-12-18 ] | |||||||||||||
|
The fix of Removing other #sql- tables (not #sql2) is doable, but I am not convinced that it is safe in all cases. It is trivial to modify the code to do so:
What is not trivial is to prove that this will never lose data. I think that it would require extensive testing of ALTER TABLE operations on a variety of tables, including some that use both FULLTEXT INDEX and PARTITION. And this is only meaningful to do once monty has implemented the recovery for the .frm files. The test innodb.alter_crash could be a starting point for that. I would feel more comfortable after implementing transactional InnoDB logging for DDL operations, because without that, we do not really have crash-safe RENAME operations. Maybe for MariaDB 10.3 it suffices to have an undo log record for TRX_UNDO_RENAME_FILE, while in MariaDB 10.4 we could remove dict_operation_lock and use a single transaction for both DML and DDL changes inside InnoDB. | |||||||||||||
| Comment by Marko Mäkelä [ 2017-12-20 ] | |||||||||||||
|
After | |||||||||||||
| Comment by Marko Mäkelä [ 2017-12-20 ] | |||||||||||||
|
I attached my work-in-progress patches for improving this in MariaDB 10.4. In MariaDB 10.3, I do not think that we can do much more, because it would not be feasible to remove dict_operation_lock this close to the GA release. In 10.3 we could still try to fix | |||||||||||||
| Comment by Marko Mäkelä [ 2018-10-11 ] | |||||||||||||
|
In MariaDB 10.2.19, this fix was enabled by default, but it can be disabled in order to preserve compatibility with third-party tools.
in the server configuration or add
to the mysqld invocation. This option will only be available in the MariaDB Server 10.2 series. In 10.3 and later, TRUNCATE will always be the safe variant. A crash-upgrade from 10.2 to 10.4 will be rejected if an innodb_safe_truncate=OFF style TRUNCATE operation was in progress when the 10.2 server was killed. The reason for this is that in order for the #sql- table removal to be safe, RENAME operations inside InnoDB must be crash-safe ( |