Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14585

Automatically remove #sql- tables in innodb dictionary during recovery

Details

    Description

      During ALTER TABLE MariaDB is creating some temporary #sql- prefixed table to hold the content of the new table. If MariaDB crashes during the ALTER TABLE, the #sql- tables will resist on disk and in the Innodb dictionary.

      In 10.3 one can delete #sql- tables with DROP TABLE, but in earlier version of MariaDB this is not possible.

      This will be properly fixed in 10.3 or 10.4 when we make ALTER TABLE, DROP TABLE and CREATE TABLE atomic. This will also ensure that .frm tables and the table definition in the storage engine is always up to date.

      In the mean time, as part of this task, we will add a purge of #sql- table's during innodb recovery, both from the disk and from the dictionary.

      This will stop one from getting errors like this in the error log:
      Invalid (old?) table or database name '#sql-1bc30_11fce1'

      Attachments

        Issue Links

          Activity

            I think that we must be careful here.

            These particular names seem to originate from ALTER TABLE…ALGORITHM=COPY. The #sql tables whose names do not start with #sql-ib or #sql2 are probably safe to remove. The #sql2 prefix is what ALGORITHM=COPY is renaming the original table to, right before renaming the altered table from #sql name to the user-specified name. We do not want to risk losing both copies of the table. I will have to carefully review the code to see if it is possible that InnoDB can recover two #sql tables if ALGORITHM=COPY crashes at the right moment.

            Similarly, I would be careful with tables whose names start with #sql-ib (used by table-rebuilding `ALTER TABLE…ALGORITHM=INPLACE`). Note that after MDEV-14378 (in MariaDB 10.3.3), the #sql-ib name prefix will only be used for renaming the original table, right before replacing the original table with the rebuilt one.

            marko Marko Mäkelä added a comment - I think that we must be careful here. These particular names seem to originate from ALTER TABLE…ALGORITHM=COPY. The #sql tables whose names do not start with #sql-ib or #sql2 are probably safe to remove. The #sql2 prefix is what ALGORITHM=COPY is renaming the original table to, right before renaming the altered table from #sql name to the user-specified name. We do not want to risk losing both copies of the table. I will have to carefully review the code to see if it is possible that InnoDB can recover two #sql tables if ALGORITHM=COPY crashes at the right moment. Similarly, I would be careful with tables whose names start with #sql-ib (used by table-rebuilding `ALTER TABLE…ALGORITHM=INPLACE`). Note that after MDEV-14378 (in MariaDB 10.3.3), the #sql-ib name prefix will only be used for renaming the original table, right before replacing the original table with the rebuilt one.
            marko Marko Mäkelä added a comment - - edited

            After some more consideration, I do not think that it is safe to fix this in GA versions (before MariaDB 10.3).

            There are a few problems, and there exists a solution that can avoid orphan #sql tables in most cases. By implementing transactional logging inside InnoDB, we will be able to guarantee that the InnoDB data dictionary and file system will be in sync. Some mechanism outside InnoDB is needed to remove mismatch between .frm files and InnoDB, especially in CREATE, DROP, and RENAME operations.

            The current state of Data Definition Language Operations in InnoDB

            The highlights of DDL operations in InnoDB are as follows:

            1. When a record is inserted in the internal SYS_INDEXES, an index tree will be created.
            2. When a SYS_INDEXES record is delete-marked, the correpsonding index tree will be freed, already before transaction commit. So, it is not really possible to roll back.
            3. The undo log header can store one trx_t::table_id. On recovery, the identified table will be dropped. This allows a crash-safe implementation of a single CREATE/DROP TABLE per transaction.
            4. RENAME TABLE is not crash-safe. If the rename operation was performed in the file system but the transaction was not committed before InnoDB was killed and restarted, the change to the SYS_TABLES.NAME will be rolled back, but the rename in the file system will remain in effect.
            5. If a transaction involves creating, dropping or renaming multiple InnoDB tables (say, a partitioned table or a table with fulltext indexes), internally there may be multiple transactions, and it is not really crash-safe.
            6. For ALTER TABLE…ADD INDEX…ALGORITHM=INPLACE we will create ‘index stubs’ such that SYS_INDEXES.NAME starts with the invalid UTF-8 sequence 0xff. This name prefix is also used for crash-safe DROP INDEX.

            Transactional logging of Data Definition Language Operations

            To implement transactional Data Manipulation Language operations, InnoDB implements transactional logging. Each transaction has a log, called the undo log. Actually, the name 'undo log' is slightly misleading, because the log not only facilitates ROLLBACK, but it also covers actions that have to be performed after COMMIT (the purge of no-longer-needed history).

            There is a clear analogy to DDL operations. Similar to DELETE, DROP TABLE or TRUNCATE TABLE should not actually remove any data before the transaction is committed. Only after transaction commit, the data can be removed. (Because the data dictionary essentially follows the READ COMMITTED isolation, that is, there is no multi-versioning, the data can be removed immediately after commit.)

            The goal is to allow arbitrary transactional multi-table DDL operations inside InnoDB. This goal can be achieved by introducing and writing undo log records to facilitate the following:

            • CREATE file: on rollback, delete the file if it exists
            • CREATE index (outside CREATE file): on rollback, free the index tree if the root page matches
            • DELETE file, DELETE index: similar to CREATE, but the action is executed after commit only
            • RENAME file: on rollback, rename the file back if it had already been renamed
            • TRUNCATE index: after commit, truncate the index tree identified by the root page number
            • BULK-LOAD index: on rollback, truncate the index tree identified by the root page number

            This kind of logging allows any combination of DDL operations within an InnoDB ACID transaction. It also contains a provision to implement MDEV-515 bulk insert and MDEV-13564 backup-friendly TRUNCATE TABLE.

            For now, we may keep the 'index stubs' for ADD INDEX/DROP INDEX. They must be removed when we remove the InnoDB data dictionary tables (MDEV-11655).

            ALTER TABLE…ALGORITHM=INPLACE

            Inside InnoDB, we make a distinction whether this operation is rebuilding the table or not. If yes, then after MDEV-14378, MariaDB will rename the old table to temporary name #sql-ib before replacing the old table.
            With the above-mentioned logging in place, the #sql-ib file will not be visible after InnoDB recovery, and the operation can be observed as follows:

            1. Create a #sql….frm file for the altered table definition.
            2. Invoke prepare_inplace_alter_table(), inplace_alter_table(), commit_inplace_alter_table() inside the storage engine.
            3. Replace the tablename.frm with the #sql….frm file.

            If the server is killed before all the above steps complete, there are two possibilities:

            1. A likely incomplete (corrupted) table #sql… will exist. (It is safe for InnoDB to drop the table, so let us assume that only the #sql….frm file will exist after InnoDB recovery.)
            2. The operation was committed inside InnoDB. #sql….frm exists, and should be renamed to tablename.frm. Otherwise, tablename.frm will be out of sync with InnoDB, and bad things (including crash or corruption) can occur if the table is accessed.

            ALTER TABLE…ALGORITHM=COPY

            This is divided into the following steps:

            1. Create table #sql…. (First create the .frm file, then create the table inside InnoDB.) COMMIT.
            2. Copy the data from tablename to #sql…. COMMIT.
            3. Rename tablename to #sql2…. COMMIT.
            4. Rename #sql… to tablename. COMMIT.
            5. Drop #sql2… COMMIT.

            If we (possibly naïvely) assume that the file system operations are crash-safe (with appropriate fsync() or similar in place, also for all metadata), then it seems that we can have the following crash points:

            1. Table #sql… exists, and the copying was not completed. InnoDB recovery can remove it; something outside InnoDB (a log file or DBA) would cause the .frm file to be removed.
            2. Crash during rename to #sql2…: First we rename the .frm file, then call handler::rename_table(). On crash recovery, we can have #sql2….frm but still tablename.ibd. It is still safe to drop #sql… inside InnoDB. But something outside InnoDB will have to rename #sql2… back to tablename.
            3. Rename from #sql… to tablename: After crash, we can have tablename.frm and both #sql….ibd and #sql2….ibd, and no tablename.ibd.
              3. Drop #sql2…: After crash, we can have #sql2….frm but no #sql2….ibd. The .frm file can be deleted outside InnoDB.

            So, it is not safe for InnoDB to drop any #sql2… tables, but #sql and #sql-ib are safe to drop, provided that the renaming inside InnoDB works safely.

            We should try to implement the RENAMEs as a single transaction. In that way, after InnoDB recovery we would always have a tablename.ibd. For automatic recovery, we would still need an appropriate logging and recovery mechanism for renaming the .frm files.

            RENAME TABLE

            Inside InnoDB, we will implement RENAME as follows:

            1. Write an undo log record for rolling back the rename operation.
            2. Write a MLOG_FILE_RENAME2 record for rolling forward the rename operation.
            3. Rename the file in the file system, and make the change durable.

            On transaction rollback (which can be before or after InnoDB restart), we will do the following provided that the renaming was not already rolled back:

            1. Write a MLOG_FILE_RENAME2 record for rolling back the rename operation.
            2. Rename the file in the file system, and make the change durable.

            Note: If executing the RENAME operations fails at rollback, we are pretty much out of luck. Hopefully it is likely that if the RENAME originally was possible, it should also be possible to execute the operation in the reverse direction. An I/O error or a change of file system permissions could break this assumption.

            CREATE TABLE and DROP TABLE

            For these, a recovery mechanism (drop an incompletely created table) must be provided outside InnoDB. Currently, it is possible to have a failure where a .frm file exists but the table does not exist inside the storage engine.

            InnoDB will delete the files on rollback of CREATE TABLE, or after the commit of DROP TABLE. If the deletion fails for some reason (such as missing file system permissions), orphan files will be left behind.

            marko Marko Mäkelä added a comment - - edited After some more consideration, I do not think that it is safe to fix this in GA versions (before MariaDB 10.3). There are a few problems, and there exists a solution that can avoid orphan #sql tables in most cases. By implementing transactional logging inside InnoDB, we will be able to guarantee that the InnoDB data dictionary and file system will be in sync. Some mechanism outside InnoDB is needed to remove mismatch between .frm files and InnoDB, especially in CREATE, DROP, and RENAME operations. The current state of Data Definition Language Operations in InnoDB The highlights of DDL operations in InnoDB are as follows: When a record is inserted in the internal SYS_INDEXES, an index tree will be created. When a SYS_INDEXES record is delete-marked, the correpsonding index tree will be freed, already before transaction commit. So, it is not really possible to roll back. The undo log header can store one trx_t::table_id. On recovery, the identified table will be dropped. This allows a crash-safe implementation of a single CREATE/DROP TABLE per transaction. RENAME TABLE is not crash-safe . If the rename operation was performed in the file system but the transaction was not committed before InnoDB was killed and restarted, the change to the SYS_TABLES.NAME will be rolled back, but the rename in the file system will remain in effect. If a transaction involves creating, dropping or renaming multiple InnoDB tables (say, a partitioned table or a table with fulltext indexes), internally there may be multiple transactions, and it is not really crash-safe. For ALTER TABLE…ADD INDEX…ALGORITHM=INPLACE we will create ‘index stubs’ such that SYS_INDEXES.NAME starts with the invalid UTF-8 sequence 0xff. This name prefix is also used for crash-safe DROP INDEX. Transactional logging of Data Definition Language Operations To implement transactional Data Manipulation Language operations, InnoDB implements transactional logging. Each transaction has a log, called the undo log. Actually, the name 'undo log' is slightly misleading, because the log not only facilitates ROLLBACK, but it also covers actions that have to be performed after COMMIT (the purge of no-longer-needed history). There is a clear analogy to DDL operations. Similar to DELETE, DROP TABLE or TRUNCATE TABLE should not actually remove any data before the transaction is committed. Only after transaction commit, the data can be removed. (Because the data dictionary essentially follows the READ COMMITTED isolation, that is, there is no multi-versioning, the data can be removed immediately after commit.) The goal is to allow arbitrary transactional multi-table DDL operations inside InnoDB. This goal can be achieved by introducing and writing undo log records to facilitate the following: CREATE file: on rollback, delete the file if it exists CREATE index (outside CREATE file): on rollback, free the index tree if the root page matches DELETE file, DELETE index: similar to CREATE, but the action is executed after commit only RENAME file: on rollback, rename the file back if it had already been renamed TRUNCATE index: after commit, truncate the index tree identified by the root page number BULK-LOAD index: on rollback, truncate the index tree identified by the root page number This kind of logging allows any combination of DDL operations within an InnoDB ACID transaction. It also contains a provision to implement MDEV-515 bulk insert and MDEV-13564 backup-friendly TRUNCATE TABLE. For now, we may keep the 'index stubs' for ADD INDEX/DROP INDEX. They must be removed when we remove the InnoDB data dictionary tables ( MDEV-11655 ). ALTER TABLE…ALGORITHM=INPLACE Inside InnoDB, we make a distinction whether this operation is rebuilding the table or not. If yes, then after MDEV-14378 , MariaDB will rename the old table to temporary name #sql-ib before replacing the old table. With the above-mentioned logging in place, the #sql-ib file will not be visible after InnoDB recovery, and the operation can be observed as follows: Create a #sql….frm file for the altered table definition. Invoke prepare_inplace_alter_table(), inplace_alter_table(), commit_inplace_alter_table() inside the storage engine. Replace the tablename.frm with the #sql….frm file. If the server is killed before all the above steps complete, there are two possibilities: A likely incomplete (corrupted) table #sql… will exist. (It is safe for InnoDB to drop the table, so let us assume that only the #sql….frm file will exist after InnoDB recovery.) The operation was committed inside InnoDB. #sql….frm exists, and should be renamed to tablename.frm. Otherwise, tablename.frm will be out of sync with InnoDB, and bad things (including crash or corruption) can occur if the table is accessed. ALTER TABLE…ALGORITHM=COPY This is divided into the following steps: Create table #sql…. (First create the .frm file, then create the table inside InnoDB.) COMMIT. Copy the data from tablename to #sql…. COMMIT. Rename tablename to #sql2…. COMMIT. Rename #sql… to tablename. COMMIT. Drop #sql2… COMMIT. If we (possibly naïvely) assume that the file system operations are crash-safe (with appropriate fsync() or similar in place, also for all metadata), then it seems that we can have the following crash points: Table #sql… exists, and the copying was not completed. InnoDB recovery can remove it; something outside InnoDB (a log file or DBA) would cause the .frm file to be removed. Crash during rename to #sql2…: First we rename the .frm file, then call handler::rename_table(). On crash recovery, we can have #sql2….frm but still tablename.ibd. It is still safe to drop #sql… inside InnoDB. But something outside InnoDB will have to rename #sql2… back to tablename. Rename from #sql… to tablename: After crash, we can have tablename.frm and both #sql….ibd and #sql2….ibd, and no tablename.ibd. 3. Drop #sql2…: After crash, we can have #sql2….frm but no #sql2….ibd. The .frm file can be deleted outside InnoDB. So, it is not safe for InnoDB to drop any #sql2… tables, but #sql and #sql-ib are safe to drop, provided that the renaming inside InnoDB works safely. We should try to implement the RENAMEs as a single transaction. In that way, after InnoDB recovery we would always have a tablename.ibd. For automatic recovery, we would still need an appropriate logging and recovery mechanism for renaming the .frm files. RENAME TABLE Inside InnoDB, we will implement RENAME as follows: Write an undo log record for rolling back the rename operation. Write a MLOG_FILE_RENAME2 record for rolling forward the rename operation. Rename the file in the file system, and make the change durable. On transaction rollback (which can be before or after InnoDB restart), we will do the following provided that the renaming was not already rolled back: Write a MLOG_FILE_RENAME2 record for rolling back the rename operation. Rename the file in the file system, and make the change durable. Note: If executing the RENAME operations fails at rollback, we are pretty much out of luck. Hopefully it is likely that if the RENAME originally was possible, it should also be possible to execute the operation in the reverse direction. An I/O error or a change of file system permissions could break this assumption. CREATE TABLE and DROP TABLE For these, a recovery mechanism (drop an incompletely created table) must be provided outside InnoDB. Currently, it is possible to have a failure where a .frm file exists but the table does not exist inside the storage engine. InnoDB will delete the files on rollback of CREATE TABLE, or after the commit of DROP TABLE. If the deletion fails for some reason (such as missing file system permissions), orphan files will be left behind.

            Until we remove the dict_operation_lock and replace it with meta-data lock (MDL) acquisition on the user-visible table name, we must use separate transactions for updating the InnoDB data dictionary tables. For example, if ha_innobase::prepare_inplace_alter_table() modifies some InnoDB dictionary records, currently it would have to hold an exclusive lock on dict_operation_lock until the transaction is committed. We clearly cannot allow a long-running ALTER TABLE to block any concurrent access to the InnoDB data dictionary. So, we must keep using some separate DDL transactions for now.

            This means that the rollback of CREATE TABLE…SELECT or table-rebuilding ALTER TABLE will probably have to involve renaming the table to an internal name that will be cause the InnoDB data dictionary entries of the table to be dropped on server restart.

            marko Marko Mäkelä added a comment - Until we remove the dict_operation_lock and replace it with meta-data lock (MDL) acquisition on the user-visible table name, we must use separate transactions for updating the InnoDB data dictionary tables. For example, if ha_innobase::prepare_inplace_alter_table() modifies some InnoDB dictionary records, currently it would have to hold an exclusive lock on dict_operation_lock until the transaction is committed. We clearly cannot allow a long-running ALTER TABLE to block any concurrent access to the InnoDB data dictionary. So, we must keep using some separate DDL transactions for now. This means that the rollback of CREATE TABLE…SELECT or table-rebuilding ALTER TABLE will probably have to involve renaming the table to an internal name that will be cause the InnoDB data dictionary entries of the table to be dropped on server restart.

            Another source of orphan tables in InnoDB is the 'background DROP TABLE queue'. If InnoDB is killed before the queue is emptied, then the tables will never be dropped.

            As far as I understand, the background DROP TABLE is needed in two scenarios:

            1. A FOREIGN KEY check is running on a table that is being dropped, and the check did not acquire proper metadata lock (MDL) on the table. This is a known problem in MySQL 5.5, 5.6, 5.7 until WL#6049 was implemented in 8.0. According to serg MariaDB got the equivalent feature earlier.
            2. A duplicate key error occurs during CREATE TABLE t(PRIMARY KEY a)SELECT 1 a UNION ALL SELECT 1 a, and the SQL layer invokes DROP TABLE t in error handling. Because of the dict_operation_lock in InnoDB, the DROP TABLE would be executed in a separate transaction while the user-visible transaction is still active, holding locks on the table.

            I was trying to implement a solution where ha_innobase::delete_table() would recognize the conflicting locks and then mark the table as ‘drop on commit’. But, then I realized that if the user-level transaction does not end in COMMIT but in XA PREPARE, then we would have to delete the records from the InnoDB dictionary tables right before that state change, and consequently hold locks on the dictionary records until the final XA COMMIT or XA ROLLBACK. This is impossible without eliminating the dict_operation_lock.

            What we can realistically do in the short term is:

            1. Before adding a table to the background DROP TABLE queue, rename it to a temporary name (using #sql-ib prefix).
            2. On InnoDB startup, drop all tables whose name starts with #sql-ib. This will not only make the background queue crash-safe, but also remove any orphan tables from the table-rebuilding ALTER TABLE…ALGORITHM=INPLACE.

            This should of course be augmented with transactional logging of the operations, so that (for example) DROP TABLE will not discard any user data before the changes have been committed to the data dictionary tables.

            marko Marko Mäkelä added a comment - Another source of orphan tables in InnoDB is the 'background DROP TABLE queue'. If InnoDB is killed before the queue is emptied, then the tables will never be dropped. As far as I understand, the background DROP TABLE is needed in two scenarios: A FOREIGN KEY check is running on a table that is being dropped, and the check did not acquire proper metadata lock (MDL) on the table. This is a known problem in MySQL 5.5, 5.6, 5.7 until WL#6049 was implemented in 8.0. According to serg MariaDB got the equivalent feature earlier. A duplicate key error occurs during CREATE TABLE t(PRIMARY KEY a)SELECT 1 a UNION ALL SELECT 1 a , and the SQL layer invokes DROP TABLE t in error handling. Because of the dict_operation_lock in InnoDB, the DROP TABLE would be executed in a separate transaction while the user-visible transaction is still active, holding locks on the table. I was trying to implement a solution where ha_innobase::delete_table() would recognize the conflicting locks and then mark the table as ‘drop on commit’. But, then I realized that if the user-level transaction does not end in COMMIT but in XA PREPARE , then we would have to delete the records from the InnoDB dictionary tables right before that state change, and consequently hold locks on the dictionary records until the final XA COMMIT or XA ROLLBACK . This is impossible without eliminating the dict_operation_lock . What we can realistically do in the short term is: Before adding a table to the background DROP TABLE queue, rename it to a temporary name (using #sql-ib prefix). On InnoDB startup, drop all tables whose name starts with #sql-ib . This will not only make the background queue crash-safe, but also remove any orphan tables from the table-rebuilding ALTER TABLE…ALGORITHM=INPLACE . This should of course be augmented with transactional logging of the operations, so that (for example) DROP TABLE will not discard any user data before the changes have been committed to the data dictionary tables.

            The fix of MDEV-13407 makes the ’background DROP TABLE queue’ crash-safe and causes the #sql-ib tables to be removed on InnoDB startup. This I believe is safe.

            Removing other #sql- tables (not #sql2) is doable, but I am not convinced that it is safe in all cases. It is trivial to modify the code to do so:

            diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
            index cee05014d7f..9a4ec143e21 100644
            --- a/storage/innobase/row/row0mysql.cc
            +++ b/storage/innobase/row/row0mysql.cc
            @@ -2918,7 +2918,7 @@ row_mysql_drop_garbage_tables()
             		table_name = mem_heap_strdupl(
             			heap,
             			reinterpret_cast<const char*>(field), len);
            -		if (strstr(table_name, "/" TEMP_FILE_PREFIX_INNODB)) {
            +		if (strstr(table_name, "/" TEMP_FILE_PREFIX "-")) {
             			btr_pcur_store_position(&pcur, &mtr);
             			btr_pcur_commit_specify_mtr(&pcur, &mtr);
             
            

            What is not trivial is to prove that this will never lose data. I think that it would require extensive testing of ALTER TABLE operations on a variety of tables, including some that use both FULLTEXT INDEX and PARTITION. And this is only meaningful to do once monty has implemented the recovery for the .frm files. The test innodb.alter_crash could be a starting point for that.

            I would feel more comfortable after implementing transactional InnoDB logging for DDL operations, because without that, we do not really have crash-safe RENAME operations. Maybe for MariaDB 10.3 it suffices to have an undo log record for TRX_UNDO_RENAME_FILE, while in MariaDB 10.4 we could remove dict_operation_lock and use a single transaction for both DML and DDL changes inside InnoDB.

            marko Marko Mäkelä added a comment - The fix of MDEV-13407 makes the ’background DROP TABLE queue’ crash-safe and causes the #sql-ib tables to be removed on InnoDB startup. This I believe is safe. Removing other #sql- tables (not #sql2 ) is doable, but I am not convinced that it is safe in all cases. It is trivial to modify the code to do so: diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc index cee05014d7f..9a4ec143e21 100644 --- a/storage/innobase/row/row0mysql.cc +++ b/storage/innobase/row/row0mysql.cc @@ -2918,7 +2918,7 @@ row_mysql_drop_garbage_tables() table_name = mem_heap_strdupl( heap, reinterpret_cast<const char*>(field), len); - if (strstr(table_name, "/" TEMP_FILE_PREFIX_INNODB)) { + if (strstr(table_name, "/" TEMP_FILE_PREFIX "-")) { btr_pcur_store_position(&pcur, &mtr); btr_pcur_commit_specify_mtr(&pcur, &mtr); What is not trivial is to prove that this will never lose data. I think that it would require extensive testing of ALTER TABLE operations on a variety of tables, including some that use both FULLTEXT INDEX and PARTITION . And this is only meaningful to do once monty has implemented the recovery for the .frm files. The test innodb.alter_crash could be a starting point for that. I would feel more comfortable after implementing transactional InnoDB logging for DDL operations, because without that, we do not really have crash-safe RENAME operations. Maybe for MariaDB 10.3 it suffices to have an undo log record for TRX_UNDO_RENAME_FILE , while in MariaDB 10.4 we could remove dict_operation_lock and use a single transaction for both DML and DDL changes inside InnoDB.

            After MDEV-14717 we should have crash-safe RENAME operations, and it should be safe to remove all #sql- tables (not #sql2) at InnoDB startup.

            marko Marko Mäkelä added a comment - After MDEV-14717 we should have crash-safe RENAME operations, and it should be safe to remove all #sql- tables (not #sql2 ) at InnoDB startup.

            I attached my work-in-progress patches for improving this in MariaDB 10.4.

            In MariaDB 10.3, I do not think that we can do much more, because it would not be feasible to remove dict_operation_lock this close to the GA release.

            In 10.3 we could still try to fix MDEV-13564 by logging TRUNCATE operation in the undo log, and in DROP TABLE we could defer the file deletion operations, similarly by writing undo log records that would be applied after transaction commit (in a purge-like activity).

            marko Marko Mäkelä added a comment - I attached my work-in-progress patches for improving this in MariaDB 10.4. In MariaDB 10.3, I do not think that we can do much more, because it would not be feasible to remove dict_operation_lock this close to the GA release. In 10.3 we could still try to fix MDEV-13564 by logging TRUNCATE operation in the undo log, and in DROP TABLE we could defer the file deletion operations, similarly by writing undo log records that would be applied after transaction commit (in a purge-like activity).
            marko Marko Mäkelä added a comment - - edited

            In MariaDB 10.2.19, this fix was enabled by default, but it can be disabled in order to preserve compatibility with third-party tools.
            To disable, set

            loose_innodb_safe_truncate=OFF
            

            in the server configuration or add

            --loose-skip-innodb-safe-truncate
            

            to the mysqld invocation. This option will only be available in the MariaDB Server 10.2 series. In 10.3 and later, TRUNCATE will always be the safe variant. A crash-upgrade from 10.2 to 10.4 will be rejected if an innodb_safe_truncate=OFF style TRUNCATE operation was in progress when the 10.2 server was killed.

            The reason for this is that in order for the #sql- table removal to be safe, RENAME operations inside InnoDB must be crash-safe (MDEV-14717). The MDEV-14717 fix requires an undo log format change that could break a crash-downgrade to an earlier 10.2 version. To prevent a crash-downgrade we’d change the redo log format identifier. This change could cause some third-party tools to fail.

            marko Marko Mäkelä added a comment - - edited In MariaDB 10.2.19, this fix was enabled by default, but it can be disabled in order to preserve compatibility with third-party tools. To disable, set loose_innodb_safe_truncate=OFF in the server configuration or add --loose-skip-innodb-safe-truncate to the mysqld invocation. This option will only be available in the MariaDB Server 10.2 series. In 10.3 and later, TRUNCATE will always be the safe variant. A crash-upgrade from 10.2 to 10.4 will be rejected if an innodb_safe_truncate=OFF style TRUNCATE operation was in progress when the 10.2 server was killed. The reason for this is that in order for the #sql- table removal to be safe, RENAME operations inside InnoDB must be crash-safe ( MDEV-14717 ). The MDEV-14717 fix requires an undo log format change that could break a crash-downgrade to an earlier 10.2 version. To prevent a crash-downgrade we’d change the redo log format identifier. This change could cause some third-party tools to fail.

            People

              marko Marko Mäkelä
              monty Michael Widenius
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.