Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18518

Implement atomic multi-table (or multi-partition) CREATE TABLE for InnoDB

Details

    Description

      The way how InnoDB writes undo log about CREATE TABLE and DROP TABLE introduces the limitation that only one table may be created or dropped within a transaction.

      We should replace the current mechanism (around trx->table_id) and introduce new undo log records for the following:

      • delete-if-exists a file on rollback (when rolling back CREATE TABLE)
      • delete-if-exists a file on purge (after DROP TABLE has been committed)

      Applying either operation must write MLOG_FILE_DELETE2 to the redo log before actually deleting any files.

      To avoid name clashes between DROP TABLE and subsequent CREATE TABLE by the same name (before the old files were purged), we should probably rename the files to temporary names before dropping. Also the rename operation must be undo logged and redo logged. Furthermore, during normal operation we could delete the files for the dropped table. The deletion by the purge subsystem would only be a fall-back measure for crash recovery.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            NRE Projects RM_105_CANDIDATE
            marko Marko Mäkelä made changes -
            Attachment Remove-table_id.patch [ 48094 ]

            Remove-table_id.patch identifies the source code locations that will have to be adjusted. We probably cannot remove all this code; recovery from old undo logs must remain possible.

            marko Marko Mäkelä added a comment - Remove-table_id.patch identifies the source code locations that will have to be adjusted. We probably cannot remove all this code; recovery from old undo logs must remain possible.
            GeoffMontee Geoff Montee (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.5 [ 23123 ]
            marko Marko Mäkelä made changes -

            Given our chosen simpler solution to MDEV-8069, this task is not really related to MDEV-18613 any more. And MDEV-18613 could end up being redundant.

            marko Marko Mäkelä added a comment - Given our chosen simpler solution to MDEV-8069 , this task is not really related to MDEV-18613 any more. And MDEV-18613 could end up being redundant.
            ralf.gebhardt Ralf Gebhardt made changes -
            marko Marko Mäkelä made changes -

            In MDEV-24569, I posted analysis of corruption after the server was killed during DROP TABLE. That type of corruption would be fixed by this task.

            marko Marko Mäkelä added a comment - In MDEV-24569 , I posted analysis of corruption after the server was killed during DROP TABLE . That type of corruption would be fixed by this task.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            This turned out to be simpler than I originally thought. No new undo log record types are needed; we can simply extend the MDEV-24589 logic.

            marko Marko Mäkelä added a comment - This turned out to be simpler than I originally thought. No new undo log record types are needed; we can simply extend the MDEV-24589 logic.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Labels ddl performance
            Priority Major [ 3 ] Critical [ 2 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            No file format changes were needed for this. The only difference to earlier is that the undo log header field TRX_UNDO_TABLE_ID will always be written as 0, and the special recovery fts_drop_orphaned_tables() was removed altogether, because all internal tables for FULLTEXT INDEX should now be created in the same DDL transaction with everything else in the ALTER TABLE operation.

            marko Marko Mäkelä added a comment - No file format changes were needed for this. The only difference to earlier is that the undo log header field TRX_UNDO_TABLE_ID will always be written as 0, and the special recovery fts_drop_orphaned_tables() was removed altogether, because all internal tables for FULLTEXT INDEX should now be created in the same DDL transaction with everything else in the ALTER TABLE operation.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-05-04 10:53:45.0 2021-05-04 10:53:45.641
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.1 [ 24437 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]

            It turns out that many operations involving FULLTEXT INDEX still are not atomic.

            marko Marko Mäkelä added a comment - It turns out that many operations involving FULLTEXT INDEX still are not atomic.
            marko Marko Mäkelä made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]

            I am raising this to Blocker status, because currently the feature is half finished.

            marko Marko Mäkelä added a comment - I am raising this to Blocker status, because currently the feature is half finished.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.6.1 [ 24437 ]
            Priority Critical [ 2 ] Blocker [ 1 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            The following test currently leaves behind orphan FTS_ tables for the table t1:

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
             
            CREATE TABLE t1(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            CREATE TABLE t2(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            CREATE TABLE t3(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
             
            connect(ddl1, localhost, root,,);
            SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL 1 WAIT_FOR ever';
            send ALTER TABLE t1 ADD FULLTEXT INDEX(b);
             
            connect(ddl2, localhost, root,,);
            SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL 2 WAIT_FOR ever';
            send ALTER TABLE t2 DROP INDEX a, ADD FULLTEXT INDEX(b), FORCE;
             
            connect(ddl3, localhost, root,,);
            SET DEBUG_SYNC='alter_table_before_rename_result_table SIGNAL 3 WAIT_FOR ever';
            send ALTER TABLE t3 ADD FULLTEXT INDEX(b), ALGORITHM=COPY;
             
            connection default;
            SET DEBUG_SYNC='now WAIT_FOR 1';
            SET DEBUG_SYNC='now WAIT_FOR 2';
            SET DEBUG_SYNC='now WAIT_FOR 3';
             
            --let $shutdown_timeout=0
            --source include/restart_mysqld.inc
            disconnect ddl1;
            disconnect ddl2;
            disconnect ddl3;
             
            SHOW CREATE TABLE t1;
            SHOW CREATE TABLE t2;
            SHOW CREATE TABLE t3;
            DROP TABLE t1,t2,t3;
            SELECT * FROM information_schema.innodb_sys_tables where name like 'test/%';
            # Cleanup until MDEV-25180 (Atomic ALTER TABLE) has been done
            --remove_files_wildcard `select @@datadir`/#sql-*.frm
            

            marko Marko Mäkelä added a comment - The following test currently leaves behind orphan FTS_ tables for the table t1 : --source include/have_innodb.inc --source include/have_debug.inc --source include/have_debug_sync.inc   CREATE TABLE t1(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; CREATE TABLE t2(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; CREATE TABLE t3(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB;   connect (ddl1, localhost, root,,); SET DEBUG_SYNC= 'innodb_inplace_alter_table_enter SIGNAL 1 WAIT_FOR ever' ; send ALTER TABLE t1 ADD FULLTEXT INDEX (b);   connect (ddl2, localhost, root,,); SET DEBUG_SYNC= 'innodb_inplace_alter_table_enter SIGNAL 2 WAIT_FOR ever' ; send ALTER TABLE t2 DROP INDEX a, ADD FULLTEXT INDEX (b), FORCE ;   connect (ddl3, localhost, root,,); SET DEBUG_SYNC= 'alter_table_before_rename_result_table SIGNAL 3 WAIT_FOR ever' ; send ALTER TABLE t3 ADD FULLTEXT INDEX (b), ALGORITHM=COPY;   connection default ; SET DEBUG_SYNC= 'now WAIT_FOR 1' ; SET DEBUG_SYNC= 'now WAIT_FOR 2' ; SET DEBUG_SYNC= 'now WAIT_FOR 3' ;   --let $shutdown_timeout=0 --source include/restart_mysqld.inc disconnect ddl1; disconnect ddl2; disconnect ddl3;   SHOW CREATE TABLE t1; SHOW CREATE TABLE t2; SHOW CREATE TABLE t3; DROP TABLE t1,t2,t3; SELECT * FROM information_schema.innodb_sys_tables where name like 'test/%' ; # Cleanup until MDEV-25180 (Atomic ALTER TABLE ) has been done --remove_files_wildcard `select @@datadir`/#sql-*.frm
            marko Marko Mäkelä added a comment - - edited

            The above test will need some improvement for the regression suite:

            1. Especially if the server is built with cmake -DPLUGIN_PERFSCHEMA=NO, then the WAIT_FOR 2 or WAIT_FOR 3 would hang. This is because DEBUG_SYNC supports at most one signal being ‘in transit’. (That was improved in MySQL 5.7, but we can simply interleave the WAIT_FOR execution in the test.)
            2. The .frm file cleanup is incorrect.
            marko Marko Mäkelä added a comment - - edited The above test will need some improvement for the regression suite: Especially if the server is built with cmake -DPLUGIN_PERFSCHEMA=NO , then the WAIT_FOR 2 or WAIT_FOR 3 would hang. This is because DEBUG_SYNC supports at most one signal being ‘in transit’. ( That was improved in MySQL 5.7 , but we can simply interleave the WAIT_FOR execution in the test.) The .frm file cleanup is incorrect.

            The ALTER TABLE operations involving FULLTEXT INDEX are now atomic inside InnoDB. The test innodb_fts.crash_recovery covers that. The innodb_fts.crash_recovery,release variant (covering a server built using cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo) would trigger many different scenarios that are expected to be fixed by MDEV-24626 and MDEV-25180.

            The test variant innodb_fts.crash_recovery,debug kills the server during a deterministic spot of 3 different ALTER TABLE, to exercise the revised recovery that would drop any internal FTS_%INDEX_% tables along with stubs created during prepare_inplace_alter_table_dict().

            marko Marko Mäkelä added a comment - The ALTER TABLE operations involving FULLTEXT INDEX are now atomic inside InnoDB. The test innodb_fts.crash_recovery covers that. The innodb_fts.crash_recovery,release variant (covering a server built using cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ) would trigger many different scenarios that are expected to be fixed by MDEV-24626 and MDEV-25180 . The test variant innodb_fts.crash_recovery,debug kills the server during a deterministic spot of 3 different ALTER TABLE , to exercise the revised recovery that would drop any internal FTS_%INDEX_% tables along with stubs created during prepare_inplace_alter_table_dict() .
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-05-06 14:07:22.0 2021-05-06 14:07:22.03
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.1 [ 24437 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 92387 ] MariaDB v4 [ 133853 ]
            marko Marko Mäkelä made changes -

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.