Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18518

Implement atomic multi-table (or multi-partition) CREATE TABLE for InnoDB

Details

    Description

      The way how InnoDB writes undo log about CREATE TABLE and DROP TABLE introduces the limitation that only one table may be created or dropped within a transaction.

      We should replace the current mechanism (around trx->table_id) and introduce new undo log records for the following:

      • delete-if-exists a file on rollback (when rolling back CREATE TABLE)
      • delete-if-exists a file on purge (after DROP TABLE has been committed)

      Applying either operation must write MLOG_FILE_DELETE2 to the redo log before actually deleting any files.

      To avoid name clashes between DROP TABLE and subsequent CREATE TABLE by the same name (before the old files were purged), we should probably rename the files to temporary names before dropping. Also the rename operation must be undo logged and redo logged. Furthermore, during normal operation we could delete the files for the dropped table. The deletion by the purge subsystem would only be a fall-back measure for crash recovery.

      Attachments

        Issue Links

          Activity

            It turns out that many operations involving FULLTEXT INDEX still are not atomic.

            marko Marko Mäkelä added a comment - It turns out that many operations involving FULLTEXT INDEX still are not atomic.

            I am raising this to Blocker status, because currently the feature is half finished.

            marko Marko Mäkelä added a comment - I am raising this to Blocker status, because currently the feature is half finished.

            The following test currently leaves behind orphan FTS_ tables for the table t1:

            --source include/have_innodb.inc
            --source include/have_debug.inc
            --source include/have_debug_sync.inc
             
            CREATE TABLE t1(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            CREATE TABLE t2(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            CREATE TABLE t3(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
             
            connect(ddl1, localhost, root,,);
            SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL 1 WAIT_FOR ever';
            send ALTER TABLE t1 ADD FULLTEXT INDEX(b);
             
            connect(ddl2, localhost, root,,);
            SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL 2 WAIT_FOR ever';
            send ALTER TABLE t2 DROP INDEX a, ADD FULLTEXT INDEX(b), FORCE;
             
            connect(ddl3, localhost, root,,);
            SET DEBUG_SYNC='alter_table_before_rename_result_table SIGNAL 3 WAIT_FOR ever';
            send ALTER TABLE t3 ADD FULLTEXT INDEX(b), ALGORITHM=COPY;
             
            connection default;
            SET DEBUG_SYNC='now WAIT_FOR 1';
            SET DEBUG_SYNC='now WAIT_FOR 2';
            SET DEBUG_SYNC='now WAIT_FOR 3';
             
            --let $shutdown_timeout=0
            --source include/restart_mysqld.inc
            disconnect ddl1;
            disconnect ddl2;
            disconnect ddl3;
             
            SHOW CREATE TABLE t1;
            SHOW CREATE TABLE t2;
            SHOW CREATE TABLE t3;
            DROP TABLE t1,t2,t3;
            SELECT * FROM information_schema.innodb_sys_tables where name like 'test/%';
            # Cleanup until MDEV-25180 (Atomic ALTER TABLE) has been done
            --remove_files_wildcard `select @@datadir`/#sql-*.frm
            

            marko Marko Mäkelä added a comment - The following test currently leaves behind orphan FTS_ tables for the table t1 : --source include/have_innodb.inc --source include/have_debug.inc --source include/have_debug_sync.inc   CREATE TABLE t1(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; CREATE TABLE t2(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; CREATE TABLE t3(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB;   connect (ddl1, localhost, root,,); SET DEBUG_SYNC= 'innodb_inplace_alter_table_enter SIGNAL 1 WAIT_FOR ever' ; send ALTER TABLE t1 ADD FULLTEXT INDEX (b);   connect (ddl2, localhost, root,,); SET DEBUG_SYNC= 'innodb_inplace_alter_table_enter SIGNAL 2 WAIT_FOR ever' ; send ALTER TABLE t2 DROP INDEX a, ADD FULLTEXT INDEX (b), FORCE ;   connect (ddl3, localhost, root,,); SET DEBUG_SYNC= 'alter_table_before_rename_result_table SIGNAL 3 WAIT_FOR ever' ; send ALTER TABLE t3 ADD FULLTEXT INDEX (b), ALGORITHM=COPY;   connection default ; SET DEBUG_SYNC= 'now WAIT_FOR 1' ; SET DEBUG_SYNC= 'now WAIT_FOR 2' ; SET DEBUG_SYNC= 'now WAIT_FOR 3' ;   --let $shutdown_timeout=0 --source include/restart_mysqld.inc disconnect ddl1; disconnect ddl2; disconnect ddl3;   SHOW CREATE TABLE t1; SHOW CREATE TABLE t2; SHOW CREATE TABLE t3; DROP TABLE t1,t2,t3; SELECT * FROM information_schema.innodb_sys_tables where name like 'test/%' ; # Cleanup until MDEV-25180 (Atomic ALTER TABLE ) has been done --remove_files_wildcard `select @@datadir`/#sql-*.frm
            marko Marko Mäkelä added a comment - - edited

            The above test will need some improvement for the regression suite:

            1. Especially if the server is built with cmake -DPLUGIN_PERFSCHEMA=NO, then the WAIT_FOR 2 or WAIT_FOR 3 would hang. This is because DEBUG_SYNC supports at most one signal being ‘in transit’. (That was improved in MySQL 5.7, but we can simply interleave the WAIT_FOR execution in the test.)
            2. The .frm file cleanup is incorrect.
            marko Marko Mäkelä added a comment - - edited The above test will need some improvement for the regression suite: Especially if the server is built with cmake -DPLUGIN_PERFSCHEMA=NO , then the WAIT_FOR 2 or WAIT_FOR 3 would hang. This is because DEBUG_SYNC supports at most one signal being ‘in transit’. ( That was improved in MySQL 5.7 , but we can simply interleave the WAIT_FOR execution in the test.) The .frm file cleanup is incorrect.

            The ALTER TABLE operations involving FULLTEXT INDEX are now atomic inside InnoDB. The test innodb_fts.crash_recovery covers that. The innodb_fts.crash_recovery,release variant (covering a server built using cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo) would trigger many different scenarios that are expected to be fixed by MDEV-24626 and MDEV-25180.

            The test variant innodb_fts.crash_recovery,debug kills the server during a deterministic spot of 3 different ALTER TABLE, to exercise the revised recovery that would drop any internal FTS_%INDEX_% tables along with stubs created during prepare_inplace_alter_table_dict().

            marko Marko Mäkelä added a comment - The ALTER TABLE operations involving FULLTEXT INDEX are now atomic inside InnoDB. The test innodb_fts.crash_recovery covers that. The innodb_fts.crash_recovery,release variant (covering a server built using cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ) would trigger many different scenarios that are expected to be fixed by MDEV-24626 and MDEV-25180 . The test variant innodb_fts.crash_recovery,debug kills the server during a deterministic spot of 3 different ALTER TABLE , to exercise the revised recovery that would drop any internal FTS_%INDEX_% tables along with stubs created during prepare_inplace_alter_table_dict() .

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              2 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.