Details

    Description

      Atomic drop table (MDEV-23844) does not recover the replaced table if CREATE OR REPLACE TABLE fails.

      This is because CREATE OR REPLACE TABLE foo ... is still implemented, as before, as:

      DROP TABLE IF EXISTS foo;
      CREATE TABLE foo ..

      Because of that, a failed CREATE OR REPLACE TABLE, or a crash during the create, will lead to a lost table if the original table existed.

      This task equips the table drop with a backup of the old table, kept until the new table has been successfully created.

      In other words, this task ensures that if one executes CREATE OR REPLACE TABLE foo on an existing table, foo will be either the new table or the old table after the statement has executed, including in cases where the CREATE fails or the server crashes.
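      The intended semantics can be illustrated with a small Python sketch (purely hypothetical code, not part of the server; the function names and the dict standing in for the data dictionary are made up for illustration):

```python
def create_or_replace_naive(tables, name, create_ok):
    """Old behaviour: DROP TABLE IF EXISTS, then CREATE."""
    tables.pop(name, None)              # drop the old table first
    if not create_ok:
        return False                    # CREATE failed: old table already lost
    tables[name] = "new"
    return True

def create_or_replace_atomic(tables, name, create_ok):
    """New behaviour: keep a backup of the old table until CREATE succeeds."""
    backup = tables.pop(name, None)     # move the old table aside, do not drop yet
    if not create_ok:
        if backup is not None:
            tables[name] = backup       # CREATE failed: restore the old table
        return False
    tables[name] = "new"                # success: the backup can now be dropped
    return True

t = {"foo": "old"}
create_or_replace_naive(t, "foo", create_ok=False)
assert "foo" not in t                   # the old table is lost

t = {"foo": "old"}
create_or_replace_atomic(t, "foo", create_ok=False)
assert t["foo"] == "old"                # the old table is restored
```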

      NOTE: this change was in a preview release for MariaDB Community Server 10.10.0, but could not be added to the 10.10 RC.


          Activity

            midenok Aleksey Midenkov added a comment - edited

            Please review bb-10.7-midenok-MDEV-25292
            midenok Aleksey Midenkov added a comment - edited

            marko Please review InnoDB part of bb-10.7-midenok-MDEV-25292 in main commit MDEV-25292 Atomic CREATE OR REPLACE TABLE

            midenok Aleksey Midenkov added a comment - edited

            elenst Please test bb-10.7-MDEV-25292-atomic-replace


            Sorry, I had missed the request to review the InnoDB changes. I found a branch from March 22 that is 12 commits ahead and 53 commits behind 10.7. I think that this needs to be based on the latest development branch.

            I would appreciate a more detailed description of the changes and the reasons for changing lower-level InnoDB code. I would suggest that you check the TRUNCATE TABLE implementation once more and consider whether a smaller change is possible. If the motivation for the more extensive changes is to support the addition of FOREIGN KEY constraints with something like the following:

            CREATE TABLE parent(a INT PRIMARY KEY) ENGINE=InnoDB;
            CREATE TABLE child(a INT PRIMARY KEY) ENGINE=InnoDB;
            CREATE OR REPLACE TABLE child (a INT PRIMARY KEY REFERENCES parent) ENGINE=InnoDB;
            

            then it should be mentioned in the commit message and there should be such test cases in the same commit.

            marko Marko Mäkelä added a comment -
            midenok Aleksey Midenkov added a comment - edited

            marko "InnoDB changes" article has been added.

            Any of the branches can be reviewed:
            bb-10.7-midenok-MDEV-25292
            preview-10.10-MDEV-25292-atomic-cor


            lstartseva Please test preview-10.10-MDEV-25292-atomic-cor

            midenok Aleksey Midenkov added a comment -

            midenok, I see that you filed a separate ticket MDEV-28753 and assigned to me, to review the InnoDB changes. I do not think that it is helpful to split any code review to multiple tickets. I have added a "blocking" relation, so that this ticket cannot be completed without that review being completed.

            Because no commit hashes were mentioned in earlier comments and the named branches have been overwritten, I cannot tell if any attempt to simplify the InnoDB changes as I requested was made.

            Did someone review the rest of the changes before this entered testing?

            marko Marko Mäkelä added a comment -

            marko Since I wrote it originally, I do not see how it can be made simpler; the changes are fairly simple. If you know how to improve them, please make a concrete suggestion. That is the point of a review: other people see what the author cannot. The other code was reviewed and approved by Michael Widenius.

            The new subtask is helpful because more than one person worked on the task simultaneously; I created the subtask to notify you. The blocking relation is not very helpful, as the subtask already notifies you to do the review. I will not push this until your approval anyway.

            midenok Aleksey Midenkov added a comment -

            I think that CREATE OR REPLACE TABLE must be implemented at as high a level as possible, similar to how ha_innobase::truncate() currently implements TRUNCATE TABLE as a combination of RENAME, CREATE, DROP. The pseudocode should be something like this:

            1. Start a dictionary transaction.
            2. Try to rename a table from the specified name to a temporary name. If the table does not exist, fine.
            3. If the table existed, delete all its FOREIGN KEY constraints from SYS_FOREIGN and SYS_FOREIGN_COLS (in the same transaction).
            4. CREATE the table (in the same transaction).
            5. If the table existed on rename, drop the original table (in the same transaction).
            6. Commit the dictionary transaction.
            7. On any error, roll back the dictionary transaction. Try to avoid modifying the InnoDB dict_sys cache before the transaction was committed.
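            The pseudocode above can be modelled as one all-or-nothing dictionary transaction. This is a hypothetical Python sketch (the real code operates on SYS_FOREIGN/SYS_FOREIGN_COLS inside InnoDB; the dict layout here is invented for illustration):

```python
import copy

def replace_table(dict_sys, name, new_def, fail_at=None):
    """Steps 1-7 as one all-or-nothing change: work on a copy (the
    'dictionary transaction') and either commit it or discard it."""
    trx = copy.deepcopy(dict_sys)                        # 1. start the transaction
    existed = name in trx["tables"]
    if existed:
        trx["tables"]["#tmp"] = trx["tables"].pop(name)  # 2. rename to a temp name
        trx["foreign"] = [fk for fk in trx["foreign"]    # 3. delete its FK records
                          if fk["table"] != name]
    if fail_at == "create":                              # 7. on error, roll back
        return dict_sys                                  #    by discarding the copy
    trx["tables"][name] = new_def                        # 4. CREATE the table
    if existed:
        del trx["tables"]["#tmp"]                        # 5. drop the original
    return trx                                           # 6. commit

d = {"tables": {"child": "old"},
     "foreign": [{"table": "child", "name": "child_ibfk_1"}]}
committed = replace_table(d, "child", "new")
assert committed["tables"] == {"child": "new"} and committed["foreign"] == []
rolled_back = replace_table(d, "child", "new", fail_at="create")
assert rolled_back == d                  # on error the dictionary is unchanged
```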

            Similar to TRUNCATE TABLE, FOREIGN KEY constraints (which are maintained inside InnoDB until MDEV-16417 has been implemented) will require some additional handling. During TRUNCATE, FOREIGN KEY constraints must simply be preserved; during the RENAME part of CREATE OR REPLACE TABLE, the constraints may have to be renamed or deleted specially in order to avoid duplicated values of databasename/constraintname. Implicit names of foreign key constraints have names like databasename/tablename_ibfk_1. I think that it would be simplest to delete all constraints that were part of the table. If the dictionary transaction is rolled back, the deletion of those records will be rolled back as well. There should be no need to rename any FOREIGN KEY constraints.

            I would like to see a clear description of how FOREIGN KEY constraints are supposed to be handled. One commit message mentions "constraints" but does not specifically say FOREIGN KEY; CHECK constraints, for example, are handled outside InnoDB. It also says something about "backup" and "tmp table", which might refer to renaming a persistent table to a generated name.

            Furthermore, I would like to see a high-level description of how CREATE OR REPLACE TABLE…SELECT is intended to work. Currently, CREATE TABLE…SELECT executes two transactions: first, ha_innobase::create() will create an empty table, and then, in a separate transaction, something like INSERT…SELECT will be executed. On any error (deadlock, lock wait timeout, duplicate key error), the table will be dropped. What is expected to happen in the scope of this work? Should an existing table be dropped, or should it be preserved if populating the replacement table fails?

            marko Marko Mäkelä added a comment -

            First to answer Marko:

            • Everything is done at a high level, except a small change to InnoDB to inform it that we are creating
              a temporary table and a change in S3 to inform the top level that renames are very expensive.
            • The code guarantees that either you get a new table with all expected data or the old table will be restored.

            The commit message explains how the code works:

            Atomic replace algorithm

            Two DDL chains are used for CREATE OR REPLACE:
            ddl_log_state_create (C) and ddl_log_state_rm (D).

            1. (C) Log CREATE_TABLE_ACTION of TMP table (drops TMP table);
            2. Create new table as TMP;
            3. Do everything with TMP (like insert data);

            finalize_atomic_replace():
            4. Link chains: (D) is executed only if (C) is closed;
            5. (D) Log DROP_ACTION of BACKUP;
            6. (C) Log RENAME_TABLE_ACTION from ORIG to BACKUP (replays BACKUP -> ORIG);
            7. Rename ORIG to BACKUP;
            8. (C) Log CREATE_TABLE_ACTION of ORIG (drops ORIG);
            9. Rename TMP to ORIG;

            finalize_ddl() in case of success:
            10. Close (C);
            11. Replay (D): BACKUP is dropped.

            finalize_ddl() in case of error:
            10. Close (D);
            11. Replay (C):
            1) ORIG is dropped (only after finalize_atomic_replace());
            2) BACKUP renamed to ORIG (only after finalize_atomic_replace());
            3) drop TMP.
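            The success and error paths of the steps above can be sketched in Python (a hypothetical model, not server code; a failure is assumed to happen before finalize_atomic_replace()):

```python
def run(fail):
    """Happy path vs. error path of the two-chain algorithm."""
    tables = {"ORIG": "old"}
    tables["TMP"] = "new"                       # steps 1-3: build the new table
    if not fail:
        # finalize_atomic_replace(), steps 6-9
        tables["BACKUP"] = tables.pop("ORIG")   # 7. rename ORIG -> BACKUP
        tables["ORIG"] = tables.pop("TMP")      # 9. rename TMP  -> ORIG
        # success: close (C), then replay (D), which drops BACKUP
        tables.pop("BACKUP")
    else:
        # error: close (D), then replay (C); since steps 6-9 never ran,
        # replaying (C) only has to drop TMP
        tables.pop("TMP")
    return tables

assert run(fail=False) == {"ORIG": "new"}       # new table installed
assert run(fail=True) == {"ORIG": "old"}        # old table kept intact
```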

            Temporary table for CREATE OR REPLACE

            Before dropping "old" table, CREATE OR REPLACE creates "tmp" table.
            ddl_log_state_create holds the drop of the "tmp" table. When
            everything is OK (data is inserted, "tmp" is ready) ddl_log_state_rm
            is written to replace "old" with "tmp". Until ddl_log_state_create
            is closed ddl_log_state_rm is not executed.

            After the binlogging is done ddl_log_state_create is closed. At that
            point ddl_log_state_rm is executed and "tmp" is replaced with
            "old". That is: final rename is done by the DDL log.

            Given the DDL log's important role in the CREATE OR REPLACE
            operation, replay of ddl_log_state_rm must fail at the first error
            hit and print the error message if possible. For example, a foreign
            key error is discovered at this phase: InnoDB refuses to drop the
            "old" table and returns the corresponding foreign key error code.

            Additional notes

            • CREATE TABLE without REPLACE is not affected by this commit.
            • Engines having HTON_EXPENSIVE_RENAME flag set are not affected by
              this commit.
            • CREATE TABLE .. SELECT XID usage is fixed and now there is no need
              to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments in
              do_postlock()). XID is now correctly updated so it disables
              DDL_LOG_DROP_TABLE_ACTION. Note that binary log is flushed at the
              final stage when the table is ready. So if we have XID in the
              binary log we don't need to drop the table.
            • Three variations of CREATE OR REPLACE handled:

            1. CREATE OR REPLACE TABLE t1 (..);
            2. CREATE OR REPLACE TABLE t1 LIKE t2;
            3. CREATE OR REPLACE TABLE t1 SELECT ..;

            • Test case uses 5 combinations for engines (aria, aria_notrans,
              myisam, ib, expensive_rename) and 2 combinations for binlog types
              (row, stmt). Combinations help to check differences between the
              results. Error failures are tested for the above three variations.
            • The trigger mechanism is unaffected by this change. This is
              tested in create_replace.test.
            • LOCK TABLES is affected. Lock restoration must be done after the
              "rm" chain is replayed.
            • Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
              checkpoint was not executed before for normal CREATE TABLE but is
              executed now.


            Rename and drop via DDL log

            We replay ddl_log_state_rm to drop the old table and rename the
            temporary table. In that case we must raise the correct error
            message if ddl_log_revert() fails (e.g. on a FK error).

            If the table was deleted earlier, not via the DDL log, and a crash
            happens, the create chain is not closed. The linked drop chain is
            then not executed and the new table is not installed, but the old
            table is already deleted.

            ddl_log.cc changes

            Now we can place an action before DDL_LOG_DROP_INIT_ACTION and it
            will be replayed after DDL_LOG_DROP_TABLE_ACTION.

            The report_error parameter of ddl_log_revert() allows it to fail at
            the first error and print the error message if possible.
            ddl_log_execute_action() can now print an error message.

            Since we can now handle errors from ddl_log_execute_action() (in the
            case of non-recovery execution), unconditionally setting
            "error= TRUE" is wrong (it was wrong anyway, because it was
            overwritten at the end of the function).

            On XID usage

            As with all other atomic DDL operations, an XID is used to avoid
            inconsistency between master and slave in the case of a crash after
            the binary log is written and before ddl_log_state_create is
            closed. On recovery, XIDs are taken from the binary log and the
            corresponding DDL log events are disabled. That is done by
            ddl_log_close_binlogged_events().

            On linking two chains together

            Chains are executed in ascending order of the entry_pos of their
            execute entries. But the order in which entry_pos values are
            assigned is undefined: the first chain may get a bigger number and
            the second chain a smaller one. In that case the execution order is
            reversed: the second chain is executed first.

            To avoid that, we link one chain to the other. While the base chain
            (ddl_log_state_create) is active, the secondary chain
            (ddl_log_state_rm) is not executed. That is: of two linked chains,
            only one is executed.

            The interface ddl_log_link_chains() was done in "MDEV-22166
            ddl_log_write_execute_entry() extension".
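            The linking behaviour described above can be sketched as follows (a hypothetical Python model of replay ordering, not the ddl_log.cc implementation):

```python
def replay(entries, links):
    """entries: entry_pos -> chain name; links: chain -> chain that must
    be closed (inactive) before this one may execute."""
    executed = []
    active = set(entries.values())
    for pos in sorted(entries):                 # ascending entry_pos order
        chain = entries[pos]
        blocker = links.get(chain)
        if blocker is not None and blocker in active:
            continue                            # base chain still active: skip
        executed.append(chain)
    return executed

# entry_pos assignment happened "backwards": the rm chain got the smaller
# position, so without linking it would be replayed first.
entries = {1: "ddl_log_state_rm", 2: "ddl_log_state_create"}
assert replay(entries, {}) == ["ddl_log_state_rm", "ddl_log_state_create"]

# With the chains linked, rm is skipped while create is still active.
links = {"ddl_log_state_rm": "ddl_log_state_create"}
assert replay(entries, links) == ["ddl_log_state_create"]
```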

            More on CREATE OR REPLACE .. SELECT

            We use create_and_open_tmp_table(), as in ALTER TABLE, to create a
            temporary TABLE object (tmp_table is (NON_)TRANSACTIONAL_TMP_TABLE).

            After we have created such a TABLE object, we use
            create_info->tmp_table() instead of table->s->tmp_table when we
            need to check for a parser-requested tmp table.

            External locking is required for the temporary table created by
            create_and_open_tmp_table(). For example, it disables logging for
            Aria transactional tables; without it (when no mysql_lock_tables()
            is done), they cannot work correctly.

            To make the external lock work, the patch requires the Aria table
            to run in non-transactional mode. That is usually done by
            ha_enable_transaction(false), but we cannot disable the transaction
            completely because: 1. binlog rollback removes pending row events
            (binlog_remove_pending_rows_event()), and the row events are added
            during the CREATE .. SELECT data insertion phase; 2. the
            replication slave depends heavily on the transaction and cannot
            work without it.

            So we put the temporary Aria table into non-transactional mode with
            the thd->transaction->on hack. See the comment for the on_save
            variable.

            Note that Aria tables have an internal_table mode, but we cannot
            use it because:

            if (!internal_table)
            {
              mysql_mutex_lock(&THR_LOCK_myisam);
              old_info= test_if_reopen(name_buff);
            }

            For internal_table, test_if_reopen() is not called and we get a new
            MARIA_SHARE for each file handler. In that case duplicate errors
            are missed, because the insert and the lookup in CREATE .. SELECT
            are done via two different handlers (see create_lookup_handler()).
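            The share-reuse point can be illustrated with a small Python model (hypothetical; the Share class and registry merely stand in for MARIA_SHARE and test_if_reopen()):

```python
class Share:
    """Stand-in for a MARIA_SHARE: per-table state, including unique keys."""
    def __init__(self):
        self.keys = set()

shares = {}                                     # open-share registry

def open_handler(name, internal_table):
    if internal_table:
        return Share()                          # internal_table: fresh share
    return shares.setdefault(name, Share())     # else reuse the open share

def insert(share, key):
    if key in share.keys:
        return "DUPLICATE"
    share.keys.add(key)
    return "OK"

# Normal mode: the insert handler and the lookup handler use one shared
# MARIA_SHARE, so the duplicate is detected.
h_insert = open_handler("t1", internal_table=False)
h_lookup = open_handler("t1", internal_table=False)
insert(h_insert, 1)
assert insert(h_lookup, 1) == "DUPLICATE"

# internal_table mode: each handler gets its own share, so the duplicate
# goes unnoticed.
g_insert = open_handler("t2", internal_table=True)
g_lookup = open_handler("t2", internal_table=True)
insert(g_insert, 1)
assert insert(g_lookup, 1) == "OK"
```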

            For the temporary table, before dropping its TABLE_SHARE via
            drop_temporary_table() we must call ha_reset(). ha_reset() releases
            the storage share. Without that, the share is kept and a second
            CREATE OR REPLACE .. SELECT fails with:

            HA_ERR_TABLE_EXIST (156): MyISAM table '#sql-create-b5377-4-t2' is
            in use (most likely by a MERGE table). Try FLUSH TABLES.

            HA_EXTRA_PREPARE_FOR_DROP also removes MYISAM_SHARE, but that is
            not needed as ha_reset() does the job.

            ha_reset() is usually called from
            mark_tmp_table_as_free_for_reuse(), but we do not need that
            mechanism for our temporary table.

            Atomic_info in HA_CREATE_INFO

            Many functions in CREATE TABLE pass around the same parameters.
            These parameters are part of the table creation info and should
            live in HA_CREATE_INFO (or a similar structure). Passing parameters
            via a single structure makes adding new data and refactoring much
            easier.

            monty Michael Widenius added a comment -

            marko the atomic DDL is engine independent and therefore transaction independent. It has to handle some engine peculiarities, though; one of these is the requirement to preserve the original foreign keys and restore them afterwards.

            midenok Aleksey Midenkov added a comment -

            Why exactly was the use of InnoDB persistent statistics disabled in several tests? The commit message is very vague. How is that related to this work at all? I think that it should be filed and fixed separately. Failures of those tests seem to be very rare, and I did not see any test failures even after reverting that change.

            Persistent statistics have been enabled by default since MySQL 5.6.6 or MariaDB Server 10.0. Ever since MDEV-25919 was implemented, no problems should be anticipated.

            More alarming is that in the test main.create_or_replace, the use of InnoDB persistent statistics is disabled. That seems to be totally unnecessary; only the metadata locks on the InnoDB statistics tables need to be filtered out of the expected result. I did not observe any failures when persistent statistics were left enabled.

            I will post some suggested InnoDB code changes later.

            marko Marko Mäkelä added a comment -

            I refactored dict_get_referenced_table() and its callers. I think that before this refactoring there was a bug: CREATE OR REPLACE TABLE db.x with REFERENCES some_other_db.x was mishandled.

            As far as I understand, the rename-based approach is needed so that CREATE OR REPLACE TABLE…SELECT can restore the original table in case the INSERT…SELECT part of the statement fails. I think that the following scenarios must be tested (or existing mtr tests expanded):

            • Crash recovery where CREATE OR REPLACE TABLE is replacing a table, such that the same named constraint (say, CONSTRAINT a FOREIGN KEY b(c) REFERENCES parent(c)) is specified for both the old and the replacing table
            • Rollback of CREATE OR REPLACE TABLE…SELECT when named FOREIGN KEY constraints exist
            • Crash recovery and rollback where another table is referencing the table via a named constraint
            • Schema and table names starting with #mysql50#, such that FOREIGN KEY constraints exist between tables
            • CREATE OR REPLACE TABLE that adds or removes a FOREIGN KEY constraint
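            The second scenario could be sketched in mtr roughly as follows (a sketch only; the table and constraint names are made up, and the failing INSERT…SELECT is simulated with a duplicate PRIMARY KEY value, which should roll the statement back and leave the original child table in place):

            --source include/have_innodb.inc
            CREATE TABLE parent (c INT PRIMARY KEY) ENGINE=InnoDB;
            CREATE TABLE child (c INT PRIMARY KEY,
              CONSTRAINT a FOREIGN KEY b (c) REFERENCES parent (c)) ENGINE=InnoDB;
            # The SELECT part produces a duplicate PRIMARY KEY value, so the
            # CREATE OR REPLACE must fail and restore the original child table.
            --error ER_DUP_ENTRY
            CREATE OR REPLACE TABLE child (c INT PRIMARY KEY,
              CONSTRAINT a FOREIGN KEY b (c) REFERENCES parent (c)) ENGINE=InnoDB
              SELECT 1 AS c UNION ALL SELECT 1 AS c;
            SHOW CREATE TABLE child;
            DROP TABLE child, parent;
            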

            I am requesting these, because I suspect that duplicate key errors for constraints like dbname/fk_constraint_name would be issued. I expect anonymous constraints to be fine; a unique internal name like dbname/tablename_ibfk_1 should be created when renaming the table.

            I plan to suggest some changes to row_rename_table_for_mysql() as well. I hope to preserve the assertion ut_ad(err != DB_DUPLICATE_KEY); in some form.

            marko Marko Mäkelä added a comment
            serg Sergei Golubchik added a comment - edited

            In the branch preview-10.10-ddl and in bb-10.10-MDEV-25292

            I revised the RENAME changes in InnoDB and restored a removed debug assertion. Now this should be ready for testing.

            marko Marko Mäkelä added a comment

            marko You can check the cross-reference report: multi_source.multi_parallel fails at least in 10.5 - 10.9.

            midenok Aleksey Midenkov added a comment

            More alarming is that in the test main.create_or_replace, the use of InnoDB persistent statistics is disabled. That seems to be totally unnecessary; only the metadata locks on the InnoDB statistics tables need to be filtered out from the expected result. I did not observe any failures after not disabling persistent statistics.

            I don't like having the 2-line-long select in create_or_replace.test in so many places. It is too much clutter outside the business logic. Is it important to have persistent statistics in that test?

            midenok Aleksey Midenkov added a comment

            midenok, yes, it is important to test with default settings. The InnoDB persistent statistics do play a role during DDL, and we must cover it. Work-arounds like MDEV-4750 can significantly reduce coverage in our regression tests. Should there be a race condition or other glitch somewhere, sooner or later a regression test would fail on our CI system. If we disable persistent statistics, we will never catch such failures.

            I amended the main commit with some clarified wording and my version of the InnoDB changes. I also confirmed that there is a regression for a case that I described 5 days ago:

            --source include/have_innodb.inc
            CREATE TABLE t(a INT PRIMARY KEY) ENGINE=InnoDB;
            CREATE OR REPLACE TABLE u(a INT PRIMARY KEY, CONSTRAINT c FOREIGN KEY d (a) REFERENCES t (a)) ENGINE=InnoDB;
            CREATE OR REPLACE TABLE u(a INT PRIMARY KEY, CONSTRAINT c FOREIGN KEY d (a) REFERENCES t (a)) ENGINE=InnoDB;
            DROP TABLE u, t;
            

            This test used to pass, but now it would fail due to a duplicate key on SYS_FOREIGN.NAME on the non-anonymous constraint name:

            mysqltest: At line 4: query 'CREATE OR REPLACE TABLE u(a INT PRIMARY KEY, CONSTRAINT c FOREIGN KEY d (a) REFERENCES t (a)) ENGINE=InnoDB' failed: ER_CANT_CREATE_TABLE (1005): Can't create table `test`.`u` (errno: 121 "Duplicate key on write or update")
            

            For anonymous constraints there is no problem, because the automatically generated names will be of the form databasename/tablename_ibfk_1 and the different table name will make the constraint name unique. With the above test case, assuming that the name of the current database is test, there will be a duplicate on test/c or test/d. I didn’t check which of the names InnoDB will use internally. If either c or d is specified in the CREATE OR REPLACE statement and the statement is repeated, the test will fail. It will pass if neither c nor d is specified, that is, if the constraint is anonymous.

            This is a regression that needs to be fixed.
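            For comparison, the anonymous-constraint variant of the same test (assuming the current database is test, so the generated names would be of the test/u_ibfk_1 form described above) should keep passing even when the statement is repeated:

            --source include/have_innodb.inc
            CREATE TABLE t(a INT PRIMARY KEY) ENGINE=InnoDB;
            CREATE OR REPLACE TABLE u(a INT PRIMARY KEY,
              FOREIGN KEY (a) REFERENCES t (a)) ENGINE=InnoDB;
            CREATE OR REPLACE TABLE u(a INT PRIMARY KEY,
              FOREIGN KEY (a) REFERENCES t (a)) ENGINE=InnoDB;
            DROP TABLE u, t;
            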

            marko Marko Mäkelä added a comment

            marko, no, if you enable some InnoDB functionality for all tests, you should not need to change all tests. And you should not have to add `where table_schema!='mysql' or table_name not like 'innodb_%_stats'` to every `select * from information_schema.metadata_lock_info`! Please test everything you need in the innodb suite.

            midenok Aleksey Midenkov added a comment

            Testing is complete; if marko doesn't have any more objections, I guess it can be merged.

            lstartseva Lena Startseva added a comment

            I will review this when I am ready with MDEV-27180. Because of the extensive changes in the code, the review process does take some time.

            monty Michael Widenius added a comment
            midenok Aleksey Midenkov added a comment - edited

            monty I already closed the bugs, but this ticket is waiting on you. The branch is bb-11.0-midenok-MDEV-25292 (27 top commits)


            Can we schedule that, please, or assign it to another person? The task has been waiting for more than half a year.

            midenok Aleksey Midenkov added a comment

            Updated branch is bb-main-midenok-MDEV-25292

            midenok Aleksey Midenkov added a comment

            I filed MDEV-35895 for an anomaly that was found by monty while he was testing this. I didn’t investigate it deeper yet, but I suspect that this anomaly might be avoided if CREATE OR REPLACE TABLE…SELECT downgraded the exclusive meta-data lock (MDL_EXCLUSIVE) on the table name for the duration of copying the data (the INSERT…SELECT part), similar to how handler::inplace_alter_table() is being executed under a non-exclusive MDL. In that way, the purge of committed transaction history could proceed on a possible old copy of the table while a new copy is being created. InnoDB can only purge history in order. Until or unless the DDL operation has been committed, the purge can’t skip any undo log records for the old copy of the table.

            marko Marko Mäkelä added a comment

            bb-11.8-monty pushed for QA testing.

            monty Michael Widenius added a comment

            bb-11.8-monty updated.
            All known things are fixed, except the MDL locks that InnoDB needs.

            monty Michael Widenius added a comment
            serg Sergei Golubchik added a comment - edited

            See commits:
            https://github.com/MariaDB/server/commit/6ff731317239
            https://github.com/MariaDB/server/commit/5ed14c2fdeff
            https://github.com/MariaDB/server/commit/a6b0394cffb4
            https://github.com/MariaDB/server/commit/59e3b88fbd8e
            https://github.com/MariaDB/server/commit/9c3281fc9679
            https://github.com/MariaDB/server/commit/c0fe20dbf24

            Please consider cherry-picking them into bb-11.8-monty.
            monty Michael Widenius added a comment - edited

            The merge of the above commits is now done. Each change has been added to its original commit and pushed to bb-11.8-monty.

            monty Michael Widenius added a comment

            bb-main-monty is now updated to latest main.


            People

              elenst Elena Stepanova
              midenok Aleksey Midenkov