[MDEV-25292] Atomic CREATE OR REPLACE TABLE Created: 2021-03-29 Updated: 2024-01-31 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Create Table |
| Fix Version/s: | 11.5 |
| Type: | New Feature | Priority: | Critical |
| Reporter: | Aleksey Midenkov | Assignee: | Michael Widenius |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | Preview_10.10 | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sub-Tasks: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Atomic drop table ( This is because current, as before, CREATE OR REPLACE TABLE foo ... is implemented as: DROP TABLE IF EXISTS foo; Because of that failed CREATE OR REPLACE TABLE or a crash during the create will lead to lost table, if the original table existed. This task equips table drop with backing up the old table until the new table was successfully created. In other words, this task ensures that if one executes CREATE OR REPLACE TABLE foo on an existing table, the table foo will either be the new or old table after the statement has executed, including in the cases the CREATE fails or the server crashes. NOTE: this change was in a preview release for MariaDB Community Server 10.10.0, but could not be added to 10.10 RC |
| Comments |
| Comment by Aleksey Midenkov [ 2021-11-29 ] | ||||||
|
Please review bb-10.7-midenok-MDEV-25292 | ||||||
| Comment by Aleksey Midenkov [ 2022-03-16 ] | ||||||
|
marko Please review InnoDB part of bb-10.7-midenok-MDEV-25292 in main commit MDEV-25292 Atomic CREATE OR REPLACE TABLE | ||||||
| Comment by Aleksey Midenkov [ 2022-03-21 ] | ||||||
|
elenst Please test bb-10.7-MDEV-25292-atomic-replace | ||||||
| Comment by Marko Mäkelä [ 2022-04-29 ] | ||||||
|
Sorry, I had missed the request to review the InnoDB changes. I found a branch from March 22 that is 12 commits ahead and 53 commits behind 10.7. I think that this needs to be based on the latest development branch. I would appreciate a more detailed description of the changes, and the reasons for changing lower-level InnoDB code. I would suggest you to check the TRUNCATE TABLE implementation once more and consider if a smaller change is possible. If the motivation of the more extensive changes is to support the addition of FOREIGN KEY constraints with something like the following:
then it should be mentioned in the commit message and there should be such test cases in the same commit. | ||||||
| Comment by Aleksey Midenkov [ 2022-05-30 ] | ||||||
|
marko "InnoDB changes" article has been added. Any of the branches can be reviewed: | ||||||
| Comment by Aleksey Midenkov [ 2022-06-04 ] | ||||||
|
lstartseva Please test preview-10.10-MDEV-25292-atomic-cor | ||||||
| Comment by Marko Mäkelä [ 2022-06-06 ] | ||||||
|
midenok, I see that you filed a separate ticket Because no commit hashes were mentioned in earlier comments and the named branches have been overwritten, I cannot tell if any attempt to simplify the InnoDB changes as I requested was made. Did someone review the rest of the changes before this entered testing? | ||||||
| Comment by Aleksey Midenkov [ 2022-06-09 ] | ||||||
|
marko Since I made it originally I do not see how it can be more simple, the changes are pretty simple. If you know how to improve them, please do the concrete suggestion. That's the point of review: when the other people see what can't be seen by the author. Other code was reviewed and approved by Michael Widenius. The new subtask is helpful because more than one people worked on the task simultaneously. To notify you I made a subtask. Blocking relation is not very helpful, the subtask just notifies you to do the review. I will not push that until your approval anyway. | ||||||
| Comment by Marko Mäkelä [ 2022-06-13 ] | ||||||
|
I think that CREATE OR REPLACE TABLE must be implemented at as high level as possible, similar to how ha_innobase::truncate() currently implements TRUNCATE TABLE as a combination of RENAME, CREATE, DROP. The pseudocode should be something like this:
Similar to TRUNCATE TABLE, FOREIGN KEY constraints (which are maintained inside InnoDB until MDEV-16417 has been implemented) will require some additional handling. During TRUNCATE, FOREIGN KEY constraints must simply be preserved; during the RENAME part of CREATE OR REPLACE TABLE, the constraints may have to be renamed or deleted specially in order to avoid duplicated values of databasename/constraintname. Implicit names of foreign key constraints have names like databasename/tablename_ibfk_1. I think that it would be simplest to delete all constraints that were part of the table. If the dictionary transaction is rolled back, the deletion of those records will be rolled back as well. There should be no need to rename any FOREIGN KEY constraints. I would like to see a clear description how FOREIGN KEY constraints are supposed to be handled. One commit message mentions "constraints" but does not specifically say FOREIGN KEY. For example, CHECK constraints are handled outside InnoDB. It also says something about "backup" and "tmp table", which might refer to renaming a persistent table to a generated name. Furthermore, I would like to see a high level description how CREATE OR REPLACE TABLE…SELECT is intended to work. Currently, CREATE TABLE…SELECT is executing two transactions. First, ha_innobase::create() will create and empty table, and then in a separate transaction, something like INSERT…SELECT will be executed. On any error (deadlock, lock wait timeout, duplicate key error), the table will be dropped. What is expected to happen in the scope of this work? Should an existing table be dropped, or should it be preserved if populating the replacement table fails? | ||||||
| Comment by Michael Widenius [ 2022-06-13 ] | ||||||
|
First to answer Marko:
The commit message explains how the code works: Atomic replace algorithm Two DDL chains are used for CREATE OR REPLACE: 1. (C) Log CREATE_TABLE_ACTION of TMP table (drops TMP table); finalize_atomic_replace(): finalize_ddl() in case of success: finalize_ddl() in case of error: Temporary table for CREATE OR REPLACE Before dropping "old" table, CREATE OR REPLACE creates "tmp" table. After the binlogging is done ddl_log_state_create is closed. At that With that important role of DDL log for CREATE OR REPLACE operation Additional notes
1. CREATE OR REPLACE TABLE t1 (..);
This means that this checkpoint was not executed before for normal Rename and drop via DDL log We replay ddl_log_state_rm to drop the old table and rename the If table is deleted earlier and not via DDL log and the crash ddl_log.cc changes Now we can place action before DDL_LOG_DROP_INIT_ACTION and it will report_error parameter for ddl_log_revert() allows to fail at first Since we now can handle errors from ddl_log_execute_action() (in On XID usage Like with all other atomic DDL operations XID is used to avoid On linking two chains together Chains are executed in the ascending order of entry_pos of execute To avoid that we link one chain to another. While the base chain The interface ddl_log_link_chains() was done in " More on CREATE OR REPLACE .. SELECT We use create_and_open_tmp_table() like in ALTER TABLE to create After we created such TABLE object we use create_info->tmp_table() External locking is required for temporary table created by For making external lock the patch requires Aria table to work in So we put temporary Aria table into non-transactional mode with Note that Aria table has internal_table mode. But we cannot use it if (!internal_table) { mysql_mutex_lock(&THR_LOCK_myisam); old_info= test_if_reopen(name_buff); } For internal_table test_if_reopen() is not called and we get a new For temporary table before dropping TABLE_SHARE by HA_ERR_TABLE_EXIST (156): MyISAM table '#sql-create-b5377-4-t2' is HA_EXTRA_PREPARE_FOR_DROP also removes MYISAM_SHARE, but that is ha_reset() is usually done by Atomic_info in HA_CREATE_INFO Many functions in CREATE TABLE pass the same parameters. These | ||||||
| Comment by Aleksey Midenkov [ 2022-06-13 ] | ||||||
|
marko the atomic DDL is engine independent and therefore transactions independent. It has to handle some engine peculiarities though, one of this is the requirement to preserve original foreign keys and restore them back. | ||||||
| Comment by Marko Mäkelä [ 2022-06-17 ] | ||||||
|
Why exactly was the use of InnoDB persistent statistics disabled in several tests? The commit message is very vague. How is that related to this work at all? I think that it should be filed and fixed separately. Failures of those tests seem to be very seldom, and I did not see any test failures even after reverting that change. Persistent statistics have been enabled by default since MySQL 5.6.6 or MariaDB Server 10.0. Ever since More alarming is that in the test main.create_or_replace, the use of InnoDB persistent statistics is disabled. That seems to be totally unnecessary; only the metadata locks on the InnoDB statistics tables need to be filtered out from the expected result. I did not observe any failures after not disabling persistent statistics. I will post some suggested InnoDB code changes later. | ||||||
| Comment by Marko Mäkelä [ 2022-06-17 ] | ||||||
|
I refactored dict_get_referenced_table() and its callers. I think that before this refactoring, there was a bug that CREATE OR REPLACE TABLE db.x with REFERENCES some_other_db.x was mishandled. As far as I understand, the rename-based approach is needed so that CREATE OR REPLACE TABLE…SELECT can restore the original table in case the INSERT…SELECT part of the statement fails. I think that the following scenarios must be tested (or existing mtr tests expanded):
I am requesting these, because I suspect that duplicate key errors for constraints like dbname/fk_constraint_name would be issued. I expect anonymous constraints to be fine; a unique internal name like dbname/tablename_ibfk_1 should be created when renaming the table. I plan to suggest some changes to row_rename_table_for_mysql() as well. I hope to preserve the assertion ut_ad(err != DB_DUPLICATE_KEY); in some form. | ||||||
| Comment by Sergei Golubchik [ 2022-06-18 ] | ||||||
|
In the branch preview-10.10-ddl | ||||||
| Comment by Marko Mäkelä [ 2022-06-21 ] | ||||||
|
I revised the RENAME changes in InnoDB and restored a removed debug assertion. Now this should be ready for testing. | ||||||
| Comment by Aleksey Midenkov [ 2022-06-21 ] | ||||||
|
marko You can check cross-reference report: multi_source.multi_parallel fails at least in 10.5 - 10.9. | ||||||
| Comment by Aleksey Midenkov [ 2022-06-21 ] | ||||||
I don't like 2-line-long select in create_or_replace.test in such many places. Too much garbage out of business logic. Is it important to have persistent statististics in that test? | ||||||
| Comment by Marko Mäkelä [ 2022-06-23 ] | ||||||
|
midenok, yes, it is important to test with default settings. The InnoDB persistent statistics do play a role during DDL, and we must cover it. Work-arounds like I amended the main commit with some clarified wording and my version of the InnoDB changes. I also confirmed that there is a regression for a case that I described 5 days ago:
This test used to pass, but now it would fail due to a duplicate key on SYS_FOREIGN.NAME on the non-anonymous constraint name:
For anonymous constraints there is no problem, because the automatically generated names will be of the form databasename/tablename_ibfk_1 and the different table name will make the constraint name unique. With the above test case, assuming that the name of the current database is test, there will be a duplicate on test/c or test/d. I didn’t check which of the names InnoDB will use internally. If either c or d is specified in the CREATE OR REPLACE statement and the statement is duplicated, the test will fail. It will pass if neither c nor c is specified, that is, the constraint will be anonymous. This is a regression that needs to be fixed. | ||||||
| Comment by Aleksey Midenkov [ 2022-06-27 ] | ||||||
|
marko, no, if you enable some innodb functionality for all tests, you do not require to change all tests. And you do not require to add `where table_schema!='mysql' or table_name not like 'innodb_%_stats'` to every `select * from information_schema.metadata_lock_info`! Please, test everything you need in innodb suite. | ||||||
| Comment by Lena Startseva [ 2022-07-28 ] | ||||||
|
Testing is complete, if marko doesn't have any more objections, I guess it can be merged. | ||||||
| Comment by Michael Widenius [ 2023-01-25 ] | ||||||
|
I will review this when I am ready with MDEV-27180. Because of the extensive changes in the code, the review process does take some time. | ||||||
| Comment by Aleksey Midenkov [ 2023-01-26 ] | ||||||
|
monty I already closed the bugs. But this ticket waits you. The branch is bb-11.0-midenok-MDEV-25292 (27 top commits) | ||||||
| Comment by Aleksey Midenkov [ 2023-09-05 ] | ||||||
|
Can we schedule that please or assign to another person? The task waits more than half year. |