[MDEV-21602] CREATE TABLE…PRIMARY KEY…SELECT workaround causes DROP TABLE to ignore locks Created: 2020-01-30  Updated: 2021-06-28  Resolved: 2021-06-28

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Create Table, Locking, Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.6.2

Type: Bug Priority: Critical
Reporter: Andrei Elkin Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.6

Issue Links:
Blocks
blocks MDEV-21175 Remove dict_table_t::n_foreign_key_ch... Closed
blocks MDEV-21283 InnoDB: MySQL is trying to drop datab... Closed
blocks MDEV-22733 XA PREPARE breaks MDL in pseudo_slave... Stalled
PartOf
is part of MDEV-25506 Atomic DDL: .frm file is removed and ... Closed
Relates
relates to MDEV-742 LP:803649 - Xa recovery failed on cli... Closed
relates to MDEV-23223 If you cancel CTAS Query on MariaDB S... Closed
relates to MDEV-23484 Rollback unnecessarily acquires dict_... Closed
relates to MDEV-25921 Implement CREATE TABLE...SELECT in on... Open
relates to MDEV-21820 create table ... select is metadata l... Closed
relates to MDEV-23234 move MDL from THD to st_transaction Open
relates to MDEV-24225 Long semaphore wait in dict0dict.cc l... Closed

 Description   

A table is allowed to be dropped despite being in use:

 
--connection crash_me
CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB;
xa start '1'; insert into t1 set pk=1; xa end '1'; xa prepare '1';
 
# KILL server_process & restart
xa recover; # =>
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            1 |            0 | 1    |
+----------+--------------+--------------+------+
1 row in set (0.00 sec)
 
--connection default
--error 0
DROP TABLE t1;

Notice that use cases of survived prepared user XA have involved MDEV-742 disconnected prepared so instead of the crash a simple --diconnect would leave the trx prepared and offended by the following DROP.



 Comments   
Comment by Marko Mäkelä [ 2020-01-30 ]

Elkin, you are observing the effects of a work-around for a bug that CREATE TABLE…PRIMARY KEY…SELECT is internally invoking handler::delete_table() while the internal INSERT…SELECT part of the work still holds an open table handle or transaction.
I conducted a simple test with this patch, which identifies the location of the work-around:

diff --git a/storage/innobase/row/row0mysql.cc b/storage/innobase/row/row0mysql.cc
index b20dd44f21b..0c7e6c0d9ef 100644
--- a/storage/innobase/row/row0mysql.cc
+++ b/storage/innobase/row/row0mysql.cc
@@ -3545,7 +3545,7 @@ row_drop_table_for_mysql(
 
 	if (table->get_ref_count() > 0 || table->n_rec_locks > 0
 	    || lock_table_has_locks(table)) {
-		goto defer;
+		ut_error;
 	}
 
 	/* The "to_be_dropped" marks table that is to be dropped, but

./mtr binlog.binlog_stm_mix_innodb_myisam

10.5 5defdc382bbf606b83e556c4f0d29dcd7954ebbc (with patch)

CURRENT_TEST: binlog.binlog_stm_mix_innodb_myisam
mysqltest: In included file "/mariadb/10.5/mysql-test/suite/binlog/include/mix_innodb_myisam_binlog.test": 
included from /mariadb/10.5/mysql-test/suite/binlog/t/binlog_stm_mix_innodb_myisam.test at line 9:
At line 253: query 'CREATE TABLE t2 (primary key (a)) engine=innodb select * from t1' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 1062...
#6  0x000055a242989a2e in row_drop_table_for_mysql (name=<optimized out>, trx=<optimized out>, sqlcom=SQLCOM_CREATE_TABLE, create_failed=false, nonatomic=<optimized out>) at /mariadb/10.5/storage/innobase/row/row0mysql.cc:3548
#7  0x000055a2428271ea in ha_innobase::delete_table (this=<optimized out>, name=0x7fb923db28d0 "./test/t2", sqlcom=SQLCOM_CREATE_TABLE) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:13479
#8  0x000055a2424ed8d4 in handler::ha_delete_table (this=0x7fb8dc016c18, name=0x7fb923db28d0 "./test/t2") at /mariadb/10.5/sql/handler.cc:4684
#9  ha_delete_table (thd=0x7fb8dc000d38, table_type=<optimized out>, path=0x7fb923db28d0 "./test/t2", db=0x7fb8dc011790, alias=<optimized out>, generate_warning=false) at /mariadb/10.5/sql/handler.cc:2577
#10 0x000055a242331bde in quick_rm_table (thd=0x7fb923db0320, base=<optimized out>, db=0x7fb8dc011790, table_name=0x7fb8dc0117a0, flags=0, table_path=<optimized out>) at /mariadb/10.5/sql/sql_table.cc:2770
#11 0x000055a2421fea7c in drop_open_table (thd=0x7fb8dc000d38, table=0x7fb8dc048d88, db_name=0x7fb8dc011790, table_name=0x7fb8dc0117a0) at /mariadb/10.5/sql/sql_base.cc:1453
#12 0x000055a24224d6ae in select_create::abort_result_set (this=0x7fb8dc0136a0) at /mariadb/10.5/sql/sql_insert.cc:4924
#13 0x000055a2422b4cf6 in handle_select (thd=0x7fb8dc000d38, lex=<optimized out>, result=0x7fb8dc0136a0, setup_tables_done_option=<optimized out>) at /mariadb/10.5/sql/sql_select.cc:434
#14 0x000055a2423446d4 in Sql_cmd_create_table_like::execute (this=<optimized out>, thd=0x7fb8dc000d38) at /mariadb/10.5/sql/sql_table.cc:11469
#15 0x000055a2422819fc in mysql_execute_command (thd=0x7fb8dc000d38) at /mariadb/10.5/sql/sql_parse.cc:5959
#16 0x000055a24227c968 in mysql_parse (thd=0x7fb8dc000d38, rawbuf=0x7fb8dc011630 "CREATE TABLE t2 (primary key (a)) engine=innodb select * from t1", length=<optimized out>, parser_state=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /mariadb/10.5/sql/sql_parse.cc:7988
#17 0x000055a2422791d2 in dispatch_command (command=COM_QUERY, thd=0x7fb8dc000d38, packet=0x7fb8dc008329 "CREATE TABLE t2 (primary key (a)) engine=innodb select * from t1", packet_length=64, is_com_multi=false, is_next_command=false) at /mariadb/10.5/sql/sql_parse.cc:1845

This demonstrates that the workaround is needed for the incorrect error handling in CREATE TABLE…SELECT, as explained in the comment before the line that the patch is touching.

Please fix the CREATE TABLE…SELECT code so that this assertion will not be hit in the patched InnoDB, and then ask me to correct the InnoDB error handling. I think that both should be done in a single commit as part of this bug fix. But I will need the upper layer fix first.

Comment by Andrei Elkin [ 2020-01-31 ]

marko: To CREATE TABLE…SELECT "workaround" involvement, I have tested it out and here is my results.
Actually C..S handler acquires MDL_EXCLUSIVE lock before the new table is created:

Thread 16 "mysqld" hit Breakpoint 21, MDL_context::acquire_lock (this=0x62b00008c330, mdl_request=0x7fffe5588690, lock_wait_timeout=86400) at mdl.cc:2241
$100 = MDL_EXCLUSIVE
$101 = "\002test\000tt\000
(gdb) bt
#0  MDL_context::acquire_lock (this=0x62b00008c330, mdl_request=0x7fffe5588690, lock_wait_timeout=86400) at mdl.cc:2241
#1  0x0000555556da2082 in MDL_context::upgrade_shared_lock (this=0x62b00008c330, mdl_ticket=0x6060000e2280, new_type=MDL_EXCLUSIVE, lock_wait_timeout=86400) at mdl.cc:2527
#2  0x00005555567d92a0 in upgrade_lock_if_not_exists (thd=0x62b00008c208, create_info=..., create_table=0x61c0000510a0, lock_wait_timeout=86400) at sql_base.cc:3969
#3  0x00005555567d9cac in lock_table_names (thd=0x62b00008c208, options=..., tables_start=0x61c0000510a0, tables_end=0x0, lock_wait_timeout=86400, flags=0) at sql_base.cc:4074
#4  0x00005555567daa39 in open_tables (thd=0x62b00008c208, options=..., start=0x7fffe5588f38, counter=0x7fffe5588f80, flags=0, prelocking_strategy=0x7fffe55890e0) at sql_base.cc:4279
#5  0x00005555567e0222 in open_and_lock_tables (thd=0x62b00008c208, options=..., tables=0x61c0000510a0, derived=true, flags=0, prelocking_strategy=0x7fffe55890e0) at sql_base.cc:5204
#6  0x0000555556c0d098 in open_and_lock_tables (thd=0x62b00008c208, options=..., tables=0x61c0000510a0, derived=true, flags=0) at sql_base.h:497
#7  0x0000555556c09fea in Sql_cmd_create_table_like::execute (this=0x6060000e1ee0, thd=0x62b00008c208) at sql_table.cc:11419
#8  0x000055555697708a in mysql_execute_command (thd=0x62b00008c208) at sql_parse.cc:5959
#9  0x00005555569845aa in mysql_parse (thd=0x62b00008c208, rawbuf=0x610000008c60 "create table tt (a int) engine=innodb select * from t", length=53, parser_state=0x7fffe558a860, is_com_multi=false, is_next_command=false) at sql_parse.cc:7988

At this point tt does not exit yet. Upon return from MDL_context::acquire_lock the lock is held all way down to ha_commit_trans and is downgraded at the end of the insert-select statement commit.

I only can conclude out of this observation that the C..S workaround is no longer relevant and its engine traces should be removed.

Whether that would be enough to turn DROP fully respect X locks by the prepared XA is an open issue, which is of your team.

Comment by Marko Mäkelä [ 2020-02-24 ]

Elkin, MDL have nothing to do with the conditions that are identified in the patch that I posted to highlight the problematic scenario:

 	if (table->get_ref_count() > 0 || table->n_rec_locks > 0
 	    || lock_table_has_locks(table)) {

The problem always was that ha_innobase::delete_table() is being invoked by CREATE TABLE…SELECT while the table is in active use. The get_ref_count() should return 0 after ha_innobase::close() was called. I think that the main problem should be lock_table_has_locks(table). Until the INSERT…SELECT part of the operation is rolled back, there should exist an IX lock on the table.

Again, InnoDB table locks are completely separate from MDL, but MDEV-20612 might eliminate InnoDB table locks in a future version.

I tried to explain that CREATE TABLE…SELECT must be rewritten so that it will first roll back the transaction, and only after that invoke handler::delete_table(). Ideally, that call should be unnecessary for InnoDB, but currently we do create the table in a separate InnoDB-internal transaction that will be committed before the INSERT…SELECT part.

We cannot remove the problematic work-around from InnoDB before the underlying bugs have been fixed outside InnoDB.

I am reassigning this back to you, so that you can work on fixing CREATE TABLE…SELECT outside InnoDB, or ask someone else to fix it. Once a fix for that is available, we can try to remove the InnoDB work-around. But we must still test it extensively, especially with DDL on tables that contain FOREIGN KEY constraints. The workaround might also be necessary until MDEV-21175 has been fixed.

Comment by Andrei Elkin [ 2020-03-23 ]

MDEV-742 adds more urgency to fix this issue.

Comment by Andrei Elkin [ 2020-06-04 ]

serg This bug requires some design efforts to make MDL to survive disconnection. It's certainly beyond the current release plan and my question is whether it should be targeted to 10.6 actually?

The description use case must be telling enough how practical it is. Could you please help to decide?

Comment by Marko Mäkelä [ 2020-06-08 ]

MDEV-22733 (which was caused by MDEV-742 in 10.5.2) was closed, claiming to be a duplicate of this bug, even though this bug affects much earlier versions than 10.5.

Comment by Marko Mäkelä [ 2021-05-28 ]

In MariaDB Server 10.6, this should be fixed on the InnoDB side by the DROP rewrite (and removal of the ‘background drop table’ queue) that is the third fix related to MDEV-25506. The CREATE TABLE…SELECT will consist of two transactions, mandated by the SQL layer:

  1. CREATE TABLE
  2. INSERT…SELECT: the error handling will invoke ha_innobase::delete_table() while the transaction is still active

ha_innobase::delete_table() can avoid starting a new transaction, and instead reuse the second transaction that was started by the SQL layer. We can simply drop the table and commit the transaction. Such hack even seems to work for partitioned tables, even though I expect that only one of the ha_innobase::delete_table() invocations will use the hijacked transaction. I tested it with the following:

--source include/have_innodb.inc
--source include/have_partition.inc
 
CREATE TABLE t(a INT) ENGINE=InnoDB;
INSERT INTO t VALUES(1),(1);
 
--error ER_DUP_ENTRY
CREATE TABLE t1(PRIMARY KEY(a)) ENGINE=InnoDB
SELECT * FROM t;
--error ER_DUP_ENTRY
CREATE TABLE t1(PRIMARY KEY(a)) ENGINE=InnoDB
PARTITION BY HASH(a) PARTITIONS 5 SELECT * FROM t;
--error ER_DUP_ENTRY
CREATE TEMPORARY TABLE t0(PRIMARY KEY(a)) ENGINE=InnoDB SELECT * FROM t;
 
DROP TABLE t;

Comment by Marko Mäkelä [ 2021-05-28 ]

For 10.6 as part of the work-in-progress MDEV-25506 part 3 of 3, I implemented this CREATE…SELECT detection in ha_innobase::delete_table(). The first condition is for avoiding CREATE…REPLACE.

  trx_t *trx= parent_trx;
  if (!trx->lock.table_locks.empty() &&
      thd_sql_command(trx->mysql_thd) == SQLCOM_CREATE_TABLE)
  {
#if 0 // MDEV-21602 FIXME: this fails for innodb.innodb and some others
    for (const lock_t *l : trx->lock.table_locks)
      if (l && l->type_mode == (LOCK_IX | LOCK_TABLE) &&
          l->un_member.tab_lock.table == table)
        goto create_select;
    sql_print_warning("InnoDB: CREATE...SELECT did not hold expected locks");
create_select:
#endif
    /* CREATE TABLE...PRIMARY KEY...SELECT ought to be dropping the
    table because a duplicate key was detected. We shall hijack the
    existing transaction to drop the table and commit the transaction.
    If this is a partitioned table, one partition will use this hijacked
    transaction; others will use a separate transaction, one per partition. */

The warning message above would also be issued for the above partitioning test, if the duplicates are not being reported for the first partition. (We would hijack the transaction anyway for dropping the first partition.) I think that such logic is fragile, and it would be better to have something that reliably distinguishes CREATE TABLE…SELECT from other statements, such as CREATE OR REPLACE TABLE.

Comment by Marko Mäkelä [ 2021-06-09 ]

MDEV-25506 in MariaDB Server 10.6.2 essentially fixed this. A less hacky solution could still be preferred.

Comment by Marko Mäkelä [ 2021-06-15 ]

My concern regarding detecting CREATE…SELECT in 10.6 was addressed.

Generated at Thu Feb 08 09:08:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.