[MDEV-16465] Invalid (old?) table or database name or hang in ha_innobase::delete_table and log semaphore wait upon concurrent DDL with foreign keys Created: 2018-06-11  Updated: 2019-03-07  Resolved: 2018-09-04

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Locking, Storage Engine - InnoDB
Affects Version/s: 10.0, 10.1, 10.2, 10.3
Fix Version/s: 10.0.37, 10.1.36, 10.2.18, 10.3.10

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 1
Labels: foreign-keys, mdl

Attachments: HTML File threads    
Issue Links:
Problem/Incident
causes MDEV-17595 Server crashes in copy_data_between_t... Closed
Relates
relates to MDEV-13564 TRUNCATE TABLE and undo tablespace tr... Closed
relates to MDEV-13200 Assertion `index->id == btr_page_get_... Closed

 Description   

--source include/have_innodb.inc
 
CREATE TABLE t1 (a INT NOT NULL) ENGINE=InnoDB;
CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=InnoDB;
 
--connect (con1,localhost,root,,)
--send
  ALTER TABLE t1 ADD CONSTRAINT FOREIGN KEY (a) REFERENCES t2(b) ON DELETE CASCADE ON UPDATE CASCADE;
 
--connection default
CREATE TABLE t3 LIKE t1;
UPDATE t2 SET b = 1;
--connection con1
--reap
 
--disconnect con1
--connection default
DROP TABLE IF EXISTS t3;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;

The test case is non-deterministic, so it can end up differently.
Sometimes it just passes.
Sometimes it executes to the end, but fails with

10.2 a31e99a89cc75

2018-06-11 21:16:45 140673673762560 [ERROR] Invalid (old?) table or database name '#sql-8e9_a'
^ Found warnings in /data/bld/10.2/mysql-test/var/log/mysqld.1.err

Sometimes it hangs upon DROP TABLE

10.2 a31e99a89cc75

Thread 3 (Thread 0x7f6e6c094700 (LWP 2482)):
#0  0x00007f6e7af0b7dd in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000556ea3926439 in os_thread_sleep (tm=20000) at /data/src/10.2/storage/innobase/os/os0thread.cc:230
#2  0x0000556ea39a8d90 in fil_wait_crypt_bg_threads (table=0x7f6e18020e58) at /data/src/10.2/storage/innobase/row/row0mysql.cc:3175
#3  0x0000556ea39a9e6e in row_drop_table_for_mysql (name=0x7f6e6c0907c0 "test/t1", trx=0x7f6e6f1967e8, drop_db=false, create_failed=0, nonatomic=true) at /data/src/10.2/storage/innobase/row/row0mysql.cc:3549
#4  0x0000556ea3876c81 in ha_innobase::delete_table (this=0x7f6e24012d60, name=0x7f6e6c092300 "./test/t1") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:13270
#5  0x0000556ea355ee90 in handler::ha_delete_table (this=0x7f6e24012d60, name=0x7f6e6c092300 "./test/t1") at /data/src/10.2/sql/handler.cc:4325
#6  0x0000556ea35598c2 in ha_delete_table (thd=0x7f6e24000b00, table_type=0x556ea6ca6b10, path=0x7f6e6c092300 "./test/t1", db=0x7f6e24012b30 "test", alias=0x7f6e240124e8 "t1", generate_warning=true) at /data/src/10.2/sql/handler.cc:2352
#7  0x0000556ea33a41f0 in mysql_rm_table_no_locks (thd=0x7f6e24000b00, tables=0x7f6e24012520, if_exists=true, drop_temporary=false, drop_view=false, dont_log_query=false, dont_free_locks=false) at /data/src/10.2/sql/sql_table.cc:2468
#8  0x0000556ea33a340d in mysql_rm_table (thd=0x7f6e24000b00, tables=0x7f6e24012520, if_exists=1 '\001', drop_temporary=0 '\000') at /data/src/10.2/sql/sql_table.cc:2088
#9  0x0000556ea32e5e47 in mysql_execute_command (thd=0x7f6e24000b00) at /data/src/10.2/sql/sql_parse.cc:4756
#10 0x0000556ea32efe49 in mysql_parse (thd=0x7f6e24000b00, rawbuf=0x7f6e24012448 "DROP TABLE IF EXISTS t1", length=23, parser_state=0x7f6e6c093200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7999
#11 0x0000556ea32dd8f2 in dispatch_command (command=COM_QUERY, thd=0x7f6e24000b00, packet=0x7f6e240088a1 "DROP TABLE IF EXISTS t1", packet_length=23, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1821
#12 0x0000556ea32dc255 in do_command (thd=0x7f6e24000b00) at /data/src/10.2/sql/sql_parse.cc:1375
#13 0x0000556ea342ba66 in do_handle_one_connection (connect=0x556ea71cc470) at /data/src/10.2/sql/sql_connect.cc:1335
#14 0x0000556ea342b7f3 in handle_one_connection (arg=0x556ea71cc470) at /data/src/10.2/sql/sql_connect.cc:1241
#15 0x0000556ea384ccc0 in pfs_spawn_thread (arg=0x556ea71d69e0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#16 0x00007f6e7af02494 in start_thread (arg=0x7f6e6c094700) at pthread_create.c:333
#17 0x00007f6e792e893f in clone () from /lib/x86_64-linux-gnu/libc.so.6
 
Thread 2 (Thread 0x7f6e6c04a700 (LWP 2485)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x0000556ea3cb72e1 in safe_cond_timedwait (cond=0x556ea4861f60 <COND_thread_cache>, mp=0x556ea4864800 <LOCK_thread_cache>, abstime=0x7f6e6c049da0, file=0x556ea3d46b40 "/data/src/10.2/include/mysql/psi/mysql_thread.h", line=1175) at /data/src/10.2/mysys/thr_mutex.c:545
#2  0x0000556ea31fa60a in inline_mysql_cond_timedwait (that=0x556ea4861f60 <COND_thread_cache>, mutex=0x556ea4864800 <LOCK_thread_cache>, abstime=0x7f6e6c049da0, src_file=0x556ea3d488aa "/data/src/10.2/sql/mysqld.cc", src_line=2973) at /data/src/10.2/include/mysql/psi/mysql_thread.h:1175
#3  0x0000556ea31ff72a in cache_thread (thd=0x7f6e18000b00) at /data/src/10.2/sql/mysqld.cc:2972
#4  0x0000556ea31ff9ce in one_thread_per_connection_end (thd=0x7f6e18000b00, put_in_cache=true) at /data/src/10.2/sql/mysqld.cc:3061
#5  0x0000556ea342bb4f in do_handle_one_connection (connect=0x556ea71b3fe0) at /data/src/10.2/sql/sql_connect.cc:1354
#6  0x0000556ea342b7f3 in handle_one_connection (arg=0x556ea71b3fe0) at /data/src/10.2/sql/sql_connect.cc:1241
#7  0x0000556ea384ccc0 in pfs_spawn_thread (arg=0x556ea71d6d90) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#8  0x00007f6e7af02494 in start_thread (arg=0x7f6e6c04a700) at pthread_create.c:333
#9  0x00007f6e792e893f in clone () from /lib/x86_64-linux-gnu/libc.so.6
 
Thread 1 (Thread 0x7f6e7b31bb80 (LWP 2453)):
#0  0x00007f6e792df4ed in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000556ea320648b in handle_connections_sockets () at /data/src/10.2/sql/mysqld.cc:6604
#2  0x0000556ea320577a in mysqld_main (argc=144, argv=0x556ea6a69370) at /data/src/10.2/sql/mysqld.cc:6070
#3  0x0000556ea31f9dc0 in main (argc=22, argv=0x7fff70057978) at /data/src/10.2/sql/main.cc:25

(all threads are attached).
When it hangs, it produces periodic error messages in the error log:

2018-06-11 21:17:50 140112235677440 [Warning] InnoDB: Waited 30 seconds for ref-count on table: `test`.`t1` space: 7
2018-06-11 21:18:20 140112235677440 [Warning] InnoDB: Waited 60 seconds for ref-count on table: `test`.`t1` space: 7

and in due time starts to complain about long semaphore wait

2018-06-11 21:32:16 139934062102272 [Warning] InnoDB: Waited 240 seconds for ref-count on table: `test`.`t1` space: 7
2018-06-11 21:32:19 139933834528512 [Warning] InnoDB: A long semaphore wait:
--Thread 139933817743104 has waited at srv0srv.cc line 2095 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x55d7d08ea4b0 created in file dict0dict.cc line 1108
a writer (thread id 139934062102272) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 853
Last time write locked in file row0mysql.cc line 3356
2018-06-11 21:32:19 139933834528512 [Note] InnoDB: A semaphore wait:
--Thread 139933817743104 has waited at srv0srv.cc line 2095 for 241.00 seconds the semaphore:
X-lock on RW-latch at 0x55d7d08ea4b0 created in file dict0dict.cc line 1108
a writer (thread id 139934062102272) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0purge.cc line 853
Last time write locked in file row0mysql.cc line 3356

Doesn't seem to be reproducible on 10.1



 Comments   
Comment by Marko Mäkelä [ 2018-06-26 ]

For me, the hung DROP TABLE is waiting for dict_table_t::n_ref_count of test.t1 to become 0, but it is 1:

10.2 31c950cca8fa1623d41a8a7784e09fbcd6bc2c71

#2  0x000056432f1c27fb in fil_wait_crypt_bg_threads (table=0x7fae0001e678)
    at /mariadb/10.3/storage/innobase/row/row0mysql.cc:3176
#3  0x000056432f1c39dc in row_drop_table_for_mysql (
    name=0x7fae66d36780 "test/t1", trx=0x7fae679027e8, drop_db=false, 
    create_failed=0, nonatomic=true)
    at /mariadb/10.3/storage/innobase/row/row0mysql.cc:3573
#4  0x000056432f08df98 in ha_innobase::delete_table (this=0x7fadfc012ff0, 
    name=0x7fae66d382c0 "./test/t1")
    at /mariadb/10.3/storage/innobase/handler/ha_innodb.cc:13348

I did not find any thread that would be holding a handle to the table. Possibly we forgot to decrement the n_ref_count somewhere. The following instrumentation

diff --git a/storage/innobase/include/dict0dict.ic b/storage/innobase/include/dict0dict.ic
index ca2ea769612..d079b03e792 100644
--- a/storage/innobase/include/dict0dict.ic
+++ b/storage/innobase/include/dict0dict.ic
@@ -1514,6 +1514,8 @@ void
 dict_table_t::acquire()
 {
 	ut_ad(mutex_own(&dict_sys->mutex));
+	if (!strcmp(name.m_name, "test/t1"))
+		fprintf(stderr, "acquire %p\n", this);
 	++n_ref_count;
 }
 
@@ -1525,6 +1527,8 @@ dict_table_t::release()
 {
 	ut_ad(mutex_own(&dict_sys->mutex));
 	ut_ad(n_ref_count > 0);
+	if (!strcmp(name.m_name, "test/t1"))
+		fprintf(stderr, "release %p\n", this);
 	return !--n_ref_count;
 }
 

produced this output:

Version: '10.2.17-MariaDB-debug-log'  socket: '/dev/shm/10.2/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
acquire 0x7fd6a006eb68
release 0x7fd6a006eb68
acquire 0x7fd6a006eb68
release 0x7fd6a006eb68
acquire 0x7fd6a006eb68
release 0x7fd6a006eb68
acquire 0x7fd6a006eb68
acquire 0x7fd6a006eb68
2018-06-26 19:46:58 140561600452352 [ERROR] Invalid (old?) table or database name '#sql-bb8_a'
release 0x7fd6a006eb68
release 0x7fd6a006eb68
acquire 0x7fd6a006eb68
release 0x7fd6a401d148
acquire 0x7fd6a401d148
release 0x7fd6a401d148

The table object 0x7fd6a006eb68 was freed. For the table 0x7fd6a401d148, we for some reason got n_ref_count=1. I should improve the instrumentation to catch all acquire and release on that object, no matter what the table name was.

Comment by Marko Mäkelä [ 2018-06-27 ]

After changing the instrumentation to emit messages for any table name, I got the following:

acquire 0x7f580001e448 (test/#sql-1960_a)
release 0x7f580001e448 (test/#sql-1960_a)
acquire 0x7f580001e448 (test/#sql-1960_a)
release 0x7f580001e448 (test/#sql-1960_a)
acquire 0x7f580001e448 (test/#sql-1960_a)
release 0x7f580001e448 (test/#sql-1960_a)
acquire 0x7f580001e448 (test/#sql-1960_a)
release 0x7f580001e448 (test/#sql-1960_a)
acquire 0x7f580001e448 (test/#sql-1960_a)
release 0x7f580001e448 (test/#sql-1960_a)
acquire 0x7f580001e448 (test/#sql-1960_a)
acquire 0x7f580001e448 (test/#sql-1960_a)
release 0x7f580001e448 (test/t1)
acquire 0x7f580001e448 (test/t1)
release 0x7f580001e448 (test/t1)

For the #sql name there are 2 acquire operations at the end, but only 1 release (after the table was renamed to t1).

Comment by Marko Mäkelä [ 2018-06-27 ]

When the test fails, I am observing an increment of the reference count from 1 to 2 that comes from a stray write (not dict_table_t::acquire(). Once I got it during __lll_lock_wait() (unfortunately no stack trace), and twice so far with this stack:

#0  0x0000555556507d91 in _db_enter_ (_func_=0x5555568636b1 "unpack_dirname", 
    _file_=0x555556863642 "mysys/mf_pack.c", _line_=338, 
    _stack_frame_=0x7ffff07675f0) at dbug/dbug.c:1101
#1  0x00005555564cb13d in unpack_dirname (to=0x7ffff0767890 "././test/", 
    from=0x7ffff0767890 "././test/") at mysys/mf_pack.c:338
#2  0x00005555564cb46d in unpack_filename (
    to=0x7ffff0767b80 "././test/#sql-4bdf_9.frm", 
    from=0x7ffff0767b80 "././test/#sql-4bdf_9.frm")
    at mysys/mf_pack.c:421
#3  0x0000555555f9f0cd in ha_innobase::info_low (this=0x7fff88026cf8, 
    flag=26, is_analyze=false)
    at storage/innobase/handler/ha_innodb.cc:14493
#4  0x0000555555f9f342 in ha_innobase::info (this=0x7fff88026cf8, flag=26)
    at storage/innobase/handler/ha_innodb.cc:14549
#5  0x0000555555f8e2f4 in ha_innobase::open (this=0x7fff88026cf8, 
    name=0x7fff8801f8f8 "./test/#sql-4bdf_9")
    at storage/innobase/handler/ha_innodb.cc:6643
#6  0x0000555555d88130 in handler::ha_open (this=0x7fff88026cf8, 
    table_arg=0x7fff880260f0, name=0x7fff8801f8f8 "./test/#sql-4bdf_9", 
    mode=2, test_if_locked=18) at sql/handler.cc:2501
#7  0x0000555555c14e9a in open_table_from_share (thd=0x7fff88000e00, 
    share=0x7fff8801f3e0, alias=0x7ffff076a240 "#sql-4bdf_9", db_stat=1, 
    prgflag=8, ha_open_flags=18, outparam=0x7fff880260f0, 
    is_create_table=false) at sql/table.cc:3334
#8  0x0000555555d03961 in THD::open_temporary_table (this=0x7fff88000e00, 
    share=0x7fff8801f3e0, alias=0x7ffff076a240 "#sql-4bdf_9", 
    open_in_engine=true) at sql/temporary_tables.cc:1102
#9  0x0000555555d01976 in THD::create_and_open_tmp_table (
    this=0x7fff88000e00, hton=0x555557ad6610, frm=0x7ffff07695a0, 
    path=0x7ffff076aaac "./test/#sql-4bdf_9", db=0x7fff88011b40 "test", 
    table_name=0x7ffff076a240 "#sql-4bdf_9", open_in_engine=true)
    at sql/temporary_tables.cc:76
#10 0x0000555555be66bf in mysql_alter_table (thd=0x7fff88000e00, 
    new_db=0x7fff88011b40 "test", new_name=0x0, create_info=0x7ffff076ae20, 
    table_list=0x7fff88011530, alter_info=0x7ffff076ad70, order_num=0, 
    order=0x0, ignore=false) at sql/sql_table.cc:9487
#11 0x0000555555c5fdbe in Sql_cmd_alter_table::execute (this=0x7fff88011d40, 
    thd=0x7fff88000e00) at sql/sql_alter.cc:318
#12 0x0000555555b19cd8 in mysql_execute_command (thd=0x7fff88000e00)
    at sql/sql_parse.cc:6223
#13 0x0000555555b1ed7a in mysql_parse (thd=0x7fff88000e00, 
    rawbuf=0x7fff880113b8 "ALTER TABLE t1 ADD CONSTRAINT FOREIGN KEY (a) REFERENCES t2(b) ON DELETE CASCADE ON UPDATE CASCADE", length=98, 
    parser_state=0x7ffff076c1e0, is_com_multi=false, is_next_command=false)
    at sql/sql_parse.cc:8000

void _db_enter_(const char *_func_, const char *_file_,
                uint _line_, struct _db_stack_frame_ *_stack_frame_)
{
  int save_errno, org_cs_locked;
  CODE_STATE *cs;
  if (!((cs=code_state())))

   0x0000555556507d88 <+24>:	callq  0x5555565059c9 <code_state>
   0x0000555556507d8d <+29>:	mov    %rax,-0x18(%rbp)
=> 0x0000555556507d91 <+33>:	cmpq   $0x0,-0x18(%rbp)

This looks strange, because the code appears to be accessing the local stack frame, and %rbp is nowhere close to the n_ref_count.
I recompiled cmake -DWITH_ASAN:BOOL=ON, and the behaviour remained mostly the same. I also got a more plausible stack trace:

#0  dict_table_t::acquire (this=0x617000040288)
    at storage/innobase/include/dict0dict.ic:1519
#1  0x00005555575b10b6 in dict_table_open_on_name (
    table_name=0x7fffdec9e6b0 "test/#sql-588b_9", dict_locked=0, try_drop=1, 
    ignore_err=DICT_ERR_IGNORE_NONE)
    at storage/innobase/dict/dict0dict.cc:1212
#2  0x0000555556f4ec3f in ha_innobase::open_dict_table (
    table_name=0x61b000079228 "./test/#sql-588b_9", 
    norm_name=0x7fffdec9e6b0 "test/#sql-588b_9", is_partition=false, 
    ignore_err=DICT_ERR_IGNORE_NONE)
    at storage/innobase/handler/ha_innodb.cc:6681
#3  0x0000555556f4bf6f in ha_innobase::open (this=0x61d000170308, 
    name=0x61b000079228 "./test/#sql-588b_9")
    at storage/innobase/handler/ha_innodb.cc:6341
#4  0x00005555569f73b5 in handler::ha_open (this=0x61d000170308, 
    table_arg=0x61f0000444f0, name=0x61b000079228 "./test/#sql-588b_9", 
    mode=2, test_if_locked=18) at sql/handler.cc:2501
#5  0x0000555556672a78 in open_table_from_share (thd=0x62a0000ae270, 
    share=0x61b000078d08, alias=0x62b000000d30 "#mysql50##sql-588b_9", 
    db_stat=33, prgflag=8, ha_open_flags=18, outparam=0x61f0000444f0, 
    is_create_table=false) at sql/table.cc:3334
#6  0x00005555562816de in open_table (thd=0x62a0000ae270, 
    table_list=0x62b000000e28, ot_ctx=0x7fffdec9f8b0)
    at sql/sql_base.cc:1885
#7  0x0000555556288499 in open_and_process_table (thd=0x62a0000ae270, 
    lex=0x62a0000b1d80, tables=0x62b000000e28, counter=0x7fffdec9fc10, 
    flags=0, prelocking_strategy=0x7fffdec9f990, has_prelocking_list=false, 
    ot_ctx=0x7fffdec9f8b0) at sql/sql_base.cc:3404
#8  0x000055555628b678 in open_tables (thd=0x62a0000ae270, options=..., 
    start=0x7fffdec9fa60, counter=0x7fffdec9fc10, flags=0, 
    prelocking_strategy=0x7fffdec9f990) at sql/sql_base.cc:3958
#9  0x000055555633f8c9 in open_tables (thd=0x62a0000ae270, 
    tables=0x7fffdec9fa60, counter=0x7fffdec9fc10, flags=0)
    at sql/sql_base.h:470
#10 0x000055555662e24f in mysql_update (thd=0x62a0000ae270, 
    table_list=0x62b000000360, fields=..., values=..., conds=0x0, 
    order_num=0, order=0x0, limit=18446744073709551615, 
    handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7fffdeca0a30, 
    updated_return=0x7fffdeca0a70) at sql/sql_update.cc:286
#11 0x00005555563af225 in mysql_execute_command (thd=0x62a0000ae270)
    at sql/sql_parse.cc:4280
#12 0x00005555563cab19 in mysql_parse (thd=0x62a0000ae270, 
    rawbuf=0x62b000000288 "UPDATE t2 SET b = 1", length=19, 
    parser_state=0x7fffdeca3c40, is_com_multi=false, is_next_command=false)
    at sql/sql_parse.cc:8000

In this stack trace, the UPDATE t2 is for some reason opening the intermediate copy of the table t1 where a connection to t2 is being added:

ALTER TABLE t1 ADD CONSTRAINT FOREIGN KEY (a) REFERENCES t2(b) ON DELETE CASCADE ON UPDATE CASCADE;

At this point, the only foreign key metadata that is attached to the intermediate table is foreign_set with 1 element:

(gdb) p **foreign_set.begin()
$4 = {heap = 0x611000050b80, id = 0x61600004ff78 "test/#sql-588b_9_ibfk_1", 
  n_fields = 1, type = 5, 
  foreign_table_name = 0x61600004ffa0 "test/#sql-588b_9", 
  foreign_table_name_lookup = 0x61600004ffa0 "test/#sql-588b_9", 
  foreign_table = 0x617000040288, foreign_col_names = 0x61600004ffb8, 
  referenced_table_name = 0x61600004ffc8 "test/t2", 
  referenced_table_name_lookup = 0x61600004ffc8 "test/t2", 
  referenced_table = 0x617000032288, referenced_col_names = 0x61600004ffd0, 
  foreign_index = 0x618000050d08, referenced_index = 0x618000040d08, 
  v_cols = 0x0}

mysql_update() comes up with a table_list of 2 elements: t2 and the intermediate (uncommitted) copy of the table t1.

  DBUG_ENTER("mysql_update");
 
  create_explain_query(thd->lex, thd->mem_root);
  if (open_tables(thd, &table_list, &table_count, 0))
    DBUG_RETURN(1);

We should ensure that FOREIGN KEY relationships are not exposed to the SQL layer before the completion of the ALTER TABLE.

Comment by Marko Mäkelä [ 2018-06-27 ]

The UPDATE t2 will find the intermediate copy of t1, because the ALTER TABLE would register it in t2:

(gdb) p **referenced_set.begin()
$20 = {heap = 0x611000050b80, id = 0x61600004ff78 "test/#sql-588b_9_ibfk_1", 
  n_fields = 1, type = 5, 
  foreign_table_name = 0x61600004ffa0 "test/#sql-588b_9", 
  foreign_table_name_lookup = 0x61600004ffa0 "test/#sql-588b_9", 
  foreign_table = 0x617000040288, foreign_col_names = 0x61600004ffb8, 
  referenced_table_name = 0x61600004ffc8 "test/t2", 
  referenced_table_name_lookup = 0x61600004ffc8 "test/t2", 
  referenced_table = 0x617000032288, referenced_col_names = 0x61600004ffd0, 
  foreign_index = 0x618000050d08, referenced_index = 0x618000040d08, 
  v_cols = 0x0}

svoj pointed out that the ALTER TABLE should prevent writes on t2 for the duration of the ALGORITHM=COPY operation, just like it read-locks t1, not only because the being-added FOREIGN KEY constraint could be violated by concurrent modifications to the related tables, but also because concurrent modifications would require CASCADE modifications on the ALTER TABLE (and InnoDB would fail to perform these modifications, because the foreign key relationship was not recorded in dict_table_t::foreign_list or dict_table_t::referenced_list yet.

So, it seems to me that the fix should be to acquire MDL on all connected tables during ALTER TABLE when LOCK=NONE is not allowed.

Comment by Marko Mäkelä [ 2018-06-27 ]

I think that an isolation violation or race condition between the ALTER and UPDATE should affect any InnoDB version. The dict_table_t::n_ref_count was introduced in MariaDB 10.0.

Comment by Elena Stepanova [ 2018-06-27 ]

Upon marko's advice, I added some data to the test case, this one and one from MDEV-16060. Now I get "Invalid (old?) table or database name" on 10.0 and 10.1 as well. Release builds are much more willing to produce it than debug, but it happens on both after enough attempts.

I paste both test cases, because the probability of getting the error seems to vary depending on 10.0/10.1, innodb_plugin/xtradb, and debug/release – with some combinations one test case gives better results, and on other combinations the other one.

Please note that on 10.1 they need to be run with --mysqld=--sequence and on 10.0 with --mysqld=--sequence --mysqld=--plugin-load-add=ha_sequence (and with --repeat of course.

--source include/have_innodb.inc
 
CREATE TABLE t1 (`pk` INT PRIMARY KEY, f1 INT NOT NULL, INDEX ind_f1 (f1)) ENGINE=InnoDB;
CREATE TABLE t2 (`pk` INT PRIMARY KEY, f2 INT NOT NULL, INDEX ind_f2 (f2)) ENGINE=InnoDB;
 
--connect (con1,localhost,root,,test)
CREATE TABLE t3 (`pk` INT PRIMARY KEY, f3 INT NOT NULL, INDEX ind_f3 (f3)) ENGINE=InnoDB;
CREATE TABLE t4 (`pk` INT PRIMARY KEY, f4 INT NOT NULL, INDEX ind_f4 (f4)) ENGINE=InnoDB;
CREATE TABLE t5 (`pk` INT PRIMARY KEY, f5 INT NOT NULL, INDEX ind_f5 (f5), FOREIGN KEY (`pk`) REFERENCES t4 (`pk`) ON DELETE CASCADE ON UPDATE CASCADE) ENGINE=InnoDB;
 
INSERT INTO t4 SELECT seq, seq FROM seq_1_to_10000;
INSERT INTO t3 SELECT seq, seq FROM seq_1_to_10000;
INSERT INTO t5 SELECT seq, seq FROM seq_1_to_10000;
INSERT INTO t2 SELECT seq, seq FROM seq_1_to_10000;
INSERT INTO t1 SELECT seq, seq FROM seq_1_to_10000;
 
--send
  ALTER TABLE t3 ADD CONSTRAINT fk3 FOREIGN KEY (`pk`) REFERENCES t5 (`pk`) ON DELETE CASCADE ON UPDATE CASCADE;
 
--connection default
UPDATE IGNORE t4 SET pk = 10001 WHERE `pk` = 1;
 
--connection con1
--error 0,ER_NO_REFERENCED_ROW_2,ER_ERROR_ON_RENAME
--reap
ALTER TABLE t1 ADD CONSTRAINT fk1 FOREIGN KEY (f1) REFERENCES t2 (`pk`) ON DELETE CASCADE ON UPDATE CASCADE;
 
# Cleanup
--disconnect con1
--connection default
DROP TABLE t3, t1, t5, t4, t2;

--source include/have_innodb.inc
 
CREATE TABLE t1 (a INT NOT NULL) ENGINE=InnoDB;
CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=InnoDB;
 
INSERT INTO t2 SELECT seq FROM seq_1_to_100000;
INSERT INTO t1 SELECT seq FROM seq_1_to_100000;
 
--connect (con1,localhost,root,,)
--send
  ALTER TABLE t1 ADD CONSTRAINT FOREIGN KEY (a) REFERENCES t2(b) ON DELETE CASCADE ON UPDATE CASCADE;
 
--connection default
CREATE TABLE t3 LIKE t1;
UPDATE t2 SET b = b + 100000 LIMIT 2;
--connection con1
--error 0,ER_NO_REFERENCED_ROW_2,ER_ERROR_ON_RENAME
--reap
 
--disconnect con1
--connection default
SELECT * FROM t1 WHERE a NOT IN (SELECT b FROM t2);
DROP TABLE IF EXISTS t3;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;

10.0 bf4244d1a0d

line
180627 21:08:03 [ERROR] Invalid (old?) table or database name '#sql-118e_4'
^ Found warnings in /data/bld/10.0-rel/mysql-test/var/log/mysqld.1.err

Comment by Sergei Golubchik [ 2018-07-16 ]

Looking at marko explanation above, here's a modification of elenst test that deterministically fails:

install soname 'ha_sequence';
 
--source include/have_innodb.inc
--source include/have_debug_sync.inc
CREATE TABLE t1 (a INT NOT NULL) ENGINE=InnoDB;
CREATE TABLE t2 (b INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t2 SELECT seq FROM seq_1_to_100;
INSERT INTO t1 SELECT seq FROM seq_1_to_100;
--connect (con1,localhost,root,,)
set debug_sync='alter_table_before_rename_result_table SIGNAL g1 WAIT_FOR g2';
--send
  ALTER TABLE t1 ADD CONSTRAINT FOREIGN KEY (a) REFERENCES t2(b) ON DELETE CASCADE ON UPDATE CASCADE;
--connection default
set debug_sync='before_execute_sql_command WAIT_FOR g1';
UPDATE t2 SET b = b + 100000 LIMIT 2;
set debug_sync='now SIGNAL g2';
--connection con1
--reap
--disconnect con1
--connection default
SELECT * FROM t1 WHERE a NOT IN (SELECT b FROM t2);
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
 
flush tables;
uninstall plugin sequence;
set debug_sync='reset';

The trick is to run the update after the ALTER finished creating the temporary table, but before it has renamed it.

Comment by Sergei Golubchik [ 2018-07-17 ]

And this is a modification of the test that not only results in a seemingly harmless warning, but actually breaks foreign key constraints. All it takes is to add another table, referencing the one that is altered:

install soname 'ha_sequence';
 
--source include/have_innodb.inc
--source include/have_debug_sync.inc
CREATE TABLE t1 (f1 INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE t2 (f2 INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE t3 (f3 INT PRIMARY KEY, FOREIGN KEY (f3) REFERENCES t2(f2)) ENGINE=InnoDB;
INSERT INTO t1 SELECT seq FROM seq_1_to_100;
INSERT INTO t2 SELECT seq FROM seq_1_to_100;
INSERT INTO t3 SELECT seq FROM seq_1_to_100;
--connect con1,localhost,root
set debug_sync='alter_table_before_rename_result_table SIGNAL g1 WAIT_FOR g2';
--send
  ALTER TABLE t2 ADD CONSTRAINT FOREIGN KEY (f2) REFERENCES t1(f1) ON DELETE CASCADE ON UPDATE CASCADE;
--connection default
set debug_sync='before_execute_sql_command WAIT_FOR g1';
UPDATE t1 SET f1 = f1 + 100000 LIMIT 2;
set debug_sync='now SIGNAL g2';
--connection con1
--reap
--disconnect con1
--connection default
SELECT * FROM t2 WHERE f2 NOT IN (SELECT f1 FROM t1);
SELECT * FROM t3 WHERE f3 NOT IN (SELECT f2 FROM t2);
DROP TABLE t3, t2, t1;
 
flush tables;
uninstall plugin sequence;
set debug_sync='reset';

Comment by Marko Mäkelä [ 2018-09-03 ]

During the testing of MDEV-13564, mleich got a failure that I think would be prevented if not only ALTER TABLE, OPTIMIZE TABLE, CREATE/DROP INDEX but also TRUNCATE TABLE used the same protection rules: acquire not only MDL but also InnoDB table lock on the tables that are connected by FOREIGN KEY constraints.

bb-10.2-marko 9de899ac0983cec9165f3e8d4a5c67ce63e2b9f5

storage/innobase/row/row0mysql.cc:1724: void init_fts_doc_id_for_ref(dict_table_t*, ulint*): Assertion `foreign->foreign_table != __null' failed.

Generated at Thu Feb 08 08:29:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.