[MDEV-26217] Failing assertion: list.count > 0 in ut_list_remove or Assertion `lock->trx == this' failed in dberr_t trx_t::drop_table Created: 2021-07-22  Updated: 2024-02-08  Resolved: 2022-04-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6.8, 10.7.4, 10.8.3, 10.9.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Problem/Incident
causes MDEV-28454 Latching order violation (and hang) i... Closed
causes MDEV-32899 InnoDB is holding shared dict_sys.lat... Closed
Relates
relates to MDEV-30103 FK constraint fails upon referenced t... Stalled
relates to MDEV-33104 Assertion `table.get_ref_count() <= 1... Closed
relates to MDEV-29504 AddressSanitizer: heap-use-after-free... Closed

 Description   

Note: The test case is non-deterministic, run with --repeat=N. It usually fails for me within 1-2 attempts.

--source include/have_innodb.inc
 
CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
 
CREATE TABLE t2 (pk INT PRIMARY KEY, FOREIGN KEY(pk) REFERENCES t1(pk)) ENGINE=InnoDB;
 
--connect (con1,localhost,root,,test)
SET FOREIGN_KEY_CHECKS= OFF;
--send
  CREATE OR REPLACE TABLE t1 (b INT) ENGINE=InnoDB;
 
--connection default
--error 0,ER_NO_REFERENCED_ROW_2
INSERT INTO t2 VALUES (1);
 
--connection con1
--error 0,ER_CANT_CREATE_TABLE
--reap
 
# Cleanup
--disconnect con1
--connection default
DROP TABLE IF EXISTS t2, t1;

10.6 f43370ba debug

mariadbd: /data/src/10.6/storage/innobase/dict/drop.cc:161: dberr_t trx_t::drop_table(const dict_table_t&): Assertion `lock->trx == this' failed.
210722 20:47:20 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f90122e5f36 in __GI___assert_fail (assertion=0x5570e43f93cf "lock->trx == this", file=0x5570e43f9088 "/data/src/10.6/storage/innobase/dict/drop.cc", line=161, function=0x5570e43f9318 "dberr_t trx_t::drop_table(const dict_table_t&)") at assert.c:101
#8  0x00005570e3d8cc45 in trx_t::drop_table (this=0x7f900c7da540, table=...) at /data/src/10.6/storage/innobase/dict/drop.cc:161
#9  0x00005570e3a314b1 in ha_innobase::delete_table (this=0x7f8fd0014e10, name=0x7f900c4e1820 "./test/t1") at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13491
#10 0x00005570e35aac20 in hton_drop_table (hton=0x5570e75953f8, path=0x7f900c4e1820 "./test/t1") at /data/src/10.6/sql/handler.cc:576
#11 0x00005570e35b0fbe in ha_delete_table (thd=0x7f8fd0000db8, hton=0x5570e75953f8, path=0x7f900c4e1820 "./test/t1", db=0x7f900c4e15f0, alias=0x7f900c4e1600, generate_warning=false) at /data/src/10.6/sql/handler.cc:3086
#12 0x00005570e33042d5 in mysql_rm_table_no_locks (thd=0x7f8fd0000db8, tables=0x7f900c4e1d70, current_db=0x7f8fd0000e50, ddl_log_state=0x7f900c4e2810, if_exists=false, drop_temporary=false, drop_view=false, drop_sequence=false, dont_log_query=true, dont_free_locks=true) at /data/src/10.6/sql/sql_table.cc:1469
#13 0x00005570e330d272 in create_table_impl (thd=0x7f8fd0000db8, ddl_log_state_create=0x7f900c4e27f0, ddl_log_state_rm=0x7f900c4e2810, orig_db=..., orig_table_name=..., db=..., table_name=..., path=..., options=..., create_info=0x7f900c4e2ac0, alter_info=0x7f900c4e29d0, create_table_mode=0, is_trans=0x7f900c4e27ce, key_info=0x7f900c4e2530, key_count=0x7f900c4e2524, frm=0x7f900c4e2550) at /data/src/10.6/sql/sql_table.cc:4308
#14 0x00005570e330de17 in mysql_create_table_no_lock (thd=0x7f8fd0000db8, ddl_log_state_create=0x7f900c4e27f0, ddl_log_state_rm=0x7f900c4e2810, db=0x7f8fd0014298, table_name=0x7f8fd00142a8, create_info=0x7f900c4e2ac0, alter_info=0x7f900c4e29d0, is_trans=0x7f900c4e27ce, create_table_mode=0, table_list=0x7f8fd0014280) at /data/src/10.6/sql/sql_table.cc:4546
#15 0x00005570e330e3a8 in mysql_create_table (thd=0x7f8fd0000db8, create_table=0x7f8fd0014280, create_info=0x7f900c4e2ac0, alter_info=0x7f900c4e29d0) at /data/src/10.6/sql/sql_table.cc:4658
#16 0x00005570e332402f in Sql_cmd_create_table_like::execute (this=0x7f8fd0014220, thd=0x7f8fd0000db8) at /data/src/10.6/sql/sql_table.cc:11778
#17 0x00005570e321ab52 in mysql_execute_command (thd=0x7f8fd0000db8, is_called_from_prepared_stmt=false) at /data/src/10.6/sql/sql_parse.cc:5993
#18 0x00005570e3220cfd in mysql_parse (thd=0x7f8fd0000db8, rawbuf=0x7f8fd0014160 "CREATE OR REPLACE TABLE t1 (b INT) ENGINE=InnoDB", length=48, parser_state=0x7f900c4e3480) at /data/src/10.6/sql/sql_parse.cc:8026
#19 0x00005570e320d165 in dispatch_command (command=COM_QUERY, thd=0x7f8fd0000db8, packet=0x7f8fd000b879 "CREATE OR REPLACE TABLE t1 (b INT) ENGINE=InnoDB", packet_length=48, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1896
#20 0x00005570e320bb01 in do_command (thd=0x7f8fd0000db8, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1404
#21 0x00005570e33c921b in do_handle_one_connection (connect=0x5570e7b36db8, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1410
#22 0x00005570e33c8f77 in handle_one_connection (arg=0x5570e7b36db8) at /data/src/10.6/sql/sql_connect.cc:1312
#23 0x00005570e39391a5 in pfs_spawn_thread (arg=0x5570e7b36e98) at /data/src/10.6/storage/perfschema/pfs.cc:2201
#24 0x00007f90127fe609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#25 0x00007f90123d1293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6 f43370ba non-debug

2021-07-22 20:49:26 0x7f15f40de700  InnoDB: Assertion failure in file /data/src/10.6/storage/innobase/include/ut0lst.h line 334
InnoDB: Failing assertion: list.count > 0
 
#6  0x0000558744e04e4e in ut_dbg_assertion_failed (expr=expr@entry=0x5587459f3f91 "list.count > 0", file=file@entry=0x5587459f3f60 "/data/src/10.6/storage/innobase/include/ut0lst.h", line=line@entry=334) at /data/src/10.6/storage/innobase/ut/ut0dbg.cc:60
#7  0x0000558744de45ef in ut_list_remove<ut_list_base<ib_lock_t, ut_list_node<ib_lock_t> ib_lock_t::*>, GenericGetNode<ib_lock_t> > (get_node=..., node=..., list=...) at /data/src/10.6/storage/innobase/lock/lock0lock.cc:3236
#8  ut_list_remove<ut_list_base<ib_lock_t, ut_list_node<ib_lock_t> ib_lock_t::*> > (elem=<optimized out>, list=...) at /data/src/10.6/storage/innobase/include/ut0lst.h:386
#9  lock_table_remove_low (lock=0x7f15eec5dcc8) at /data/src/10.6/storage/innobase/lock/lock0lock.cc:3291
#10 lock_table_dequeue (in_lock=0x7f15eec5dcc8, owns_wait_mutex=false) at /data/src/10.6/storage/innobase/lock/lock0lock.cc:3529
#11 0x00005587454c63c4 in lock_release_try (trx=<optimized out>) at /data/src/10.6/storage/innobase/lock/lock0lock.cc:3770
#12 lock_release (trx=trx@entry=0x7f15eec5d130) at /data/src/10.6/storage/innobase/lock/lock0lock.cc:3806
#13 0x0000558745581050 in trx_t::release_locks (this=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0trx.cc:490
#14 trx_t::commit_in_memory (this=this@entry=0x7f15eec5d130, mtr=mtr@entry=0x0) at /data/src/10.6/storage/innobase/trx/trx0trx.cc:1297
#15 0x000055874557fdd7 in trx_t::commit_low (mtr=<optimized out>, this=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0trx.cc:1453
#16 trx_t::commit_persist (this=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0trx.cc:1467
#17 0x000055874557ff06 in trx_t::commit (this=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0trx.cc:1473
#18 0x0000558745572656 in trx_t::rollback_finish (this=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0roll.cc:66
#19 trx_t::rollback_low (this=this@entry=0x7f15eec5d130, savept=savept@entry=0x0) at /data/src/10.6/storage/innobase/trx/trx0roll.cc:133
#20 0x0000558745571510 in trx_rollback_for_mysql_low (trx=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0roll.cc:190
#21 trx_rollback_for_mysql (trx=trx@entry=0x7f15eec5d130) at /data/src/10.6/storage/innobase/trx/trx0roll.cc:259
#22 0x000055874546d592 in innobase_rollback (hton=<optimized out>, thd=0x7f15c4000c58, rollback_trx=<optimized out>) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4563
#23 0x0000558745168900 in ha_rollback_trans (thd=thd@entry=0x7f15c4000c58, all=all@entry=false) at /data/src/10.6/sql/handler.cc:2153
#24 0x0000558745057fb9 in trans_rollback_stmt (thd=thd@entry=0x7f15c4000c58) at /data/src/10.6/sql/transaction.cc:535
#25 0x0000558744f39ea1 in mysql_execute_command (thd=0x7f15c4000c58, is_called_from_prepared_stmt=<optimized out>) at /data/src/10.6/sql/sql_parse.cc:6046
#26 0x0000558744f2ac77 in mysql_parse (thd=0x7f15c4000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /data/src/10.6/sql/sql_parse.cc:8026
#27 0x0000558744f36bdd in dispatch_command (command=COM_QUERY, thd=0x7f15c4000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /data/src/10.6/sql/sql_class.h:1340
#28 0x0000558744f38af8 in do_command (thd=0x7f15c4000c58, blocking=blocking@entry=true) at /data/src/10.6/sql/sql_parse.cc:1404
#29 0x0000558745047db7 in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1410
#30 0x000055874504811d in handle_one_connection (arg=arg@entry=0x5587472c2148) at /data/src/10.6/sql/sql_connect.cc:1312
#31 0x00005587453cfadd in pfs_spawn_thread (arg=0x558747251018) at /data/src/10.6/storage/perfschema/pfs.cc:2201
#32 0x00007f15f9009609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#33 0x00007f15f8bf7293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

More variations of the failure from the same test case

10.6 f43370ba ASAN non-debug

2021-07-22 21:11:39 0x7f99ce43e300  InnoDB: Assertion failure in file /data/src/10.6/storage/innobase/dict/dict0dict.cc line 1891
InnoDB: Failing assertion: table->n_rec_locks == 0
 
#6  0x000055ed0a795cf0 in ut_dbg_assertion_failed (expr=expr@entry=0x55ed0cabdae0 "table->n_rec_locks == 0", file=file@entry=0x55ed0cabc360 "/data/src/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=1891) at /data/src/10.6/storage/innobase/ut/ut0dbg.cc:60
#7  0x000055ed0a7b2050 in dict_sys_t::remove (this=<optimized out>, table=table@entry=0x617000071988, lru=lru@entry=false, keep=keep@entry=false) at /data/src/10.6/storage/innobase/dict/dict0dict.cc:1891
#8  0x000055ed0c2ecd03 in trx_t::commit (this=this@entry=0x7f99d4df3b20, deleted=std::vector of length 0, capacity 0) at /data/src/10.6/storage/innobase/dict/drop.cc:251
#9  0x000055ed0bdbcd1e in ha_innobase::delete_table (this=<optimized out>, name=<optimized out>) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13496
#10 0x000055ed0b33ce49 in hton_drop_table (hton=<optimized out>, path=<optimized out>) at /data/src/10.6/sql/handler.cc:576
#11 0x000055ed0b3531c7 in ha_delete_table (thd=thd@entry=0x62b00010a218, hton=hton@entry=0x615000002898, path=path@entry=0x7f99ce438b80 "./test/t1", db=db@entry=0x7f99ce438000, alias=alias@entry=0x7f99ce438020, generate_warning=generate_warning@entry=false) at /data/src/10.6/sql/handler.cc:3086
#12 0x000055ed0add3835 in mysql_rm_table_no_locks (thd=0x62b00010a218, tables=<optimized out>, current_db=<optimized out>, ddl_log_state=<optimized out>, if_exists=<optimized out>, drop_temporary=<optimized out>, drop_view=<optimized out>, drop_sequence=<optimized out>, dont_log_query=<optimized out>, dont_free_locks=<optimized out>) at /data/src/10.6/sql/sql_table.cc:1469
#13 0x000055ed0ae0fce3 in create_table_impl (thd=thd@entry=0x62b00010a218, ddl_log_state_create=ddl_log_state_create@entry=0x7f99ce43a730, ddl_log_state_rm=ddl_log_state_rm@entry=0x7f99ce43a770, orig_db=..., orig_table_name=..., db=..., table_name=..., path=..., options=..., create_info=<optimized out>, alter_info=<optimized out>, create_table_mode=<optimized out>, is_trans=<optimized out>, key_info=<optimized out>, key_count=<optimized out>, frm=<optimized out>) at /data/src/10.6/sql/sql_table.cc:4308
#14 0x000055ed0ae10a41 in mysql_create_table_no_lock (thd=thd@entry=0x62b00010a218, ddl_log_state_create=ddl_log_state_create@entry=0x7f99ce43a730, ddl_log_state_rm=ddl_log_state_rm@entry=0x7f99ce43a770, db=db@entry=0x62b0001113a8, table_name=table_name@entry=0x62b0001113b8, create_info=create_info@entry=0x7f99ce43ab90, alter_info=<optimized out>, is_trans=<optimized out>, create_table_mode=<optimized out>, table_list=<optimized out>) at /data/src/10.6/sql/sql_table.cc:4546
#15 0x000055ed0ae1141c in mysql_create_table (thd=thd@entry=0x62b00010a218, create_table=create_table@entry=0x62b000111390, create_info=create_info@entry=0x7f99ce43ab90, alter_info=alter_info@entry=0x7f99ce43aa60) at /data/src/10.6/sql/sql_table.cc:4658
#16 0x000055ed0ae16a6f in Sql_cmd_create_table_like::execute (this=<optimized out>, thd=0x62b00010a218) at /data/src/10.6/sql/sql_table.cc:11778
#17 0x000055ed0abbaac4 in mysql_execute_command (thd=0x62b00010a218, is_called_from_prepared_stmt=<optimized out>) at /data/src/10.6/sql/sql_parse.cc:5993
#18 0x000055ed0ab81e94 in mysql_parse (thd=0x62b00010a218, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /data/src/10.6/sql/sql_parse.cc:8026
#19 0x000055ed0abae1b8 in dispatch_command (command=COM_QUERY, thd=0x62b00010a218, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /data/src/10.6/sql/sql_class.h:1340
#20 0x000055ed0abb4787 in do_command (thd=0x62b00010a218, blocking=blocking@entry=true) at /data/src/10.6/sql/sql_parse.cc:1404
#21 0x000055ed0af73a3c in do_handle_one_connection (connect=<optimized out>, put_in_cache=<optimized out>) at /data/src/10.6/sql/sql_connect.cc:1410
#22 0x000055ed0af74295 in handle_one_connection (arg=arg@entry=0x608000002f38) at /data/src/10.6/sql/sql_connect.cc:1312
#23 0x000055ed0bb9c3a0 in pfs_spawn_thread (arg=0x61700000a898) at /data/src/10.6/storage/perfschema/pfs.cc:2201
#24 0x00007f99de38b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#25 0x00007f99ddf5e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6 f43370ba ASAN non-debug

==4176551==ERROR: AddressSanitizer: heap-use-after-free on address 0x61600009c720 at pc 0x55ae52a25d28 bp 0x7f1d0def83c0 sp 0x7f1d0def83b0
READ of size 8 at 0x61600009c720 thread T12
    #0 0x55ae52a25d27 in lock_rec_dequeue_from_page /data/src/10.6/storage/innobase/lock/lock0lock.cc:1933
    #1 0x55ae52a2b561 in lock_release_try /data/src/10.6/storage/innobase/lock/lock0lock.cc:3755
    #2 0x55ae52a2b561 in lock_release(trx_t*) /data/src/10.6/storage/innobase/lock/lock0lock.cc:3806
    #3 0x55ae52cacb1f in trx_t::release_locks() /data/src/10.6/storage/innobase/trx/trx0trx.cc:490
    #4 0x55ae52cacb1f in trx_t::commit_in_memory(mtr_t const*) /data/src/10.6/storage/innobase/trx/trx0trx.cc:1297
    #5 0x55ae52cab240 in trx_t::commit_persist() /data/src/10.6/storage/innobase/trx/trx0trx.cc:1467
    #6 0x55ae52cabc40 in trx_t::commit() /data/src/10.6/storage/innobase/trx/trx0trx.cc:1473
    #7 0x55ae52cabc40 in trx_commit_for_mysql(trx_t*) /data/src/10.6/storage/innobase/trx/trx0trx.cc:1589
    #8 0x55ae52905a95 in innobase_commit_low(trx_t*) /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4258
    #9 0x55ae52905d1a in innobase_commit_ordered_2 /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4364
    #10 0x55ae52906131 in innobase_commit /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4468
    #11 0x55ae51eb59b8 in commit_one_phase_2 /data/src/10.6/sql/handler.cc:2041
    #12 0x55ae51eb7fe9 in ha_commit_trans(THD*, bool) /data/src/10.6/sql/handler.cc:1788
    #13 0x55ae51b29150 in trans_commit_stmt(THD*) /data/src/10.6/sql/transaction.cc:472
    #14 0x55ae51739d91 in mysql_execute_command(THD*, bool) /data/src/10.6/sql/sql_parse.cc:6053
    #15 0x55ae51701e93 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/src/10.6/sql/sql_parse.cc:8026
    #16 0x55ae5172e1b7 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/src/10.6/sql/sql_parse.cc:1896
    #17 0x55ae51734786 in do_command(THD*, bool) /data/src/10.6/sql/sql_parse.cc:1404
    #18 0x55ae51af3a3b in do_handle_one_connection(CONNECT*, bool) /data/src/10.6/sql/sql_connect.cc:1410
    #19 0x55ae51af4294 in handle_one_connection /data/src/10.6/sql/sql_connect.cc:1312
    #20 0x55ae5271c39f in pfs_spawn_thread /data/src/10.6/storage/perfschema/pfs.cc:2201
    #21 0x7f1d1d28d608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
    #22 0x7f1d1ce60292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

Reproducible on 10.6 debug and non-debug as described above.
Not reproducible on 10.5.



 Comments   
Comment by Elena Stepanova [ 2021-07-22 ]

A slightly different test case which produces similar failures as above on a non-ASAN build, but a different one on ASAN build:

--source include/have_innodb.inc
 
CREATE TABLE t1 (pk INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;
 
--connect (con1,localhost,root,,test)
ALTER TABLE t2 ADD CONSTRAINT FOREIGN KEY(pk) REFERENCES t1(pk);
 
SET innodb_lock_wait_timeout= 0;
--send
  ALTER TABLE t1 FORCE;
 
--connection default
 
START TRANSACTION;
--error 0,ER_LOCK_WAIT_TIMEOUT,ER_NO_REFERENCED_ROW_2
INSERT INTO t2 VALUES (1),(2),(3),(4),(5),(6);
 
--connection con1
--error 0,ER_LOCK_WAIT_TIMEOUT
--reap
 
# Cleanup
--disconnect con1
--connection default
DROP TABLE t2, t1;

10.6 f43370ba non-debug ASAN

==34056==ERROR: AddressSanitizer: heap-use-after-free on address 0x61700005bd7c at pc 0x55ac857c0b6c bp 0x7f4565706550 sp 0x7f4565706540
WRITE of size 4 at 0x61700005bd7c thread T12
    #0 0x55ac857c0b6b in std::__atomic_base<unsigned int>::compare_exchange_strong(unsigned int&, unsigned int, std::memory_order, std::memory_order) /usr/include/c++/9/bits/atomic_base.h:502
    #1 0x55ac857c0b6b in srw_mutex::wr_lock_try() /data/src/10.6/storage/innobase/include/srw_lock.h:65
    #2 0x55ac857c0b6b in dict_table_t::lock_mutex_trylock() /data/src/10.6/storage/innobase/include/dict0mem.h:1991
    #3 0x55ac857c0b6b in lock_release_try /data/src/10.6/storage/innobase/lock/lock0lock.cc:3766
    #4 0x55ac857c0b6b in lock_release(trx_t*) /data/src/10.6/storage/innobase/lock/lock0lock.cc:3806
    #5 0x55ac85a41b1f in trx_t::release_locks() /data/src/10.6/storage/innobase/trx/trx0trx.cc:490
    #6 0x55ac85a41b1f in trx_t::commit_in_memory(mtr_t const*) /data/src/10.6/storage/innobase/trx/trx0trx.cc:1297
    #7 0x55ac85a40240 in trx_t::commit_persist() /data/src/10.6/storage/innobase/trx/trx0trx.cc:1467
    #8 0x55ac85a40c40 in trx_t::commit() /data/src/10.6/storage/innobase/trx/trx0trx.cc:1473
    #9 0x55ac85a40c40 in trx_commit_for_mysql(trx_t*) /data/src/10.6/storage/innobase/trx/trx0trx.cc:1589
    #10 0x55ac8569aa95 in innobase_commit_low(trx_t*) /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4258
    #11 0x55ac8569ad1a in innobase_commit_ordered_2 /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4364
    #12 0x55ac8569b131 in innobase_commit /data/src/10.6/storage/innobase/handler/ha_innodb.cc:4468
    #13 0x55ac84c4a9b8 in commit_one_phase_2 /data/src/10.6/sql/handler.cc:2041
    #14 0x55ac84c4cfe9 in ha_commit_trans(THD*, bool) /data/src/10.6/sql/handler.cc:1788
    #15 0x55ac848bdc34 in trans_commit_implicit(THD*) /data/src/10.6/sql/transaction.cc:329
    #16 0x55ac844ce397 in mysql_execute_command(THD*, bool) /data/src/10.6/sql/sql_parse.cc:3750
    #17 0x55ac84496e93 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/src/10.6/sql/sql_parse.cc:8026
    #18 0x55ac844c31b7 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/src/10.6/sql/sql_parse.cc:1896
    #19 0x55ac844c9786 in do_command(THD*, bool) /data/src/10.6/sql/sql_parse.cc:1404
    #20 0x55ac84888a3b in do_handle_one_connection(CONNECT*, bool) /data/src/10.6/sql/sql_connect.cc:1410
    #21 0x55ac84889294 in handle_one_connection /data/src/10.6/sql/sql_connect.cc:1312
    #22 0x55ac854b139f in pfs_spawn_thread /data/src/10.6/storage/perfschema/pfs.cc:2201
    #23 0x7f4574a91608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
    #24 0x7f4574664292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

It is also non-deterministic, and fails more reluctantly than the one in description, usually in 10-20 attempts.

Comment by Marko Mäkelä [ 2022-04-20 ]

The second test case did not fail for me in several thousand attempts, neither with or without ASAN, possibly thanks to MDEV-26554. The first one does fail, both with and without ASAN:

10.6 d1edb011eead8401bf14c8d1e90a6fb3f2a4d6e9

mysqltest: At line 15: query 'INSERT INTO t2 VALUES (1)' failed with wrong errno <Unknown> (2013): 'Lost connection to server during query', instead of  (0)...
mariadbd: /mariadb/10.6/storage/innobase/dict/drop.cc:159: dberr_t trx_t::drop_table(const dict_table_t&): Assertion `lock->trx == this' failed.
#19 0x0000564872f0a049 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f75e0000d48, packet=packet@entry=0x7f75e000ace9 "CREATE OR REPLACE TABLE t1 (b INT) ENGINE=InnoDB", packet_length=packet_length@entry=48, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1912

Comment by Marko Mäkelä [ 2022-04-20 ]

I was not able to reproduce the failure with rr record. From a core dump, I could tell that lock->trx belongs to another thread:

10.6 d1edb011eead8401bf14c8d1e90a6fb3f2a4d6e9

[Current thread is 1 (Thread 0x7f8564452640 (LWP 1163113))]
(gdb) frame 8
#8  0x000055b7c60ddef0 in trx_t::drop_table (this=this@entry=0x7f856c639c80, 
(gdb) p lock.trx.id
$1 = 30
(gdb) p lock.trx.state
$2 = {m = {_M_i = TRX_STATE_ACTIVE}}
(gdb) p/x lock.trx.mysql_thd.real_id
$3 = 0x7f856c063640
(gdb) thread find 0x7f856c063640
Thread 19 has target id 'Thread 0x7f856c063640 (LWP 1163050)'
(gdb) thread apply 19 backtrace
 
Thread 19 (Thread 0x7f856c063640 (LWP 1163050)):
#4  0x000055b7c61a1280 in safe_cond_timedwait (cond=0x7f856c637c10, mp=0x55b7c73977c0 <lock_sys+192>, abstime=0x7f856c061110, file=0x55b7c644e3e0 "/mariadb/10.6/storage/innobase/lock/lock0lock.cc", line=1823) at /mariadb/10.6/mysys/thr_mutex.c:546
#5  0x000055b7c5f36849 in lock_wait (thr=thr@entry=0x7f851000c890) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:1823
#6  0x000055b7c5fb7c92 in row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x7f85100c0340, table=table@entry=0x7f851004b370, entry=entry@entry=0x7f8510036e10, thr=thr@entry=0x7f851000c890) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1824
#7  0x000055b7c5fb7e74 in row_ins_check_foreign_constraints (table=0x7f851004b370, index=index@entry=0x7f851004cab0, pk=pk@entry=true, entry=entry@entry=0x7f8510036e10, thr=thr@entry=0x7f851000c890) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1951
#8  0x000055b7c5fb9473 in row_ins_clust_index_entry (index=index@entry=0x7f851004cab0, entry=entry@entry=0x7f8510036e10, thr=thr@entry=0x7f851000c890, n_ext=n_ext@entry=0) at /mariadb/10.6/storage/innobase/row/row0ins.cc:3184
#20 0x000055b7c5a59049 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f8510000d48, packet=packet@entry=0x7f8510087069 "INSERT INTO t2 VALUES (1)", packet_length=packet_length@entry=25, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1912

The INSERT INTO t2 is waiting for a IS lock on the table test.t1 that is being dropped. MDEV-26554 worked around insufficient metadata locking in exactly this type of scenarios, but only in ha_innobase::truncate() and ha_innobase::rename_table(), not ha_innobase::delete_table().

After I added exclusive child table locking to ha_innobase::delete_table(), the test would occasionally trigger an assertion failure table->n_rec_locks == 0 in dict_sys_t::remove(), with the count actually being 1. The interesting part of the culprit’s stack trace (INSERT INTO t2) is as follows:

#5  0x000055bba7673ac8 in lock_sys_t::wr_lock (this=this@entry=0x55bba8b32700 <lock_sys>) at /mariadb/10.6/storage/innobase/include/lock0lock.h:753
#6  0x000055bba76ca84e in lock_rec_queue_validate (locked_lock_trx_sys=locked_lock_trx_sys@entry=false, id=id@entry={m_id = 21474836483}, rec=rec@entry=0x7f1eaac7407d "\200", index=index@entry=0x7f1e5418bec0, offsets=offsets@entry=0x7f1eaa4261f0) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:4567
#7  0x000055bba76cd380 in lock_clust_rec_read_check_and_lock (flags=flags@entry=0, block=block@entry=0x7f1eaa7b1560, rec=rec@entry=0x7f1eaac7407d "\200", index=index@entry=0x7f1e5418bec0, offsets=offsets@entry=0x7f1eaa4261f0, mode=mode@entry=LOCK_S, gap_mode=1024, thr=0x7f1e5404fc70) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:5513
#8  0x000055bba774ce6a in row_ins_set_shared_rec_lock (type=type@entry=1024, block=block@entry=0x7f1eaa7b1560, rec=rec@entry=0x7f1eaac7407d "\200", index=index@entry=0x7f1e5418bec0, offsets=offsets@entry=0x7f1eaa4261f0, thr=thr@entry=0x7f1e5404fc70) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1398
#9  0x000055bba7752b0b in row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x7f1e540c0340, table=table@entry=0x7f1e5404b370, entry=entry@entry=0x7f1e5404c750, thr=thr@entry=0x7f1e5404fc70) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1706
#10 0x000055bba7752f30 in row_ins_check_foreign_constraints (table=0x7f1e5404b370, index=index@entry=0x7f1e5404cab0, pk=pk@entry=true, entry=entry@entry=0x7f1e5404c750, thr=thr@entry=0x7f1e5404fc70) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1951

Comment by Marko Mäkelä [ 2022-04-25 ]

mleich provided an rr replay trace. It turns out that the DDL operation is releasing the exclusive table lock on the referencing table t2:

10.6 fae0ccad6ec391bf85c39070f784589d48e11951 with a patch

#2  0x0000558285f80823 in lock_table_remove_low (lock=0x7fdba8d01e98) at /data/Server/10.6A/storage/innobase/lock/lock0lock.cc:3345
#3  0x0000558285f81d91 in lock_table_dequeue (in_lock=0x7fdba8d01e98, owns_wait_mutex=false) at /data/Server/10.6A/storage/innobase/lock/lock0lock.cc:3579
#4  0x0000558285f84222 in lock_release_try (trx=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/lock/lock0lock.cc:3846
#5  0x0000558285f846bb in lock_release (trx=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/lock/lock0lock.cc:3879
#6  0x00005582862d7325 in trx_t::release_locks (this=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:489
#7  0x00005582862cfbb1 in trx_t::commit_in_memory (mtr=0x7fdba0bb4c10, this=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1292
#8  trx_t::commit_low (this=0x7fdba8d01940, mtr=0x7fdba0bb4c10) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1448
#9  0x00005582862d083f in trx_t::commit_persist (this=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1462
#10 0x00005582864b9927 in trx_t::commit (this=0x7fdba8d01940, deleted=std::vector of length 0, capacity 0) at /data/Server/10.6A/storage/innobase/dict/drop.cc:233
#11 0x0000558285e1bc58 in ha_innobase::delete_table (this=0x62b0000e7f98, name=0x7fdba0bb7260 "./test/t1") at /data/Server/10.6A/storage/innobase/handler/ha_innodb.cc:13746
#20 0x0000558284d42880 in mysql_parse (thd=0x62b0000e0218, rawbuf=0x62b0000e7238 "CREATE OR REPLACE TABLE t1 (b INT) ENGINE=InnoDB", length=48, parser_state=0x7fdba0bb9b20)

At this point, the execution of INSERT INTO t2 VALUES(1) was blocked in lock_wait(), waiting for a lock on the table t2, which is referencing the table t1 that is being dropped and created. While the ha_innobase::delete_table() is waiting for the redo log to be persisted, the INSERT will succeed in acquiring the table lock and a record lock, which will later on cause the assertion failure during ha_innobase::delete_table():

10.6 fae0ccad6ec391bf85c39070f784589d48e11951 with a patch

#15 0x0000558285fb25d1 in log_write_up_to (lsn=125203, flush_to_disk=true, rotate_key=false, callback=0x7fdba0bb4800) at /data/Server/10.6A/storage/innobase/log/log0log.cc:852
#16 0x00005582862cd797 in trx_flush_log_if_needed_low (lsn=125203, trx=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1152
#17 0x00005582862cd84f in trx_flush_log_if_needed (lsn=125203, trx=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1170
#18 0x00005582862d02c6 in trx_t::commit_in_memory (mtr=0x7fdba0bb4c10, this=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1354
#19 trx_t::commit_low (this=0x7fdba8d01940, mtr=0x7fdba0bb4c10) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1448
#20 0x00005582862d083f in trx_t::commit_persist (this=0x7fdba8d01940) at /data/Server/10.6A/storage/innobase/trx/trx0trx.cc:1462
#21 0x00005582864b9927 in trx_t::commit (this=0x7fdba8d01940, deleted=std::vector of length 0, capacity 0) at /data/Server/10.6A/storage/innobase/dict/drop.cc:233
(rr) c
Continuing.
 
Thread 2 received signal SIGABRT, Aborted.
[Switching to Thread 3066115.3071712]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fdbacaac859 in __GI_abort () at abort.c:79
#2  0x00005582862f1663 in ut_dbg_assertion_failed (expr=0x55828756f7e0 "table->n_rec_locks == 0", file=0x55828756d8a0 "/data/Server/10.6A/storage/innobase/dict/dict0dict.cc", line=1853)
    at /data/Server/10.6A/storage/innobase/ut/ut0dbg.cc:60
#3  0x000055828643b51e in dict_sys_t::remove (this=0x55828846c680 <dict_sys>, table=0x618000003d08, lru=false, keep=false) at /data/Server/10.6A/storage/innobase/dict/dict0dict.cc:1853
#4  0x00005582864b9d0f in trx_t::commit (this=0x7fdba8d01940, deleted=std::vector of length 0, capacity 0) at /data/Server/10.6A/storage/innobase/dict/drop.cc:249
#5  0x0000558285e1bc58 in ha_innobase::delete_table (this=0x62b0000e7f98, name=0x7fdba0bb7260 "./test/t1") at /data/Server/10.6A/storage/innobase/handler/ha_innodb.cc:13746

The fix should be to release any locks atomically with invoking dict_sys_t::remove() to remove the table definition.

Comment by Marko Mäkelä [ 2022-04-25 ]

With a more complex fix that would hold the exclusive table locks until the point of eviction, AddressSanitizer would report heap-use-after-free or I would get other trouble, because at the time we are dropping the table, the FOREIGN KEY check is attempting to lock the table without holding a reference to it:

10.6 fae0ccad6ec391bf85c39070f784589d48e11951 with some changes

#13 0x000055f4caf57303 in ssux_lock_impl<true>::rd_wait (this=this@entry=0x55f4cd1000c0 <lock_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:392
#14 0x000055f4cacf5945 in ssux_lock_impl<true>::rd_lock (this=this@entry=0x55f4cd1000c0 <lock_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:247
#15 0x000055f4cad25a18 in lock_sys_t::rd_lock (this=this@entry=0x55f4cd100080 <lock_sys>) at /mariadb/10.6/storage/innobase/include/lock0lock.h:774
#16 0x000055f4cad19871 in lock_table (table=table@entry=0x618000045108, mode=mode@entry=LOCK_IS, thr=thr@entry=0x620000034868) at /mariadb/10.6/storage/innobase/lock/lock0lock.cc:3509
#17 0x000055f4cae6a529 in row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x616000988f08, table=table@entry=0x618000046508, entry=entry@entry=0x6160009c5b08, thr=thr@entry=0x620000034868) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1608

The table pointer was simply obtained from a cached object:

		check_table = foreign->referenced_table;

That pointer would be reset by dict_sys_t::remove(), which in my patch is protected by exclusive lock_sys.latch. As far as I can tell, the problem occurs at this step in lock_table():

	lock_sys.rd_lock(SRW_LOCK_CALL);
	table->lock_mutex_lock();

After acquiring the lock_sys.latch, lock_table() should validate the table pointer without dereferencing it.

Comment by Marko Mäkelä [ 2022-04-25 ]

I added a parameter to lock_table(). After lock_sys.latch has been acquired, and if that parameter is not NULL and not pointing to table, DB_DEADLOCK will be returned. The parameter table must not be dereferenced before that check. I got over 50×1000 runs of the test passing with that final fix. I had to adjust the test in the Description, to account for the added error code:

--error 0,ER_NO_REFERENCED_ROW_2,ER_LOCK_DEADLOCK
INSERT INTO t2 VALUES (1);

Generated at Thu Feb 08 09:43:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.