[MDEV-26554] Table-rebuilding DDL on parent table causes crash for INSERT into child table Created: 2021-09-07  Updated: 2024-02-08  Resolved: 2021-10-18

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - InnoDB
Affects Version/s: 10.6.2, 10.6.3, 10.6.4
Fix Version/s: 10.6.5, 10.7.1

Type: Bug Priority: Blocker
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: ASAN, affects-tests, corruption, crash, race, regression-10.6, rr-profile-analyzed

Attachments: File MDEV-26554.sh     File MDEV-26554.test     File MDEV-26554trunc.test     File TBR-1186.yy    
Issue Links:
Problem/Incident
causes MDEV-26931 Deadlock between ALTER on parent and ... Open
causes MDEV-32899 InnoDB is holding shared dict_sys.lat... Closed
Relates
relates to MDEV-17595 Server crashes in copy_data_between_t... Closed
relates to MDEV-22788 SUMMARY: AddressSanitizer: heap-use-a... Confirmed
relates to MDEV-26824 Can't add foreign key with empty refe... Closed
relates to MDEV-33104 Assertion `table.get_ref_count() <= 1... Closed
relates to MDEV-21175 Remove dict_table_t::n_foreign_key_ch... Closed
relates to MDEV-22361 Cross-engine foreign keys support Open
relates to MDEV-29504 AddressSanitizer: heap-use-after-free... Closed

 Description   

origin/10.6 ee39757f3c91e04a0ccbb5424fba7dd56167ad93 2021-09-01T07:44:11+03:00
# 2021-09-07T01:48:07 [2499567] | [rr 2506142 89968]210907  1:46:10 [rr 2506142 89971][ERROR] mysqld got signal 11 ;
# 2021-09-07T01:48:07 [2499567] | Query (0x62b0000b6238): INSERT INTO `BB` SELECT * FROM `view_E`
# 2021-09-07T01:48:07 [2499567] | Connection ID (thread ID): 14
# 2021-09-07T01:48:07 [2499567] | [rr 2506142 105389]Status: KILL_TIMEOUT
 
 
sdp:/data/Results/1631001251/TBR-1186/dev/shm/vardir/1631001251/200/1/rr(rr) bt
#0  0x00005623dbab3d0d in id_name_t::operator() (this=0x10) at /data/Server/10.6I/storage/innobase/include/dict0mem.h:513
#1  0x00005623dbd99673 in row_ins_foreign_report_add_err (trx=0x20d977a41b48, foreign=0x61600406b708, rec=0x51282bee4070 "supremum", entry=0x61a000315ee0) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:834
#2  0x00005623dbd9e16b in row_ins_check_foreign_constraint (check_ref=1, foreign=0x61600406b708, table=0x6180000afd08, entry=0x61a000315ee0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:1802
#3  0x00005623dbd9ef09 in row_ins_check_foreign_constraints (table=0x6180000afd08, index=0x616004069008, pk=false, entry=0x61a000315ee0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:1950
#4  0x00005623dbda60c4 in row_ins_sec_index_entry (index=0x616004069008, entry=0x61a000315ee0, thr=0x621000311e38, check_foreign=true) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3274
#5  0x00005623dbda66e1 in row_ins_index_entry (index=0x616004069008, entry=0x61a000315ee0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3351
#6  0x00005623dbda7736 in row_ins_index_entry_step (node=0x621000311af0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3517
#7  0x00005623dbda80f5 in row_ins (node=0x621000311af0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3663
#8  0x00005623dbda921d in row_ins_step (thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3809
#9  0x00005623dbde8ed2 in row_insert_for_mysql (mysql_rec=0x6190016c88c8 "", prebuilt=0x621000311188, ins_mode=ROW_INS_NORMAL) at /data/Server/10.6I/storage/innobase/row/row0mysql.cc:1318
#10 0x00005623dba6c4b1 in ha_innobase::write_row (this=0x61d0008d36b8, record=0x6190016c88c8 "") at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:7820
#11 0x00005623db199c7a in handler::ha_write_row (this=0x61d0008d36b8, buf=0x6190016c88c8 "") at /data/Server/10.6I/sql/handler.cc:7514
#12 0x00005623da8f60d3 in write_record (thd=0x62b0000af218, table=0x6190016c8398, info=0x629000ddbc08, sink=0x0) at /data/Server/10.6I/sql/sql_insert.cc:2135
#13 0x00005623da9051bf in select_insert::send_data (this=0x629000ddbbb8, values=...) at /data/Server/10.6I/sql/sql_insert.cc:4091
#14 0x00005623dab2154b in select_result_sink::send_data_with_check (this=0x629000ddbbb8, items=..., u=0x62b0000b3380, sent=0) at /data/Server/10.6I/sql/sql_class.h:5631
#15 0x00005623daadfcdd in end_send (join=0x629000ddbc80, join_tab=0x629000ddf730, end_of_records=false) at /data/Server/10.6I/sql/sql_select.cc:22287
#16 0x00005623daad88d4 in evaluate_join_record (join=0x629000ddbc80, join_tab=0x629000ddf380, error=0) at /data/Server/10.6I/sql/sql_select.cc:21280
#17 0x00005623daad77fd in sub_select (join=0x629000ddbc80, join_tab=0x629000ddf380, end_of_records=false) at /data/Server/10.6I/sql/sql_select.cc:21057
#18 0x00005623daad62a9 in do_select (join=0x629000ddbc80, procedure=0x0) at /data/Server/10.6I/sql/sql_select.cc:20604
#19 0x00005623daa67778 in JOIN::exec_inner (this=0x629000ddbc80) at /data/Server/10.6I/sql/sql_select.cc:4737
#20 0x00005623daa64fa2 in JOIN::exec (this=0x629000ddbc80) at /data/Server/10.6I/sql/sql_select.cc:4515
#21 0x00005623daa68b90 in mysql_select (thd=0x62b0000af218, tables=0x62b0000b70e0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2202244745984, result=0x629000ddbbb8, unit=0x62b0000b3380, 
    select_lex=0x62b0000b6ab0) at /data/Server/10.6I/sql/sql_select.cc:4993
#22 0x00005623daa3dc01 in handle_select (thd=0x62b0000af218, lex=0x62b0000b32b8, result=0x629000ddbbb8, setup_tables_done_option=1073741824) at /data/Server/10.6I/sql/sql_select.cc:545
#23 0x00005623da9ace84 in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:4711
#24 0x00005623da9c2b5c in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "INSERT INTO `BB` SELECT * FROM `view_E` /* E_R Thread1 QNO 23 CON_ID 14 */", length=74, parser_state=0x289908ce6b20) at /data/Server/10.6I/sql/sql_parse.cc:8030
#25 0x00005623da99adc7 in dispatch_command (command=COM_QUERY, thd=0x62b0000af218, packet=0x629000cb2219 "INSERT INTO `BB` SELECT * FROM `view_E` /* E_R Thread1 QNO 23 CON_ID 14 */ ", packet_length=75, blocking=true)
    at /data/Server/10.6I/sql/sql_parse.cc:1896
#26 0x00005623da99819f in do_command (thd=0x62b0000af218, blocking=true) at /data/Server/10.6I/sql/sql_parse.cc:1404
#27 0x00005623dad9755d in do_handle_one_connection (connect=0x608000003138, put_in_cache=true) at /data/Server/10.6I/sql/sql_connect.cc:1418
#28 0x00005623dad96dee in handle_one_connection (arg=0x608000002c38) at /data/Server/10.6I/sql/sql_connect.cc:1312
#29 0x00004d53061b9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#30 0x00006d2c2bfb3293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) 
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--duration=300 \
--queries=10000000 \
--no_mask \
--seed=random \
--engine=InnoDB \
--rpl_mode=none \
--views \
--mysqld=--innodb-buffer-pool-size=8M \
--mysqld=--interactive_timeout=28800 \
--mysqld=--innodb_stats_persistent=on \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose_innodb_use_native_aio=0 \
--mysqld=--log-bin \
--mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
--mysqld=--innodb-lock-wait-timeout=50 \
--mysqld=--log-output=none \
--mysqld=--loose-max-statement-time=30 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--innodb_adaptive_hash_index=off \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--wait_timeout=28800 \
--mysqld=--innodb_page_size=8K \
--mysqld=--net_read_timeout=30 \
--mysqld=--loose-innodb-sync-debug \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--sync-binlog=1 \
--reporters=Backtrace,Deadlock1,ErrorLog \
--validators=None \
--threads=2 \
--grammar=TBR-1186.yy \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_



 Comments   
Comment by Matthias Leich [ 2021-09-07 ]

218 RQG runs of 2840 in total with the simplified grammar TBR-1186.yy showed the error described above. Only around 20 runs harvested a different error. But these were problems of "rr" only.
Hence the problem should be quite good reproducible.
 
At the end of the test simplification process(It does not invoke rr) the grammar TBR-1186.yy revealed TBR-1184 from below.
 
When trying to simplify the test I hit errors with the following search patterns (to be used via Perl)
#  [ 'TBR-1186', '#0 .{1,30} in id_name_t::operator\(\).+#1 .{1,30} in row_ins_foreign_report_add_err.+#2 .{1,30} in row_ins_check_foreign_constraint.+#3 .{1,30} in row_ins_check_foreign_constraints.+#4 .{1,30} in row_ins_sec_index_entry.+#5 .{1,30} in row_ins_index_entry.+#6 .{1,30} in row_ins_index_entry_step.+#7 .{1,30} in row_ins.+#8 .{1,30} in row_ins_step.+#9 .{1,30} in row_insert_for_mysql.+#10 .{1,30} ha_innobase::write_row.+#11 .{1,30} in handler::ha_write_row.+#12 .{1,30} in write_record.+#13 .{1,30} in select_insert::send_data.+#14 .{1,30} in select_result_sink::send_data_with_check.+#15 .{1,30} in end_send.+#16 .{1,30} in evaluate_join_record.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1173', 'mysqld: .{1,300}row0mysql.cc:.{1,10}: void init_fts_doc_id_for_ref.{1,100}: Assertion \`foreign->foreign_table\' failed..+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1184', '#3 .{1,30}<signal handler called>.+#4 .{1,30} in id_name_t::operator\(\).+#5 .{1,30} in row_ins_foreign_report_add_err.+#6 .{1,30} in row_ins_check_foreign_constraint.+#7 .{1,30} in row_ins_check_foreign_constraints.+#8 .{1,30} in row_ins_sec_index_entry.+#9 .{1,30} in row_ins_index_entry.+#10 .{1,30} in row_ins_index_entry_step.+#11 .{1,30} in row_ins.+#12 .{1,30} in row_ins_step.+#13 .{1,30} in row_insert_for_mysql.+#14 .{1,30} in ha_innobase::write_row.+#15 .{1,30} in handler::ha_write_row.+#16 .{1,30} in write_record.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1183', '#3 .{1,30}<signal handler called>.+#4 .{1,30} in id_name_t::operator\(\).+#5 .{1,30} in row_ins_foreign_report_add_err.+#6 .{1,30} in row_ins_check_foreign_constraint.+#7 .{1,30} in row_ins_check_foreign_constraints.+#8 .{1,30} in row_ins_sec_index_entry.+#9 .{1,30} in row_upd_sec_index_entry.+#10 .{1,30} in row_upd_sec_step.+#11 .{1,30} in row_upd.+#12 .{1,30} in row_upd_step.+#13 .{1,30} in row_update_for_mysql.+#14 .{1,30} in ha_innobase::update_row.+#15 .{1,30} in handler::ha_update_row.+#16 .{1,30} in write_record.+#17 .{1,30} in read_sep_field.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1182', '#3 .{1,30}<signal handler called>.+#4 .{1,30} in id_name_t::operator\(\).+#5 .{1,30} in row_ins_foreign_report_add_err.+#6 .{1,30} in row_ins_check_foreign_constraint.+#7 .{1,30} in row_ins_check_foreign_constraints.+#8 .{1,30} in row_ins_clust_index_entry.+#9 .{1,30} in row_ins_index_entry.+#10 .{1,30} in row_ins_index_entry_step.+#11 .{1,30} in row_ins.+#12 .{1,30} in row_ins_step.+#13 .{1,30} in row_insert_for_mysql.+#14 .{1,30} in ha_innobase::write_row.+#15 .{1,30} in handler::ha_write_row.+#16 .{1,30} in write_record.+#17 .{1,30} in read_sep_field.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1181', 'mysqld: .{1,250}drop.cc:.{1,10}: dberr_t trx_t::drop_table\(const dict_table_t\&\): Assertion \`lock->trx == this\' failed..+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1180', 'SUMMARY: AddressSanitizer: heap-use-after-free .{1,200}row0ins.cc:.{1,10} in row_ins_set_detailed.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1179', 'SUMMARY: AddressSanitizer: heap-use-after-free .{1,200}crc32c_amd64.cc:.{1,10} in align_to_8.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1178', 'ERROR: AddressSanitizer: heap-use-after-free .{1,30000}READ of size .+#1 .{1,30} in Field_blob::pack.+#2 .{1,30} in pack_row.+#3 .{1,30} in THD::binlog_update_row.+#4 .{1,30} in Update_rows_log_event::binlog_row_logging_function.+#5 .{1,30} in handler::binlog_log_row.+#6 .{1,30} in handler::ha_update_row.+#7 .{1,30} in write_record.+#8 .{1,30} in read_sep_field.+#9 .{1,30} in mysql_load.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1177', 'SUMMARY: AddressSanitizer: heap-use-after-free .{1,200}dict0mem.h:.{1,10} in table_name_t::is_temporary\(\) const.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1176', 'SUMMARY: AddressSanitizer: heap-use-after-free .{1,200}row0mysql.cc:.{1,10} in init_fts_doc_id_for_ref.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ],
#   [ 'TBR-1175', 'SUMMARY: AddressSanitizer: heap-use-after-free .{1,200}row0ins.cc:.{1,10} in row_ins_foreign_report_err.+RESULT: The RQG run ended with status STATUS_SERVER_CRASHED' ], 

Comment by Marko Mäkelä [ 2021-09-08 ]

ssh sdp
rr replay /data/Results/1631001251/TBR-1186/dev/shm/vardir/1631001251/200/1/rr/latest-trace

The problem in the trace is that foreign->referenced_table and foreign->referenced_index are NULL. The foreign->referenced_table_name is test.A. It was added in

ALTER TABLE `BB` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `A` (col_int_key) ON UPDATE NO ACTION /* E_R Thread1 QNO 10 CON_ID 14 */;

At that point, foreign key checks were enabled and the table test.A existed. That table is being renamed in Thread 15 later:

10.6 ee39757f3c91e04a0ccbb5424fba7dd56167ad93

(rr) backtrace
#4  0x00005623db189aed in handler::ha_rename_table (this=0x62b000146510, 
    from=0x586b3f42ac80 "./test/A", 
    to=0x586b3f42aec0 "./test/#sql-backup-263d9e-f")
    at /data/Server/10.6I/sql/handler.cc:5298
#5  0x00005623dac00fc7 in mysql_rename_table (base=0x615000001498, 
    old_db=0x586b3f42d130, old_name=0x586b3f42d140, new_db=0x586b3f42d130, 
    new_name=0x586b3f42b810, id=0x586b3f42d1d0, flags=2)
    at /data/Server/10.6I/sql/sql_table.cc:4970
#6  0x00005623dac26239 in mysql_alter_table (thd=0x62b00013b218, 
    new_db=0x62b00013fc18, new_name=0x62b000140030, 
    create_info=0x586b3f42e5d0, table_list=0x62b000142418, 
    alter_info=0x586b3f42e4a0, order_num=0, order=0x0, ignore=false, 
    if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10529

At the time of the SIGSEGV in Thread 2, a related rename in Thread 15 is still in progress.

(rr) thread apply 15 backtrace
#8  0x00006d2c2bfa93f1 in fdatasync (fd=7) at ../sysdeps/unix/sysv/linux/fdatasync.c:28
#9  0x00005623dbca0784 in os_file_sync_posix (file=7) at /data/Server/10.6I/storage/innobase/os/os0file.cc:896
#10 0x00005623dbca1283 in os_file_flush_func (file=7) at /data/Server/10.6I/storage/innobase/os/os0file.cc:982
#11 0x00005623dbc1b419 in file_os_io::flush (this=0x6020000001b0) at /data/Server/10.6I/storage/innobase/log/log0log.cc:326
#12 0x00005623dbc1bf23 in log_file_t::flush (this=0x5623deec7c20 <log_sys+480>) at /data/Server/10.6I/storage/innobase/log/log0log.cc:440
#13 0x00005623dbc1d0a4 in log_t::file::flush (this=0x5623deec7c00 <log_sys+448>) at /data/Server/10.6I/storage/innobase/log/log0log.cc:507
#14 0x00005623dbc1e07b in log_write_flush_to_disk_low (lsn=705990) at /data/Server/10.6I/storage/innobase/log/log0log.cc:629
#15 0x00005623dbc1f272 in log_write_up_to (lsn=705990, flush_to_disk=true, rotate_key=false, callback=0x0) at /data/Server/10.6I/storage/innobase/log/log0log.cc:829
#16 0x00005623dc12c577 in fil_name_write_rename (space_id=25, old_name=0x603000152e90 "./test/#sql-alter-263d9e-f.ibd", new_name=0x602000131a70 "./test/A.ibd") at /data/Server/10.6I/storage/innobase/fil/fil0fil.cc:1534
#17 0x00005623dc12eb69 in fil_space_t::rename (this=0x6120000847c0, path=0x602000131a70 "./test/A.ibd", log=true, replace=false) at /data/Server/10.6I/storage/innobase/fil/fil0fil.cc:1853
#18 0x00005623dc09de88 in dict_table_t::rename_tablespace (this=0x618000106d08, new_name=0x586b3f429cb0 "test/A", replace=false) at /data/Server/10.6I/storage/innobase/dict/dict0dict.cc:1438
#19 0x00005623dc09e82a in dict_table_rename_in_cache (table=0x618000106d08, new_name=0x586b3f429cb0 "test/A", rename_also_foreigns=true, replace_new_file=false) at /data/Server/10.6I/storage/innobase/dict/dict0dict.cc:1498
#20 0x00005623dbdf2a9a in row_rename_table_for_mysql (old_name=0x586b3f429ef0 "test/#sql-alter-263d9e-f", new_name=0x586b3f429cb0 "test/A", trx=0x20d977a43e88, use_fk=true) at /data/Server/10.6I/storage/innobase/row/row0mysql.cc:2915
#21 0x00005623dba8bc65 in innobase_rename_table (trx=0x20d977a43e88, from=0x586b3f42ac80 "./test/#sql-alter-263d9e-f", to=0x586b3f42aec0 "./test/A", use_fk=true) at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:13655
#22 0x00005623dba8e674 in ha_innobase::rename_table (this=0x62b000146cc0, from=0x586b3f42ac80 "./test/#sql-alter-263d9e-f", to=0x586b3f42aec0 "./test/A") at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:13997
#23 0x00005623db189aed in handler::ha_rename_table (this=0x62b000146cc0, from=0x586b3f42ac80 "./test/#sql-alter-263d9e-f", to=0x586b3f42aec0 "./test/A") at /data/Server/10.6I/sql/handler.cc:5298
#24 0x00005623dac00fc7 in mysql_rename_table (base=0x615000001498, old_db=0x586b3f42d170, old_name=0x586b3f42d1a0, new_db=0x586b3f42d170, new_name=0x586b3f42d190, id=0x586b3f42d1c0, flags=1) at /data/Server/10.6I/sql/sql_table.cc:4970
#25 0x00005623dac26389 in mysql_alter_table (thd=0x62b00013b218, new_db=0x62b00013fc18, new_name=0x62b000140030, create_info=0x586b3f42e5d0, table_list=0x62b000142418, alter_info=0x586b3f42e4a0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10561
#9  0x00005623da9c2b5c in mysql_parse (thd=0x62b00013b218, 
    rawbuf=0x62b000142238 "ALTER TABLE `A` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `D` (pk) ON UPDATE NO ACTION /* E_R Thread2 QNO 28 CON_ID 15 */", 
    length=126, parser_state=0x586b3f42fb20)
    at /data/Server/10.6I/sql/sql_parse.cc:8030

This would suggest that there is insufficient MDL being held in either thread. It turns out that the ALTER TABLE A in Thread 15 was being granted MDL_EXCLUSIVE:

(rr) backtrace
#0  MDL_context::acquire_lock (this=0x62b00013b348, 
    mdl_request=0x586b3f42b340, lock_wait_timeout=86400)
    at /data/Server/10.6I/sql/mdl.cc:2272
#1  0x00005623dadbce60 in MDL_context::upgrade_shared_lock (
    this=0x62b00013b348, mdl_ticket=0x60700004e520, new_type=MDL_EXCLUSIVE, 
    lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2566
#2  0x00005623da82bade in wait_while_table_is_used (thd=0x62b00013b218, 
    table=0x61900007e498, function=HA_EXTRA_PREPARE_FOR_RENAME)
    at /data/Server/10.6I/sql/sql_base.cc:1313
#3  0x00005623dac26066 in mysql_alter_table (thd=0x62b00013b218, 
    new_db=0x62b00013fc18, new_name=0x62b000140030, 
    create_info=0x586b3f42e5d0, table_list=0x62b000142418, 
    alter_info=0x586b3f42e4a0, order_num=0, order=0x0, ignore=false, 
    if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10501
#4  0x00005623dadb1081 in Sql_cmd_alter_table::execute (this=0x62b000142df0, 
    thd=0x62b00013b218) at /data/Server/10.6I/sql/sql_alter.cc:550
#5  0x00005623da9b668a in mysql_execute_command (thd=0x62b00013b218, 
    is_called_from_prepared_stmt=false)
    at /data/Server/10.6I/sql/sql_parse.cc:5997
#6  0x00005623da9c2b5c in mysql_parse (thd=0x62b00013b218, 
    rawbuf=0x62b000142238 "ALTER TABLE `A` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `D` (pk) ON UPDATE NO ACTION /* E_R Thread2 QNO 28 CON_ID 15 */", 
    length=126, parser_state=0x586b3f42fb20)
(rr) thread apply 2 backtrace
#18 0x00005623dbc9ebe3 in os_file_read_string (file=0x615000001980, str=0x611000001a80 "", size=256) at /data/Server/10.6I/storage/innobase/os/os0file.cc:412
#19 0x00005623dbf26099 in trx_set_detailed_error_from_file (trx=0x20d977a41b48, file=0x615000001980) at /data/Server/10.6I/storage/innobase/trx/trx0trx.cc:88
#20 0x00005623dbd9856e in row_ins_set_detailed (trx=0x20d977a41b48, foreign=0x61600406b708) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:686
#21 0x00005623dbd99377 in row_ins_foreign_report_add_err (trx=0x20d977a41b48, foreign=0x61600406b708, rec=0x51282bee4070 "supremum", entry=0x61a000315ee0) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:810
#44 0x00005623da9c2b5c in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "INSERT INTO `BB` SELECT * FROM `view_E` /* E_R Thread1 QNO 23 CON_ID 14 */", length=74, parser_state=0x289908ce6b20) at /data/Server/10.6I/sql/sql_parse.cc:8030

In MDEV-21175, I made InnoDB assume that sufficient MDL are being acquired in the SQL layer. If this bug is not repeatable in earlier versions, then we must either improve the MDL logic or work around the missing MDL by making ha_innobase::rename_table() wait for an exclusive InnoDB table lock.

Comment by Marko Mäkelä [ 2021-09-16 ]

serg, I think that this needs your feedback. Ideally, this would be fixed outside the storage engine.

Could the ALTER TABLE acquire some MDL on all tables that are immediately connected via FOREIGN KEY constraints (immediate parents and children)?

Comment by Marko Mäkelä [ 2021-10-13 ]

wlad, can you please look at this? Note that the first rule in TBR-1186.yy produces syntax errors and can probably be safely ignored.

As far as I can tell, we have two ALTER TABLE that are executing ADD FOREIGN KEY, and one INSERT that is accessing a freed table name when reporting a FOREIGN KEY violation.

I tried but failed to create an mtr test case for this. I think that the DEBUG_SYNC point innodb_rename_in_cache could be useful here.

Comment by Marko Mäkelä [ 2021-10-13 ]

I repeated the bug with a grammar that was simplified further from TBR-1186.yy:

work_load:
    INSERT INTO _table SELECT * FROM _table |
    ALTER TABLE _table ADD FOREIGN KEY ({ @columns=(); '' } `col_int_nokey` { push @columns, $last_field; '' }) REFERENCES _table;
query:
    { exit 0 };
thread1:
    work_load ;
thread2:
    work_load ;

With this, I got a different stack trace:

10.6

# 2021-10-13T18:11:22 [223547] #7  0x00007f84b9ccb7f2 in __GI___assert_fail (assertion=0x556908273060 "i == j", file=0x55690825a4a0 "/mariadb/10.6/storage/innobase/handler/ha_innodb.cc", line=12463, function=0x556908272ce0 "dberr_t create_table_info_t::create_foreign_keys()") at assert.c:101
# 2021-10-13T18:11:22 [223547] #8  0x0000556907147791 in create_table_info_t::create_foreign_keys (this=this@entry=0x7f849903fc70) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:12463
# 2021-10-13T18:11:22 [223547] #9  0x00005569071494ed in create_table_info_t::create_table (this=this@entry=0x7f849903fc70, create_fk=create_fk@entry=true) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:12733
# 2021-10-13T18:11:22 [223547] #10 0x0000556907167fe8 in ha_innobase::create (this=<optimized out>, name=<optimized out>, form=<optimized out>, create_info=<optimized out>, file_per_table=<optimized out>, trx=0x7f84a530aee0, trx@entry=0x0) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:13187
# 2021-10-13T18:11:22 [223547] #11 0x000055690714a316 in ha_innobase::create (this=<optimized out>, name=<optimized out>, form=<optimized out>, create_info=<optimized out>) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:13227
# 2021-10-13T18:11:22 [223547] #12 0x00005569065742ca in handler::ha_create (this=0x61d000c166b8, name=<optimized out>, form=form@entry=0x7f84990405d0, info_arg=info_arg@entry=0x7f84990440a0) at /mariadb/10.6/sql/handler.cc:5422
# 2021-10-13T18:11:22 [223547] #13 0x0000556906576734 in ha_create_table (thd=thd@entry=0x62b00012d218, path=path@entry=0x7f8499043bae "./test/#sql-alter-36974-f", db=<optimized out>, table_name=<optimized out>, create_info=create_info@entry=0x7f84990440a0, frm=frm@entry=0x7f8499041210, skip_frm_file=<optimized out>) at /mariadb/10.6/sql/handler.cc:5887
# 2021-10-13T18:11:22 [223547] #14 0x0000556905f6fa3f in mysql_alter_table (thd=thd@entry=0x62b00012d218, new_db=new_db@entry=0x62b000131e18, new_name=new_name@entry=0x62b000132230, create_info=0x7f84990440a0, table_list=table_list@entry=0x62b0001343e8, alter_info=<optimized out>, order_num=<optimized out>, order=<optimized out>, ignore=<optimized out>, if_exists=<optimized out>) at /mariadb/10.6/sql/sql_table.cc:10271
# 2021-10-13T18:11:22 [223547] #15 0x000055690610b933 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x62b00012d218) at /mariadb/10.6/sql/sql_alter.cc:550
# 2021-10-13T18:11:22 [223547] #16 0x0000556905cbd741 in mysql_execute_command (thd=thd@entry=0x62b00012d218, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /mariadb/10.6/sql/sql_parse.cc:5997
# 2021-10-13T18:11:22 [223547] #17 0x0000556905cc0525 in mysql_parse (thd=thd@entry=0x62b00012d218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7f8499045860) at /mariadb/10.6/sql/sql_parse.cc:8030
# 2021-10-13T18:11:22 [223547] #18 0x0000556905cc7751 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b00012d218, packet=packet@entry=0x629001270219 "ALTER TABLE `BB` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `AA` /* E_R Thread1 QNO 16 CON_ID 15 */ ", packet_length=packet_length@entry=104, blocking=blocking@entry=true) at /mariadb/10.6/sql/sql_parse.cc:1896

I think that the fix should be something like this (adapted from DML_prelocking_strategy::handle_table()). However, I do not know what the last parameter should be. We would want tables to be locked both in the ‘child’ and ‘parent’ direction. I think that it should suffice to acquire MDL_SHARED on the immediate parents or children.

diff --git a/sql/sql_base.cc b/sql/sql_base.cc
--- a/sql/sql_base.cc
+++ b/sql/sql_base.cc
@@ -4965,7 +4965,10 @@ bool Alter_table_prelocking_strategy::handle_table(THD *thd,
              Query_tables_list *prelocking_ctx, TABLE_LIST *table_list,
              bool *need_prelocking)
 {
-  return FALSE;
+  TABLE *table= table_list->table;
+  return table->file->referenced_by_foreign_key() &&
+    prepare_fk_prelocking_list(thd, prelocking_ctx, table_list,
+                               need_prelocking, ???);
 }
 
 

Comment by Marko Mäkelä [ 2021-10-13 ]

This attempted fix did not make any difference.

diff --git a/sql/sql_base.cc b/sql/sql_base.cc
index 8ee1d19961f..bd3b1d21b20 100644
--- a/sql/sql_base.cc
+++ b/sql/sql_base.cc
@@ -4965,7 +4965,33 @@ bool Alter_table_prelocking_strategy::handle_table(THD *thd,
              Query_tables_list *prelocking_ctx, TABLE_LIST *table_list,
              bool *need_prelocking)
 {
-  return FALSE;
+  TABLE *table= table_list->table;
+  if (!table->file->referenced_by_foreign_key())
+    return false;
+  table_list->for_insert_data= true;
+  if (prepare_fk_prelocking_list(thd, prelocking_ctx, table_list,
+                                 need_prelocking, 255))
+    return true;
+  Query_arena backup, *arena= thd->activate_stmt_arena_if_needed(&backup);
+
+  List<FOREIGN_KEY_INFO> fk_child_list;
+  table->file->get_foreign_key_list(thd, &fk_child_list);
+  List_iterator<FOREIGN_KEY_INFO> fk_child_it(fk_child_list);
+  while (const FOREIGN_KEY_INFO *fk= fk_child_it++)
+  {
+    if (table_already_fk_prelocked(prelocking_ctx->query_tables,
+                                   fk->foreign_db, fk->foreign_table, TL_READ))
+      continue;
+    TABLE_LIST *tl= (TABLE_LIST *) thd->alloc(sizeof(TABLE_LIST));
+    tl->init_one_table_for_prelocking(fk->foreign_db, fk->foreign_table,
+                                      nullptr, TL_READ, TABLE_LIST::PRELOCK_FK,
+                                      table_list->belong_to_view, 255,
+                                      &prelocking_ctx->query_tables_last, true);
+  }
+
+  if (arena)
+    thd->restore_active_arena(arena, &backup);
+  return false;
 }
 
 

I also tried replacing 255 with 0, TL_READ with TL_FIRST_WRITE, and fk->foreign_ with fk->referenced_, to no avail. serg, can you please show how to make mysql_alter_table() acquire shared metadata locks on all parent and child tables?

Comment by Marko Mäkelä [ 2021-10-13 ]

I get my DDL-time crash with the simpler grammar from my earlier comment, and with the invocation in MDEV-26554.sh, using this version of RQG.

Comment by Marko Mäkelä [ 2021-10-14 ]

I filed my DDL-time assertion failure as MDEV-26824. It does not involve any concurrency.

With the the original grammar TBR-1186.yy I had the problem that it would generate invalid SQL like this:

ALTER TABLE `DD` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `view_CC` ();

Something must be wrong with my invocation of RQG, because it fails in the same way even with the tool revision of the day when this ticket was filed.

I tried again with a simpler grammar, but I could not reproduce any crash:

work_load:
    INSERT INTO _table SELECT * FROM _table |
    ALTER TABLE _table ADD FOREIGN KEY (col_int_nokey) REFERENCES _table (col_int_nokey);
 
query: { exit 0 };
 
thread1: work_load;
thread2: work_load;

I omitted the first work_load grammar rule of TBR-1186.yy, because it would flood the log with syntax errors. I tried replacing it with COMMIT or ROLLBACK, but it did not help.

In addition to the currently latest 10.6 revision b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 I tested ee39757f3c91e04a0ccbb5424fba7dd56167ad93 for which we have the rr record trace, but I failed to reproduce the assertion.

Comment by Marko Mäkelä [ 2021-10-14 ]

Here is some more data from the rr replay that I analyzed a little in my first comment. Remember, A is a parent table of BB. Here, Thread 3 corresponds to Thread 15 of my initial comment. Thread 2 is the one that will crash on INSERT.

10.6 ee39757f3c91e04a0ccbb5424fba7dd56167ad93

Thread 3 hit Breakpoint 1, fk_prepare_copy_alter_table (thd=0x62b00013b218, 
    table=0x61900007e498, alter_info=0x586b3f42e4a0, alter_ctx=0x586b3f42d120)
    at /data/Server/10.6I/sql/sql_table.cc:8915
8915	    List_iterator<Key> fk_list_it(alter_info->key_list);
1: thd->query_string->string->str = 0x62b000142238 "ALTER TABLE `A` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `D` (pk) ON UPDATE NO ACTION /* E_R Thread2 QNO 28 CON_ID 15 */"
(rr) when
Current event: 88002
(rr) thr a 2 bt
 
Thread 2 (Thread 2506142.2506876):
#0  0x00005623df0148b9 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#1  0x00005623dc0b0911 in dict_table_get_index_on_name (table=0x6180000f2108, name=0x61d00092f1e7 "col_datetime_key") at /data/Server/10.6I/storage/innobase/dict/dict0dict.cc:4351
#2  0x00005623dbab5de9 in dict_table_get_index_on_name (table=0x6180000f2108, name=0x61d00092f1e7 "col_datetime_key") at /data/Server/10.6I/storage/innobase/include/dict0dict.h:1312
#3  0x00005623dba5fcd2 in check_index_consistency (table=0x619001758e98, ib_table=0x6180000f2108) at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:5703
#4  0x00005623dba6132e in ha_innobase::open (this=0x61d000940cb8, name=0x61b000078448 "./test/#sql-alter-263d9e-e") at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:5971
#5  0x00005623db17bc43 in handler::ha_open (this=0x61d000940cb8, table_arg=0x619001758e98, name=0x61b000078448 "./test/#sql-alter-263d9e-e", mode=2, test_if_locked=4114, mem_root=0x0, partitions_to_open=0x0) at /data/Server/10.6I/sql/handler.cc:3313
#6  0x00005623dacd01dd in open_table_from_share (thd=0x62b0000af218, share=0x61b000077e98, alias=0x289908ce2520, db_stat=1, prgflag=8, ha_open_flags=4114, outparam=0x619001758e98, is_create_table=false, partitions_to_open=0x0) at /data/Server/10.6I/sql/table.cc:4327
#7  0x00005623dafc7ac5 in THD::open_temporary_table (this=0x62b0000af218, share=0x61b000077e98, alias_arg=0x62b0000b63c8 "AA") at /data/Server/10.6I/sql/temporary_tables.cc:1117
#8  0x00005623dafc316e in THD::create_and_open_tmp_table (this=0x62b0000af218, frm=0x289908ce27f0, path=0x289908ce4b5e "./test/#sql-alter-263d9e-e", db=0x62b0000b6b00 "test", table_name=0x62b0000b63c8 "AA", open_internal_tables=true) at /data/Server/10.6I/sql/temporary_tables.cc:74
#9  0x00005623dac24af2 in mysql_alter_table (thd=0x62b0000af218, new_db=0x62b0000b3c18, new_name=0x62b0000b4030, create_info=0x289908ce55d0, table_list=0x62b0000b6418, alter_info=0x289908ce54a0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10283
#10 0x00005623dadb1081 in Sql_cmd_alter_table::execute (this=0x62b0000b6df0, thd=0x62b0000af218) at /data/Server/10.6I/sql/sql_alter.cc:550
#11 0x00005623da9b668a in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:5997
#12 0x00005623da9c2b5c in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "ALTER TABLE `AA` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `C` (pk) ON UPDATE NO ACTION /* E_R Thread1 QNO 22 CON_ID 14 */", length=127, parser_state=0x289908ce6b20) at /data/Server/10.6I/sql/sql_parse.cc:8030
(rr) c
Continuing.
 
Thread 2 hit Breakpoint 2, ha_innobase::write_row (this=0x62b0000b1be8, record=0x5623da852b8f <fill_record(THD*, TABLE*, Field**, List<Item>&, bool, bool)> "\363\017\036\372UH\211\345AVAUATSH\201\354\020\001")
    at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:7748
7748	{
(rr) when
Current event: 89574
(rr) bt
#14 0x00005623da9c2b5c in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "INSERT INTO `BB` SELECT * FROM `view_E` /* E_R Thread1 QNO 23 CON_ID 14 */", length=74, parser_state=0x289908ce6b20)
    at /data/Server/10.6I/sql/sql_parse.cc:8030
(rr) thr a 3 bt
 
Thread 3 (Thread 2506142.2513854):
#8  0x00006d2c2bfa93f1 in fdatasync (fd=19) at ../sysdeps/unix/sysv/linux/fdatasync.c:28
#9  0x00005623dc3f148c in my_sync (fd=19, my_flags=16) at /data/Server/10.6I/mysys/my_sync.c:92
#10 0x00005623dac6dc9c in inline_mysql_file_sync (fd=19, flags=16) at /data/Server/10.6I/include/mysql/psi/mysql_file.h:1448
#11 0x00005623dac6ded7 in ddl_log_sync_file () at /data/Server/10.6I/sql/ddl_log.cc:193
#12 0x00005623dac6e885 in update_phase (entry_pos=2, phase=6 '\006') at /data/Server/10.6I/sql/ddl_log.cc:340
#13 0x00005623dac7bf21 in ddl_log_update_phase (state=0x586b3f42b970, phase=6 '\006') at /data/Server/10.6I/sql/ddl_log.cc:2935
#14 0x00005623dac26012 in mysql_alter_table (thd=0x62b00013b218, new_db=0x62b00013fc18, new_name=0x62b000140030, create_info=0x586b3f42e5d0, table_list=0x62b000142418, alter_info=0x586b3f42e4a0, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10496
#15 0x00005623dadb1081 in Sql_cmd_alter_table::execute (this=0x62b000142df0, thd=0x62b00013b218) at /data/Server/10.6I/sql/sql_alter.cc:550
#16 0x00005623da9b668a in mysql_execute_command (thd=0x62b00013b218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:5997
#17 0x00005623da9c2b5c in mysql_parse (thd=0x62b00013b218, rawbuf=0x62b000142238 "ALTER TABLE `A` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `D` (pk) ON UPDATE NO ACTION /* E_R Thread2 QNO 28 CON_ID 15 */", length=126, parser_state=0x586b3f42fb20) at /data/Server/10.6I/sql/sql_parse.cc:8030

We can see that at the time the INSERT on the child table BB is inserting the first row, ALTER TABLE A on its parent table is in progress. That ALTER TABLE would corrupt the metadata of the INSERT by rebuilding the dict_table_t object (which corresponds to TABLE_SHARE in the SQL layer):

Thread 3 hit Hardware watchpoint 3: -location foreign->referenced_index
 
Old value = (dict_index_t *) 0x616000708408
New value = (dict_index_t *) 0x0
dict_table_rename_in_cache (table=0x618000043908, new_name=0x586b3f429cb0 "test/#sql-backup-263d9e-f", rename_also_foreigns=false, replace_new_file=false)
    at /data/Server/10.6I/storage/innobase/dict/dict0dict.cc:1557
1557			     ++it) {
(rr) when
Current event: 89695
(rr) bt
#0  dict_table_rename_in_cache (table=0x618000043908, new_name=0x586b3f429cb0 "test/#sql-backup-263d9e-f", rename_also_foreigns=false, replace_new_file=false)
    at /data/Server/10.6I/storage/innobase/dict/dict0dict.cc:1557
#1  0x00005623dbdf2a9a in row_rename_table_for_mysql (old_name=0x586b3f429ef0 "test/A", new_name=0x586b3f429cb0 "test/#sql-backup-263d9e-f", trx=0x20d977a43e88, use_fk=true)
    at /data/Server/10.6I/storage/innobase/row/row0mysql.cc:2915
#2  0x00005623dba8bc65 in innobase_rename_table (trx=0x20d977a43e88, from=0x586b3f42ac80 "./test/A", to=0x586b3f42aec0 "./test/#sql-backup-263d9e-f", use_fk=true)
    at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:13655
#3  0x00005623dba8e674 in ha_innobase::rename_table (this=0x62b000146510, from=0x586b3f42ac80 "./test/A", to=0x586b3f42aec0 "./test/#sql-backup-263d9e-f")
    at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:13997
#4  0x00005623db189aed in handler::ha_rename_table (this=0x62b000146510, from=0x586b3f42ac80 "./test/A", to=0x586b3f42aec0 "./test/#sql-backup-263d9e-f") at /data/Server/10.6I/sql/handler.cc:5298
#5  0x00005623dac00fc7 in mysql_rename_table (base=0x615000001498, old_db=0x586b3f42d130, old_name=0x586b3f42d140, new_db=0x586b3f42d130, new_name=0x586b3f42b810, id=0x586b3f42d1d0, flags=2)
    at /data/Server/10.6I/sql/sql_table.cc:4970
#6  0x00005623dac26239 in mysql_alter_table (thd=0x62b00013b218, new_db=0x62b00013fc18, new_name=0x62b000140030, create_info=0x586b3f42e5d0, table_list=0x62b000142418, alter_info=0x586b3f42e4a0, order_num=0, 
    order=0x0, ignore=false, if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10529
#7  0x00005623dadb1081 in Sql_cmd_alter_table::execute (this=0x62b000142df0, thd=0x62b00013b218) at /data/Server/10.6I/sql/sql_alter.cc:550
#8  0x00005623da9b668a in mysql_execute_command (thd=0x62b00013b218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:5997
#9  0x00005623da9c2b5c in mysql_parse (thd=0x62b00013b218, 
    rawbuf=0x62b000142238 "ALTER TABLE `A` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `D` (pk) ON UPDATE NO ACTION /* E_R Thread2 QNO 28 CON_ID 15 */", length=126, parser_state=0x586b3f42fb20)
    at /data/Server/10.6I/sql/sql_parse.cc:8030
(rr) thr a 2 bt
 
Thread 2 (Thread 2506142.2506876):
#13 0x00006d2c2bf17063 in __GI__IO_fread (buf=0x611000001a80, size=1, count=255, fp=0x615000001980) at iofread.c:38
#14 0x00005623defa9426 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#15 0x00005623dbc9ec06 in os_file_read_string (file=0x615000001980, str=0x611000001a80 "", size=256) at /data/Server/10.6I/storage/innobase/os/os0file.cc:414
#16 0x00005623dbf26099 in trx_set_detailed_error_from_file (trx=0x20d977a41b48, file=0x615000001980) at /data/Server/10.6I/storage/innobase/trx/trx0trx.cc:88
#17 0x00005623dbd9856e in row_ins_set_detailed (trx=0x20d977a41b48, foreign=0x61600406b708) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:686
#18 0x00005623dbd99377 in row_ins_foreign_report_add_err (trx=0x20d977a41b48, foreign=0x61600406b708, rec=0x51282bee4070 "supremum", entry=0x61a000315ee0) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:810
#19 0x00005623dbd9e16b in row_ins_check_foreign_constraint (check_ref=1, foreign=0x61600406b708, table=0x6180000afd08, entry=0x61a000315ee0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:1802
#20 0x00005623dbd9ef09 in row_ins_check_foreign_constraints (table=0x6180000afd08, index=0x616004069008, pk=false, entry=0x61a000315ee0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:1950
#21 0x00005623dbda60c4 in row_ins_sec_index_entry (index=0x616004069008, entry=0x61a000315ee0, thr=0x621000311e38, check_foreign=true) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3274
#22 0x00005623dbda66e1 in row_ins_index_entry (index=0x616004069008, entry=0x61a000315ee0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3351
#23 0x00005623dbda7736 in row_ins_index_entry_step (node=0x621000311af0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3517
#24 0x00005623dbda80f5 in row_ins (node=0x621000311af0, thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3663
#25 0x00005623dbda921d in row_ins_step (thr=0x621000311e38) at /data/Server/10.6I/storage/innobase/row/row0ins.cc:3809
#26 0x00005623dbde8ed2 in row_insert_for_mysql (mysql_rec=0x6190016c88c8 "", prebuilt=0x621000311188, ins_mode=ROW_INS_NORMAL) at /data/Server/10.6I/storage/innobase/row/row0mysql.cc:1318
#27 0x00005623dba6c4b1 in ha_innobase::write_row (this=0x61d0008d36b8, record=0x6190016c88c8 "") at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:7820

At the time the metadata of the parent table A was being corrupted, the INSERT on the child table was reporting a FOREIGN KEY violation and referring to the parent table A while doing that.

Next, let us look at the MDL acquisitions:

Thread 2 hit Breakpoint 5, MDL_context::acquire_lock (this=0x62b0000af348, mdl_request=0x62b0000b6810, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
2272	  DBUG_PRINT("enter", ("lock_type: %s  timeout: %f",
3: *mdl_request->key.m_ptr@mdl_request.key.m_length = "\002test\000BB"
(rr) when
Current event: 89565
(rr) bt
#0  MDL_context::acquire_lock (this=0x62b0000af348, mdl_request=0x62b0000b6810, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
#1  0x00005623da82c3da in open_table_get_mdl_lock (thd=0x62b0000af218, ot_ctx=0x289908ce5dc0, mdl_request=0x62b0000b6810, flags=0, mdl_ticket=0x289908ce5870) at /data/Server/10.6I/sql/sql_base.cc:1548
#2  0x00005623da82d8f1 in open_table (thd=0x62b0000af218, table_list=0x62b0000b63b8, ot_ctx=0x289908ce5dc0) at /data/Server/10.6I/sql/sql_base.cc:1817
#3  0x00005623da83799c in open_and_process_table (thd=0x62b0000af218, tables=0x62b0000b63b8, counter=0x289908ce5ef0, flags=0, prelocking_strategy=0x289908ce5fd0, has_prelocking_list=false, 
    ot_ctx=0x289908ce5dc0) at /data/Server/10.6I/sql/sql_base.cc:3807
#4  0x00005623da839f47 in open_tables (thd=0x62b0000af218, options=..., start=0x289908ce5f00, counter=0x289908ce5ef0, flags=0, prelocking_strategy=0x289908ce5fd0) at /data/Server/10.6I/sql/sql_base.cc:4290
#5  0x00005623da83e592 in open_and_lock_tables (thd=0x62b0000af218, options=..., tables=0x62b0000b63b8, derived=true, flags=0, prelocking_strategy=0x289908ce5fd0) at /data/Server/10.6I/sql/sql_base.cc:5262
#6  0x00005623da7a2a86 in open_and_lock_tables (thd=0x62b0000af218, tables=0x62b0000b63b8, derived=true, flags=0) at /data/Server/10.6I/sql/sql_base.h:509
#7  0x00005623da9ac7cd in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:4649
#8  0x00005623da9c2b5c in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "INSERT INTO `BB` SELECT * FROM `view_E` /* E_R Thread1 QNO 23 CON_ID 14 */", length=74, parser_state=0x289908ce6b20)
    at /data/Server/10.6I/sql/sql_parse.cc:8030
(rr) c
Continuing.
 
Thread 2 hit Breakpoint 5, MDL_context::acquire_lock (this=0x62b0000af348, mdl_request=0x289908ce5910, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
2272	  DBUG_PRINT("enter", ("lock_type: %s  timeout: %f",
3: *mdl_request->key.m_ptr@mdl_request.key.m_length = "\000\000"
(rr) when
Current event: 89565
(rr) bt
#0  MDL_context::acquire_lock (this=0x62b0000af348, mdl_request=0x289908ce5910, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
#1  0x00005623da82f00b in open_table (thd=0x62b0000af218, table_list=0x62b0000b63b8, ot_ctx=0x289908ce5dc0) at /data/Server/10.6I/sql/sql_base.cc:2094
#2  0x00005623da83799c in open_and_process_table (thd=0x62b0000af218, tables=0x62b0000b63b8, counter=0x289908ce5ef0, flags=0, prelocking_strategy=0x289908ce5fd0, has_prelocking_list=false, 
    ot_ctx=0x289908ce5dc0) at /data/Server/10.6I/sql/sql_base.cc:3807
#3  0x00005623da839f47 in open_tables (thd=0x62b0000af218, options=..., start=0x289908ce5f00, counter=0x289908ce5ef0, flags=0, prelocking_strategy=0x289908ce5fd0) at /data/Server/10.6I/sql/sql_base.cc:4290
#4  0x00005623da83e592 in open_and_lock_tables (thd=0x62b0000af218, options=..., tables=0x62b0000b63b8, derived=true, flags=0, prelocking_strategy=0x289908ce5fd0) at /data/Server/10.6I/sql/sql_base.cc:5262
#5  0x00005623da7a2a86 in open_and_lock_tables (thd=0x62b0000af218, tables=0x62b0000b63b8, derived=true, flags=0) at /data/Server/10.6I/sql/sql_base.h:509
#6  0x00005623da9ac7cd in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:4649
#7  0x00005623da9c2b5c in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "INSERT INTO `BB` SELECT * FROM `view_E` /* E_R Thread1 QNO 23 CON_ID 14 */", length=74, parser_state=0x289908ce6b20)
    at /data/Server/10.6I/sql/sql_parse.cc:8030
(rr) c
Continuing.
 
Thread 2 hit Breakpoint 5, MDL_context::acquire_lock (this=0x62b0000af348, mdl_request=0x62b0000b7538, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
2272	  DBUG_PRINT("enter", ("lock_type: %s  timeout: %f",
3: *mdl_request->key.m_ptr@mdl_request.key.m_length = "\002test\000view_E"
(rr) c
Continuing.
 
Thread 2 hit Breakpoint 5, MDL_context::acquire_lock (this=0x62b0000af348, mdl_request=0x629000dd9690, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
2272	  DBUG_PRINT("enter", ("lock_type: %s  timeout: %f",
3: *mdl_request->key.m_ptr@mdl_request.key.m_length = "\002test\000E"
(rr) c
Continuing.
 
Thread 2 hit Breakpoint 2, ha_innobase::write_row (this=0x62b0000b1be8, record=0x5623da852b8f <fill_record(THD*, TABLE*, Field**, List<Item>&, bool, bool)> "\363\017\036\372UH\211\345AVAUATSH\201\354\020\001")
    at /data/Server/10.6I/storage/innobase/handler/ha_innodb.cc:7748
7748	{
(rr) c
Continuing.
[Switching to Thread 2506142.2513854]
 
Thread 3 hit Breakpoint 5, MDL_context::acquire_lock (this=0x62b00013b348, mdl_request=0x586b3f42b340, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
2272	  DBUG_PRINT("enter", ("lock_type: %s  timeout: %f",
3: *mdl_request->key.m_ptr@mdl_request.key.m_length = "\002test\000A"
(rr) when
Current event: 89603
(rr) bt
#0  MDL_context::acquire_lock (this=0x62b00013b348, mdl_request=0x586b3f42b340, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2272
#1  0x00005623dadbce60 in MDL_context::upgrade_shared_lock (this=0x62b00013b348, mdl_ticket=0x60700004e520, new_type=MDL_EXCLUSIVE, lock_wait_timeout=86400) at /data/Server/10.6I/sql/mdl.cc:2566
#2  0x00005623da82bade in wait_while_table_is_used (thd=0x62b00013b218, table=0x61900007e498, function=HA_EXTRA_PREPARE_FOR_RENAME) at /data/Server/10.6I/sql/sql_base.cc:1313
#3  0x00005623dac26066 in mysql_alter_table (thd=0x62b00013b218, new_db=0x62b00013fc18, new_name=0x62b000140030, create_info=0x586b3f42e5d0, table_list=0x62b000142418, alter_info=0x586b3f42e4a0, order_num=0, 
    order=0x0, ignore=false, if_exists=false) at /data/Server/10.6I/sql/sql_table.cc:10501
#4  0x00005623dadb1081 in Sql_cmd_alter_table::execute (this=0x62b000142df0, thd=0x62b00013b218) at /data/Server/10.6I/sql/sql_alter.cc:550
#5  0x00005623da9b668a in mysql_execute_command (thd=0x62b00013b218, is_called_from_prepared_stmt=false) at /data/Server/10.6I/sql/sql_parse.cc:5997
#6  0x00005623da9c2b5c in mysql_parse (thd=0x62b00013b218, 
    rawbuf=0x62b000142238 "ALTER TABLE `A` ADD FOREIGN KEY ( `col_int_nokey` ) REFERENCES `D` (pk) ON UPDATE NO ACTION /* E_R Thread2 QNO 28 CON_ID 15 */", length=126, parser_state=0x586b3f42fb20)
    at /data/Server/10.6I/sql/sql_parse.cc:8030

Huh? What was that empty database and table name? Should it have been test.A? Apart from the main table test.BB, the INSERT acquired metadata lock for all other names but not the parent table name test.A. Because test.A was not locked, the ALTER TABLE will be granted MDL_EXCLUSIVE on it.

Comment by Marko Mäkelä [ 2021-10-14 ]

It turns out that the empty name simply belongs to a MDL_key::BACKUP lock. We see that the INSERT only locked the target table BB, its children (none), and the source view view_E and its underlying table E. At the time of the sole ha_innobase::write_row() invocation for the INSERT, the table test/BB inside InnoDB has no children and only one parent foreign key constraint:

(rr) p *m_prebuilt.table
$7 = {id = 29, id_hash = 0x0, name = {m_name = 0x60300014f290 "test/BB"}, …,
  mdl_name = {m_name = 0x60300014f290 "test/BB"},
  foreign_set = std::set with 1 element = {[0] = 0x61600406b708}, 
  referenced_set = std::set with 0 elements,
(rr) p *(dict_foreign_t*)0x61600406b708
$10 = {heap = 0x611000188340, id = 0x61600406b798 "test/BB_ibfk_1", n_fields = 1, type = 32, foreign_table_name = 0x61600406b7d0 "test/BB", foreign_table_name_lookup = 0x61600406b7d0 "test/BB", 
  foreign_table = 0x6180000afd08, foreign_col_names = 0x61600406b7f8, referenced_table_name = 0x61600406b808 "test/A", referenced_table_name_lookup = 0x61600406b808 "test/A", referenced_table = 0x618000043908, 
  referenced_col_names = 0x61600406b830, foreign_index = 0x616004069008, referenced_index = 0x616000708408, v_cols = 0x0}

This constraint would be corrupted during the rename phase of the ALTER TABLE A that is adding a parent to that table.

I think that before ALTER TABLE with ALGORITHM=COPY invokes handler::rename_table() on the original table, it must acquire MDL_EXCLUSIVE on all child tables to prevent this race. Before MDEV-21175, this problem was prevented by excessive locking inside InnoDB.

For ALGORITHM=INPLACE there probably is no problem. ADD FOREIGN KEY would only execute with ALGORITHM=INPLACE if SET foreign_key_checks=OFF is in effect.

Comment by Marko Mäkelä [ 2021-10-14 ]

Here is an mtr test case. Not acceptable for a regression test suite; we would probably want DEBUG_SYNC for that.

--source include/have_innodb.inc
 
CREATE TABLE parent(a INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE child(a INT PRIMARY KEY REFERENCES parent(a)) ENGINE=InnoDB;
 
DELIMITER |;
CREATE PROCEDURE proc_alter()
BEGIN
	DECLARE i int DEFAULT 1;
	WHILE (i <= 10000000) DO
		ALTER TABLE parent FORCE, ALGORITHM=COPY;
		SET i = i + 1;
	END WHILE;
	COMMIT;
END|
DELIMITER ;|
 
connect (ddl,localhost,root,,);
send CALL proc_alter();
 
connection default;
--disable_warnings
--disable_query_log
let $N=100000;
while ($N) {
INSERT IGNORE INTO child SET a=1;
dec $N;
}
--enable_query_log
--enable_warnings
 
connection ddl;
reap;
disconnect ddl;
 
connection default;
DROP TABLE child, parent;

The result:

10.6 41c66ef6f76634112a3c84b85097239b40d1efad

mysqltest: At line 28: query 'INSERT IGNORE INTO child SET a=1' failed: <Unknown> (2013): Lost connection to server during query
Version: '10.6.5-MariaDB-debug-log'  socket: '/dev/shm/10.6m/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
211014 15:16:32 [ERROR] mysqld got signal 11 ;
#4  id_name_t::operator() (this=0x10) at /mariadb/10.6m/storage/innobase/include/dict0mem.h:513
#5  row_ins_foreign_report_add_err (trx=trx@entry=0x7eff82acc1b8, foreign=foreign@entry=0x7eff240c0970, rec=0x7eff81f5c070 "supremum", entry=entry@entry=0x7eff2400bd50) at /mariadb/10.6m/storage/innobase/row/row0ins.cc:834
#6  0x000055e32221d34e in row_ins_check_foreign_constraint (check_ref=check_ref@entry=1, foreign=foreign@entry=0x7eff240c0970, table=table@entry=0x7eff240a1b80, entry=entry@entry=0x7eff2400bd50, thr=thr@entry=0x7eff240c3350) at /mariadb/10.6m/storage/innobase/row/row0ins.cc:1802

Comment by Marko Mäkelä [ 2021-10-14 ]

If I change the procedure to this, I will get a different crash:

DELIMITER |;
CREATE PROCEDURE proc_alter()
BEGIN
	DECLARE i int DEFAULT 1;
	SET foreign_key_checks=0;
	WHILE (i <= 10000000) DO
		ALTER TABLE parent FORCE, ALGORITHM=INPLACE;
		SET i = i + 1;
	END WHILE;
	COMMIT;
END|
DELIMITER ;|

10.6 41c66ef6f76634112a3c84b85097239b40d1efad

mysqltest: At line 29: query 'INSERT IGNORE INTO child SET a=1' failed: <Unknown> (2013): Lost connection to server during query
Version: '10.6.5-MariaDB-debug-log'  socket: '/dev/shm/10.6m/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  Source distribution
mariadbd: /mariadb/10.6m/storage/innobase/dict/drop.cc:159: dberr_t trx_t::drop_table(const dict_table_t&): Assertion `lock->trx == this' failed.
#7  0x00007f02dac3d7f2 in __GI___assert_fail (assertion=0x5582715d5d73 "lock->trx == this", file=0x5582715d5da8 "/mariadb/10.6m/storage/innobase/dict/drop.cc", line=159, function=0x5582715d6038 "dberr_t trx_t::drop_table(const dict_table_t&)") at assert.c:101
#8  0x00005582711c4911 in trx_t::drop_table (…) at /mariadb/10.6m/storage/innobase/dict/drop.cc:159
#9  0x0000558270ef9ca0 in commit_try_rebuild (…) at /mariadb/10.6m/storage/innobase/handler/handler0alter.cc:9953
#10 0x0000558270eec917 in ha_innobase::commit_inplace_alter_table (this=<optimized out>, altered_table=0x7f02d80cd2d0, ha_alter_info=0x7f02d80cd210, commit=true) at /mariadb/10.6m/storage/innobase/handler/handler0alter.cc:11003
#11 0x0000558270b765b6 in handler::ha_commit_inplace_alter_table (this=0x7f02780392a0, altered_table=altered_table@entry=0x7f02d80cd2d0, ha_alter_info=ha_alter_info@entry=0x7f02d80cd210, commit=commit@entry=true) at /mariadb/10.6m/sql/handler.cc:5194

So, it looks like we must acquire MDL_EXCLUSIVE on all child tables during the final phase of ALTER TABLE, no matter if that final phase is executing handler::rename_table() or handler::commit_inplace_alter_table().

Comment by Marko Mäkelä [ 2021-10-14 ]

You can invoke MDEV-26554.test to reproduce the two crashes:

./mtr --mysqld=--alter-algorithm=inplace MDEV-26554
./mtr --mysqld=--alter-algorithm=copy MDEV-26554

Comment by Marko Mäkelä [ 2021-10-14 ]

I think that if the ALTER TABLE parent FORCE is replaced with OPTIMIZE TABLE parent, we should get exactly the same results.

Furthermore, I believe (but did not test yet) that causing a table rebuild by TRUNCATE TABLE parent could cause trouble too. I would think that also TRUNCATE needs to acquire MDL_EXCLUSIVE for all child table. (Alternatively, all DML statements should acquire MDL also on parent tables, but that would likely hurt performance.)

Comment by Marko Mäkelä [ 2021-10-14 ]

When running a loop of TRUNCATE TABLE parent concurrently with INSERT IGNORE INTO child, it fails in a similar way as with ALTER TABLE parent FORCE, ALGORITHM=INPLACE:

10.6 41c66ef6f76634112a3c84b85097239b40d1efad

#6  0x00007fa92a3907f2 in __GI___assert_fail (assertion=0x56382b389433 "lock->trx == this", file=0x56382b3891a2 "/mariadb/10.6m/storage/innobase/dict/drop.cc", line=159, function=0x56382b3893b6 "dberr_t trx_t::drop_table(const dict_table_t &)") at assert.c:101
#7  0x000056382af43565 in trx_t::drop_table (this=0x7fa924707540, table=@0x7fa8c403e028: <incomplete type>) at /mariadb/10.6m/storage/innobase/dict/drop.cc:159
#8  0x000056382ac507e8 in ha_innobase::truncate (this=0x7fa8c403ae20) at /mariadb/10.6m/storage/innobase/handler/ha_innodb.cc:13843

Comment by Marko Mäkelä [ 2021-10-15 ]

I came up with an idea how to work around the missing MDL_EXCLUSIVE:

  • In ha_innobase::commit_inplace_alter_table(commit=true), try to acquire LOCK_X on all child tables, and fail if this cannot be done. Note: the lock wait would be blocked by MDL_EXCLUSIVE on the being-altered table.
  • In ha_innobase::rename_table() when renaming a table from a non-#sql to a #sql name, try to acquire LOCK_X on all child tables, and return an error if this cannot be done.
  • In ha_innobase::truncate(), try to acquire LOCK_X on all child tables, and return an error if this cannot be done.

A different solution may be needed when we get around to implementing MDEV-22361.

Comment by Marko Mäkelä [ 2021-10-18 ]

The following seems to fix this race condition for TRUNCATE, eventually causing the TRUNCATE to fail with an acceptable error:

10.6 9c5835e067e99e1f85477f28d3bdc807537393a8 with patch

mysqltest: At line 33: query 'reap' failed: ER_LOCK_DEADLOCK (1213): Deadlock found when trying to get lock; try restarting transaction

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index e3e9ed08eb4..cc70423ed47 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -13792,6 +13792,19 @@ int ha_innobase::truncate()
 		error = fts_lock_tables(trx, *ib_table);
 	}
 
+	if (error == DB_SUCCESS) {
+		dict_sys.freeze();
+		for (const dict_foreign_t* f : ib_table->referenced_set) {
+			if (dict_table_t* child = f->foreign_table) {
+				error = lock_table_for_trx(child, trx, LOCK_X);
+				if (error != DB_SUCCESS) {
+					break;
+				}
+			}
+		}
+		dict_sys.unfreeze();
+	}
+
 	/* Wait for purge threads to stop using the table. */
 	for (uint n = 15; ib_table->get_ref_count() > 1; ) {
 		if (!--n) {

I will attempt a similar fix on ha_innobase::commit_inplace_alter_table() and ha_innobase::rename_table() as well.

Comment by Marko Mäkelä [ 2021-10-18 ]

If we acquire exclusive locks on the child tables before acquiring an exclusive lock on the main table or the statistics tables, no deadlocks will be reported between the concurrent INSERT and TRUNCATE. A similar fix for ha_innobase::rename_table() seems to work for ALTER TABLE or OPTIMIZE TABLE when using old_alter_table=1 a.k.a. alter_algorithm=copy.

Comment by Marko Mäkelä [ 2021-10-18 ]

My fix survived MDEV-26554trunc.test:

innodb.MDEV-26554trunc 'innodb'          [ pass ]  52125

I will try to create a smaller test that demonstrates that the INSERT on the child table would block the DDL operation on the parent table.

Generated at Thu Feb 08 09:46:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.