|
For the record, the only interesting threads among the stack traces of killed process were the following:
FLUSH TABLES executing something that is intended to be part of FLUSH TABLES…FOR EXPORT:
#6 0x000055cec54a950c in purge_sys_t::stop (this=0x55cec7a35c80 <purge_sys>) at /data/Server/bb-10.6-MDEV-30986B/storage/innobase/srv/srv0srv.cc:1358
|
#7 0x000055cec5484742 in row_quiesce_table_start (table=0x618000116908, trx=trx@entry=0x7f99e42fc340) at /data/Server/bb-10.6-MDEV-30986B/storage/innobase/row/row0quiesce.cc:533
|
#8 0x000055cec4fe9908 in ha_innobase::external_lock (this=0x61d0002026b8, thd=<optimized out>, lock_type=<optimized out>) at /data/Server/bb-10.6-MDEV-30986B/storage/innobase/handler/ha_innodb.cc:16079
|
#9 0x000055cec45f4d00 in handler::ha_external_lock (this=0x61d0002026b8, thd=thd@entry=0x62b00009a218, lock_type=lock_type@entry=0) at /data/Server/bb-10.6-MDEV-30986B/sql/handler.cc:7128
|
A purge worker task is waiting for a metadata lock:
#4 0x000055cec41a181c in inline_mysql_cond_timedwait (src_line=1207, src_file=0x55cec5f5de20 "/data/Server/bb-10.6-MDEV-30986B/sql/mdl.cc", abstime=0x7f998224fca0, mutex=0x62b00001c370, that=0x62b00001c420) at /data/Server/bb-10.6-MDEV-30986B/include/mysql/psi/mysql_thread.h:1088
|
#5 MDL_wait::timed_wait (this=this@entry=0x62b00001c370, owner=0x62b00001c300, abs_timeout=abs_timeout@entry=0x7f998224fca0, set_status_on_timeout=set_status_on_timeout@entry=false, wait_state_name=<optimized out>) at /data/Server/bb-10.6-MDEV-30986B/sql/mdl.cc:1207
|
The purge coordinator task was waiting for purge worker completion, and the buf_flush_page_cleaner() thread was idle.
A simplified grammar for reproducing this apparent hang would help a lot.
|
|
The hang that I am analyzing involves two threads and a metadata lock on the table table200_innodb_int_autoinc. The SQL thread executes FLUSH TABLES /* E_R Thread1 QNO 60 CON_ID 13 */) and an internal thread will invoke purge_worker_callback.
ssh pluto
|
rr replay /data/results/1686228537/MDEV-31393/1/rr/latest-trace
|
run 112200
|
break MDL_context::acquire_lock
|
condition 1 $_streq(mdl_request.key.m_ptr+6,"table200_innodb_int_autoinc")
|
continue
|
backtrace
|
continue
|
backtrace
|
thread apply 23 backtrace
|
finish
|
|
10.6 f569e06e03a7efa6050258a8d167de0aaa4e124c
|
(rr) condition 1 $_streq(mdl_request.key.m_ptr+6,"table200_innodb_int_autoinc")
|
(rr) continue
|
Continuing.
|
2023-06-08 12:49:04 13 [Note] InnoDB: Writing table metadata to './test/table100_innodb_int_autoinc.cfg'
|
2023-06-08 12:49:04 13 [Note] InnoDB: Table `test`.`table100_innodb_int_autoinc` flushed to disk
|
[Switching to Thread 985824.987186]
|
|
Thread 23 hit Breakpoint 1, MDL_context::acquire_lock (this=0x5a2c2c000db0, mdl_request=0x4ddb490dbc80, lock_wait_timeout=86400) at /data/Server/10.6B1/sql/mdl.cc:2276
|
2276 {
|
(rr) backtrace
|
#0 MDL_context::acquire_lock (this=0x5a2c2c000db0, mdl_request=0x4ddb490dbc80, lock_wait_timeout=86400) at /data/Server/10.6B1/sql/mdl.cc:2276
|
#1 0x0000563c727e7bbc in MDL_context::upgrade_shared_lock (this=this@entry=0x5a2c2c000db0, mdl_ticket=0x32161c47cc20, new_type=new_type@entry=MDL_EXCLUSIVE, lock_wait_timeout=86400) at /data/Server/10.6B1/sql/mdl.cc:2594
|
#2 0x0000563c72670b4a in wait_while_table_is_used (thd=0x5a2c2c000c58, table=0x32161c0989b8, function=HA_EXTRA_PREPARE_FOR_FORCED_CLOSE) at /data/Server/10.6B1/sql/sql_base.cc:1344
|
#3 0x0000563c726730cb in close_cached_tables (thd=thd@entry=0x5a2c2c000c58, tables=tables@entry=0x0, wait_for_refresh=wait_for_refresh@entry=true, timeout=86400) at /data/Server/10.6B1/sql/sql_base.cc:391
|
#4 0x0000563c727fa671 in reload_acl_and_cache (thd=thd@entry=0x5a2c2c000c58, options=4, tables=tables@entry=0x0, write_to_binlog=write_to_binlog@entry=0x4ddb490dc0f0) at /data/Server/10.6B1/sql/sql_reload.cc:340
|
#5 0x0000563c726dec2d in mysql_execute_command (thd=0x5a2c2c000c58, is_called_from_prepared_stmt=<optimized out>) at /data/Server/10.6B1/sql/sql_parse.cc:5475
|
#6 0x0000563c726ce5c6 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x5a2c2c000c58) at /data/Server/10.6B1/sql/sql_parse.cc:8036
|
…
|
(rr) continue
|
Continuing.
|
2023-06-08 12:49:04 13 [Note] InnoDB: Deleting the meta-data file './test/table100_innodb_int_autoinc.cfg'
|
2023-06-08 12:49:04 13 [Note] InnoDB: Resuming purge
|
[Switching to Thread 985824.985912]
|
|
Thread 1 hit Breakpoint 1, MDL_context::acquire_lock (this=this@entry=0x563c7593c810, mdl_request=mdl_request@entry=0x7fa3f97149e0, lock_wait_timeout=86400) at /data/Server/10.6B1/sql/mdl.cc:2276
|
2276 {
|
(rr) backtrace
|
#0 MDL_context::acquire_lock (this=this@entry=0x563c7593c810, mdl_request=mdl_request@entry=0x7fa3f97149e0, lock_wait_timeout=86400) at /data/Server/10.6B1/sql/mdl.cc:2276
|
#1 0x0000563c72caaf14 in dict_acquire_mdl_shared<false, true> (table=<optimized out>, thd=thd@entry=0x563c7593c6b8, mdl=mdl@entry=0x563c759d7fa8, table_op=table_op@entry=DICT_TABLE_OP_NORMAL) at /data/Server/10.6B1/sql/mdl.h:610
|
#2 0x0000563c72cab4c4 in dict_table_open_on_id<true> (table_id=132, dict_locked=dict_locked@entry=false, table_op=table_op@entry=DICT_TABLE_OP_NORMAL, thd=0x563c7593c6b8, mdl=mdl@entry=0x563c759d7fa8) at /usr/include/c++/9/bits/atomic_base.h:539
|
#3 0x0000563c72c1ea17 in row_purge_parse_undo_rec (updated_extern=0x7fa3f9714ca1, thr=0x563c7599c120, undo_rec=0x563c759e71b8 "", node=0x563c759d7e28) at /data/Server/10.6B1/storage/innobase/row/row0purge.cc:1081
|
#4 row_purge (thr=<optimized out>, undo_rec=<optimized out>, node=0x563c759d7e28) at /data/Server/10.6B1/storage/innobase/row/row0purge.cc:1259
|
#5 row_purge_step (thr=thr@entry=0x563c7599c120) at /data/Server/10.6B1/storage/innobase/row/row0purge.cc:1325
|
#6 0x0000563c72bf12c0 in que_thr_step (thr=0x563c7599c120) at /data/Server/10.6B1/storage/innobase/que/que0que.cc:588
|
#7 que_run_threads_low (thr=<optimized out>) at /data/Server/10.6B1/storage/innobase/que/que0que.cc:644
|
#8 que_run_threads (thr=thr@entry=0x563c7599c120) at /data/Server/10.6B1/storage/innobase/que/que0que.cc:664
|
#9 0x0000563c72c391ad in srv_task_execute () at /data/Server/10.6B1/storage/innobase/srv/srv0srv.cc:1602
|
#10 purge_worker_callback () at /data/Server/10.6B1/storage/innobase/srv/srv0srv.cc:1850
|
#11 0x0000563c72d125b0 in tpool::task_group::execute (this=0x563c73f27340 <purge_task_group>, t=0x563c73ef46e0 <purge_worker_task>) at /data/Server/10.6B1/tpool/task_group.cc:55
|
#12 0x0000563c72d10477 in tpool::thread_pool_generic::worker_main (this=0x563c75951390, thread_var=0x563c75951c40) at /data/Server/10.6B1/tpool/tpool_generic.cc:580
|
…
|
(rr) thread apply 23 backtrace
|
|
Thread 23 (Thread 985824.987186):
|
#0 0x0000000070000002 in syscall_traced ()
|
…
|
#11 0x0000563c72d65375 in read (__nbytes=20451, __buf=0x5a2c2c045bb8, __fd=22) at /usr/include/x86_64-linux-gnu/bits/unistd.h:44
|
#12 my_read (Filedes=Filedes@entry=22, Buffer=Buffer@entry=0x5a2c2c045bb8 ' ' <repeats 200 times>..., Count=Count@entry=20451, MyFlags=MyFlags@entry=16) at /data/Server/10.6B1/mysys/my_read.c:51
|
#13 0x0000563c727acbde in inline_mysql_file_read (flags=16, count=20451, buffer=0x5a2c2c045bb8 ' ' <repeats 200 times>..., file=22) at /data/Server/10.6B1/include/mysql/psi/mysql_file.h:1146
|
#14 open_table_def (thd=thd@entry=0x5a2c2c000c58, share=share@entry=0x5a2c2c03c630, flags=flags@entry=11) at /data/Server/10.6B1/sql/table.cc:707
|
#15 0x0000563c7287b9a0 in tdc_acquire_share (thd=thd@entry=0x5a2c2c000c58, tl=tl@entry=0x5a2c2c05eda0, flags=flags@entry=3, out_table=out_table@entry=0x4ddb490db800) at /data/Server/10.6B1/sql/table_cache.cc:836
|
#16 0x0000563c72671f5f in open_table (thd=0x5a2c2c000c58, table_list=0x5a2c2c05eda0, ot_ctx=0x4ddb490dbe30) at /data/Server/10.6B1/sql/sql_base.cc:1886
|
#17 0x0000563c72672de4 in Locked_tables_list::reopen_tables (this=this@entry=0x5a2c2c004a48, thd=thd@entry=0x5a2c2c000c58, need_reopen=need_reopen@entry=false) at /data/Server/10.6B1/sql/sql_base.cc:2708
|
#18 0x0000563c72673102 in close_cached_tables (thd=thd@entry=0x5a2c2c000c58, tables=tables@entry=0x0, wait_for_refresh=wait_for_refresh@entry=true, timeout=86400) at /data/Server/10.6B1/sql/sql_base.cc:404
|
#19 0x0000563c727fa671 in reload_acl_and_cache (thd=thd@entry=0x5a2c2c000c58, options=4, tables=tables@entry=0x0, write_to_binlog=write_to_binlog@entry=0x4ddb490dc0f0) at /data/Server/10.6B1/sql/sql_reload.cc:340
|
#20 0x0000563c726dec2d in mysql_execute_command (thd=0x5a2c2c000c58, is_called_from_prepared_stmt=<optimized out>) at /data/Server/10.6B1/sql/sql_parse.cc:5475
|
#21 0x0000563c726ce5c6 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x5a2c2c000c58) at /data/Server/10.6B1/sql/sql_parse.cc:8036
|
…
|
(rr) finish
|
Run till exit from #0 MDL_context::acquire_lock (this=this@entry=0x563c7593c810, mdl_request=mdl_request@entry=0x7fa3f97149e0, lock_wait_timeout=86400) at /data/Server/10.6B1/sql/mdl.cc:2276
|
2023-06-08 12:49:04 13 [Note] InnoDB: Sync to disk of `test`.`table100_innodb_int_autoinc` started.
|
2023-06-08 12:49:04 13 [Note] InnoDB: Stopping purge
|
2023-06-08 13:03:40 12 [Warning] Aborted connection 12 to db: 'mysql' user: 'root' host: 'localhost' (Got an error reading communication packets)
|
[New Thread 985824.987290]
|
[New Thread 985824.1153006]
|
|
Thread 24 received signal SIGABRT, Aborted.
|
I am not sure if it is the cause of the hang, but resuming purge before UNLOCK TABLES was executed, in the middle of executing a single FLUSH TABLES that affects multiple tables, seems to be a potential source of corruption because a user could try to copy all InnoDB data files after FLUSH TABLES completed, and in this case at least one data file would be unlocked prematurely during the execution of FLUSH TABLES:
2023-06-08 12:49:04 13 [Note] InnoDB: Resuming purge
|
This was invoked by the following:
|
10.6 f569e06e03a7efa6050258a8d167de0aaa4e124c
|
#0 0x0000563c725aa47c in row_quiesce_table_complete (table=0x32161c058458, trx=trx@entry=0x30ed3e522680) at /data/Server/10.6B1/storage/innobase/include/ut0ut.h:327
|
#1 0x0000563c72b7c8a2 in ha_innobase::external_lock (this=0x5a2c2c0647f0, thd=0x5a2c2c000c58, lock_type=2) at /data/Server/10.6B1/storage/innobase/handler/ha_innodb.cc:16095
|
#2 0x0000563c72902618 in handler::ha_external_lock (this=0x5a2c2c0647f0, thd=thd@entry=0x5a2c2c000c58, lock_type=lock_type@entry=2) at /data/Server/10.6B1/sql/handler.cc:7128
|
#3 0x0000563c72a02eff in handler::ha_external_unlock (thd=0x5a2c2c000c58, this=<optimized out>) at /data/Server/10.6B1/sql/handler.h:3549
|
#4 unlock_external (thd=thd@entry=0x5a2c2c000c58, table=0x5a2c2c0108c0, count=<optimized out>) at /data/Server/10.6B1/sql/lock.cc:730
|
#5 0x0000563c72a03080 in mysql_unlock_tables (thd=0x5a2c2c000c58, sql_lock=0x5a2c2c0108a0, free_lock=free_lock@entry=false) at /data/Server/10.6B1/sql/lock.cc:435
|
#6 0x0000563c72a03ee2 in mysql_unlock_some_tables (thd=<optimized out>, table=table@entry=0x4ddb490dbc78, count=count@entry=1, flag=flag@entry=0) at /data/Server/10.6B1/sql/lock.cc:465
|
#7 0x0000563c72a03f71 in mysql_lock_remove (thd=thd@entry=0x5a2c2c000c58, locked=0x32161c072478, table=<optimized out>, table@entry=0x32161c64b9c8) at /data/Server/10.6B1/sql/lock.cc:564
|
#8 0x0000563c7267149f in close_all_tables_for_name (thd=thd@entry=0x5a2c2c000c58, share=<optimized out>, extra=extra@entry=HA_EXTRA_NOT_USED, skip_table=skip_table@entry=0x0) at /data/Server/10.6B1/sql/sql_base.cc:769
|
#9 0x0000563c726730e5 in close_cached_tables (thd=thd@entry=0x5a2c2c000c58, tables=tables@entry=0x0, wait_for_refresh=wait_for_refresh@entry=true, timeout=86400) at /data/Server/10.6B1/sql/sql_base.cc:397
|
#10 0x0000563c727fa671 in reload_acl_and_cache (thd=thd@entry=0x5a2c2c000c58, options=4, tables=tables@entry=0x0, write_to_binlog=write_to_binlog@entry=0x4ddb490dc0f0) at /data/Server/10.6B1/sql/sql_reload.cc:340
|
#11 0x0000563c726dec2d in mysql_execute_command (thd=0x5a2c2c000c58, is_called_from_prepared_stmt=<optimized out>) at /data/Server/10.6B1/sql/sql_parse.cc:5475
|
#12 0x0000563c726ce5c6 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x5a2c2c000c58) at /data/Server/10.6B1/sql/sql_parse.cc:8036
|
Some code around the call to close_all_tables_for_name() was refactored in a commit that is part of MDEV-5336.
The original MySQL WL#5522 design introduced the syntax FLUSH TABLES…FOR EXPORT, but it turns out that a plain FLUSH TABLES would do exactly the same, because InnoDB is interpreting the lock mode F_RDLCK as FOR EXPORT, and lock_external() would use lock_mode=F_RDLCK in the basic case.
|