The maximum runtime (300s) of the RQG worker threads gets exceeded, even though a worker thread exits as soon as this limit is reached and the next query would have to be generated. This suggests that the worker threads have been stuck for a long time inside the execution of a statement which has not finished yet.
The processlist shows the following:
Id User Host db Command Time State Info Progress
1 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
2 system user NULL Daemon NULL InnoDB purge coordinator NULL 0.000
3 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
4 system user NULL Daemon NULL InnoDB purge worker NULL 0.000
5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
8 root localhost:56266 mysql Sleep 671 NULL 0.000
16 root localhost:56332 test Query 659 preparing for alter table ALTER TABLE t_09 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `MarVão_ftidx1` ( col_text ), LOCK 0.000
17 root localhost:56334 test Query 655 setup ALTER TABLE t_06 ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK = SHARED /* E_R Thread2 QNO 0.000
18 root localhost:56336 test Query 643 setup ALTER TABLE t_07 ADD FULLTEXT INDEX `MarvãO_ftidx1` ( col_text ), LOCK = SHARED /* E_R Thread3 QNO 0.000
19 root localhost:56340 test Query 659 Unlocking tables ALTER TABLE t_02 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK 0.000
20 root localhost:56342 test Query 657 Unlocking tables ALTER TABLE t_04 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK 0.000
21 root localhost:56344 test Query 657 Unlocking tables ALTER TABLE t_05 DROP INDEX `MarVão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK 0.000
23 root localhost:56348 test Query 659 committing alter table to storage engine ALTER TABLE t_03 DROP INDEX `Marvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK 0.000
24 root localhost:56350 test Query 653 Unlocking tables ALTER TABLE t_08 DROP INDEX `MarvãO_ftidx1`, ADD FULLTEXT INDEX `MArvão_ftidx1` ( col_text ), LOCK 0.000
25 root localhost:56352 test Query 659 preparing for alter table ALTER TABLE t_10 DROP INDEX `MArvão_ftidx1`, ADD FULLTEXT INDEX `MarVão_ftidx1` ( col_text ), LOCK 0.000
26 root localhost:56354 test Query 659 committing alter table to storage engine ALTER TABLE t_01 DROP INDEX `MArvão_ftidx1`, ADD FULLTEXT INDEX `Marvão_ftidx1` ( col_text ), LOCK 0.000
114 root localhost NULL Query 0 init SHOW PROCESSLIST 0.000
RQG declares "deadlock" in that case.
The problem reported here might be a sibling of already reported bugs where the server does not react to "mysqladmin ... shutdown" or SIGTERM.
Roughly the same effect was observed on 10.3, too.
At the moment I do not know whether the storage engine InnoDB or the server is at fault.
Matthias Leich added a comment:
How to replay the problem?
1. git clone https://github.com/mleich1/rqg --branch experimental RQG
2. cd RQG
3. tar xvzf <path_to>/MDEV-19529.tgz
This unpacks MDEV-19529.sh, MDEV-19529.yy and a patched RQG reporter lib/GenTest/Reporter/Deadlock1.pm.
4. ./MDEV-19529.sh <path to your MariaDB binaries like /work/10.1/bld_debug>
At the end of the shell script above is a "tail -f ...." which shows the RQG protocol "storage/SINGLE_RQG/rqg.log" on screen.
In case you see a final
# 2019-09-12T19:15:42 [83031] VERDICT: replay <========
# 2019-09-12T19:15:42 [83031] DEBUG: 'du -sk /dev/shm/vardir/SINGLE_RQG | cut -f1' exited with value 0
# 2019-09-12T19:15:42 [83031] INFO: vardir_size : 1086236
then the RQG run has replayed the problem.
5. After the run has finished, "storage/SINGLE_RQG/" will contain an "archive.tgz" with all remnants of the data directory.
This archive should also contain a core file.
Background:
As soon as the reporter believes it has identified a deadlock/server freeze, it sends the signals SIGHUP and SIGSEGV to the DB server process. The criterion which should kick in is: two threads have already been waiting longer than 120s for their command to finish.
6. storage/SINGLE_RQG/rqg.log already contains InnoDB Monitor output and some backtraces.
In case you want to inspect the core with GDB on your own, then
1. cd storage/SINGLE_RQG/ ; tar xvzf archive.tgz
2. gdb -c dev/shm/vardir/SINGLE_RQG/1/data/core <path_to>/mysqld
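The detection criterion described in the Background section above can be sketched as follows. This is a hypothetical simplification in Python — the actual reporter (lib/GenTest/Reporter/Deadlock1.pm) is written in Perl, and the row structure, names and the exact trigger count here are illustrative assumptions:

```python
# Hypothetical sketch of the Deadlock1 reporter criterion (the real
# reporter lib/GenTest/Reporter/Deadlock1.pm is written in Perl).
# A "deadlock/server freeze" is assumed when at least STALLED_MIN user
# threads have been executing their current statement longer than
# QUERY_LIFETIME_THRESHOLD seconds.

QUERY_LIFETIME_THRESHOLD = 120  # seconds a single statement may run
STALLED_MIN = 2                 # stuck threads needed for the verdict

def looks_like_freeze(processlist_rows):
    """processlist_rows: list of dicts with keys 'Command' and 'Time',
    shaped like SHOW PROCESSLIST rows (system threads: Command='Daemon',
    Time=None)."""
    stalled = [r for r in processlist_rows
               if r["Command"] == "Query"
               and r["Time"] is not None
               and r["Time"] > QUERY_LIFETIME_THRESHOLD]
    return len(stalled) >= STALLED_MIN

# Rows resembling the processlist above: many ALTER TABLE statements
# running for ~650s each.
rows = [
    {"Command": "Daemon", "Time": None},   # InnoDB purge worker
    {"Command": "Sleep",  "Time": 671},
    {"Command": "Query",  "Time": 659},    # ALTER TABLE t_09 ...
    {"Command": "Query",  "Time": 655},    # ALTER TABLE t_06 ...
    {"Command": "Query",  "Time": 0},      # SHOW PROCESSLIST itself
]
print(looks_like_freeze(rows))  # -> True
```

On the processlist pasted above, this criterion fires: eight user threads have been inside their ALTER TABLE for well over 120s.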
Thirunarayanan Balathandayuthapani added a comment: The following stack traces show the main root cause of the FTS index hang. It looks like a design flaw, and there is no common table between the threads.
Thread 28 (Thread 0x7f6c14e22700 (LWP 83125)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000055c9f65d1560 in os_cond_wait (cond=0x7f6be68f83c8, fast_mutex=0x7f6be68f8390) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:193
#2 0x000055c9f65d18d6 in os_event_wait_low (event=0x7f6be68f8390, reset_sig_count=1) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:474
#3 0x000055c9f67de251 in fts_optimize_remove_table (table=0x7f6be64b70f8) at /home/mleich/work/10.1/storage/xtradb/fts/fts0opt.cc:2667
#4 0x000055c9f67cae98 in fts_drop_index (table=0x7f6be64b70f8, index=0x7f6be6527a78, trx=0x7f6be68b7d78)
at /home/mleich/work/10.1/storage/xtradb/fts/fts0fts.cc:853
#5 0x000055c9f655fa97 in commit_cache_norebuild (ctx=0x7f6be6825b48, table=0x7f6be64bfc08, trx=0x7f6be68b7d78)
at /home/mleich/work/10.1/storage/xtradb/handler/handler0alter.cc:5645
#6 0x000055c9f655cb78 in ha_innobase::commit_inplace_alter_table (this=0x7f6be8088820, altered_table=0x7f6be682d008, ha_alter_info=0x7f6c14e1e790,
commit=true) at /home/mleich/work/10.1/storage/xtradb/handler/handler0alter.cc:6199
#7 0x000055c9f61f4573 in handler::ha_commit_inplace_alter_table (this=0x7f6be8088820, altered_table=0x7f6be682d008, ha_alter_info=0x7f6c14e1e790,
commit=true) at /home/mleich/work/10.1/sql/handler.cc:4292
#8 0x000055c9f606bb5e in mysql_inplace_alter_table (thd=0x7f6bf0e2b008, table_list=0x7f6be68241f0, table=0x7f6be64bfc08, altered_table=0x7f6be682d008,
ha_alter_info=0x7f6c14e1e790, inplace_supported=HA_ALTER_INPLACE_SHARED_LOCK_AFTER_PREPARE, target_mdl_request=0x7f6c14e1e800,
alter_ctx=0x7f6c14e1f3b0) at /home/mleich/work/10.1/sql/sql_table.cc:7209
#9 0x000055c9f6070710 in mysql_alter_table (thd=0x7f6bf0e2b008, new_db=0x7f6be68247e8 "test", new_name=0x0, create_info=0x7f6c14e1ffa0,
table_list=0x7f6be68241f0, alter_info=0x7f6c14e1ff10, order_num=0, order=0x0, ignore=false) at /home/mleich/work/10.1/sql/sql_table.cc:9077
#10 0x000055c9f60e3458 in Sql_cmd_alter_table::execute (this=0x7f6be68248f8, thd=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_alter.cc:334
#11 0x000055c9f5faebcb in mysql_execute_command (thd=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_parse.cc:5442
#12 0x000055c9f5fb37a1 in mysql_parse (thd=0x7f6bf0e2b008,
rawbuf=0x7f6be6824020 "ALTER TABLE t_01 DROP INDEX ftidx1, ADD FULLTEXT INDEX ftidx1 ( col_text ), LOCK = SHARED, ALGORITHM = INPLACE /* E_R Thread3 QNO 234 CON_ID 9 */", length=145, parser_state=0x7f6c14e21230) at /home/mleich/work/10.1/sql/sql_parse.cc:7209
#13 0x000055c9f5fa2953 in dispatch_command (command=COM_QUERY, thd=0x7f6bf0e2b008,
packet=0x7f6bf0e31009 "ALTER TABLE t_01 DROP INDEX ftidx1, ADD FULLTEXT INDEX ftidx1 ( col_text ), LOCK = SHARED, ALGORITHM = INPLACE /* E_R Thread3 QNO 234 CON_ID 9 */ ", packet_length=146) at /home/mleich/work/10.1/sql/sql_parse.cc:1499
#14 0x000055c9f5fa1711 in do_command (thd=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_parse.cc:1131
#15 0x000055c9f60de893 in do_handle_one_connection (thd_arg=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_connect.cc:1331
#16 0x000055c9f60de5c4 in handle_one_connection (arg=0x7f6bf0e2b008) at /home/mleich/work/10.1/sql/sql_connect.cc:1242
#17 0x00007f6c15b2d6da in start_thread (arg=0x7f6c14e22700) at pthread_create.c:456
#18 0x00007f6c151d6d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 22 (Thread 0x7f6bee3fa700 (LWP 83101)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000055c9f65d1560 in os_cond_wait (cond=0x7f6c144350b0, fast_mutex=0x7f6c14435078) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:193
#2 0x000055c9f65d18d6 in os_event_wait_low (event=0x7f6c14435078, reset_sig_count=409) at /home/mleich/work/10.1/storage/xtradb/os/os0sync.cc:474
#3 0x000055c9f669d2ee in sync_array_wait_event (arr=0x7f6c144bcb40, index=2) at /home/mleich/work/10.1/storage/xtradb/sync/sync0arr.cc:476
#4 0x000055c9f66a33e8 in mutex_spin_wait (_mutex=0x7f6c14435078, high_priority=false,
file_name=0x55c9f6b03bc0 "/home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc", line=568)
at /home/mleich/work/10.1/storage/xtradb/sync/sync0sync.cc:689
#5 0x000055c9f675ee5b in mutex_enter_func (mutex=0x7f6c14435078, file_name=0x55c9f6b03bc0 "/home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc",
line=568, priority=IB_DEFAULT_PRIO) at /home/mleich/work/10.1/storage/xtradb/include/sync0sync.ic:331
#6 0x000055c9f675efd0 in pfs_mutex_enter_func (mutex=0x7f6c14435078,
file_name=0x55c9f6b03bc0 "/home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc", line=568, priority=IB_DEFAULT_PRIO)
at /home/mleich/work/10.1/storage/xtradb/include/sync0sync.ic:415
#7 0x000055c9f6763041 in dict_table_close (table=0x7f6be85fcb78, dict_locked=0, try_drop=0)
at /home/mleich/work/10.1/storage/xtradb/dict/dict0dict.cc:568
#8 0x000055c9f67dde6c in fts_optimize_table_bk (slot=0x7f6bec814140) at /home/mleich/work/10.1/storage/xtradb/fts/fts0opt.cc:2486
#9 0x000055c9f67de92f in fts_optimize_thread (arg=0x7f6c14445af8) at /home/mleich/work/10.1/storage/xtradb/fts/fts0opt.cc:2892
#10 0x00007f6c15b2d6da in start_thread (arg=0x7f6bee3fa700) at pthread_create.c:456
#11 0x00007f6c151d6d7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
Thread 22 (t22) tries to close the table t_02. Pasting the stack traces for future reference.
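The cross-wait in the two traces above can be modeled abstractly: the ALTER thread (Thread 28) apparently waits inside fts_optimize_remove_table for an acknowledgement that only the FTS optimize thread can post, while the optimize thread (Thread 22) blocks in dict_table_close on a mutex that is held across that very wait, so neither thread can advance. A minimal Python model of this hold-lock-while-waiting pattern — all names are hypothetical stand-ins, not InnoDB APIs, and timeouts are added only so the demonstration terminates:

```python
import threading

# Abstract model (hypothetical names, not InnoDB APIs) of the cross-wait
# seen in the two traces:
#   Thread 28: holds dict_mutex, then waits for the optimize thread to
#              acknowledge a remove-table message (os_event_wait above).
#   Thread 22: the optimize thread must take dict_mutex inside
#              dict_table_close() before it ever reaches the message loop
#              that would post the ack -> circular wait, nobody advances.

dict_mutex = threading.Lock()
remove_acked = threading.Event()
holding = threading.Event()  # test scaffolding: "mutex is now held"

def alter_thread():
    with dict_mutex:                    # held across the wait, as in Thread 28
        holding.set()
        remove_acked.wait(timeout=1.0)  # the real code waits forever

def fts_optimize_thread():
    # Blocks here, like Thread 22 in dict_table_close(); give up after 0.3s
    # so the sketch terminates instead of hanging like the server does.
    if dict_mutex.acquire(timeout=0.3):
        remove_acked.set()              # never reached while alter_thread waits
        dict_mutex.release()

t28 = threading.Thread(target=alter_thread)
t28.start()
holding.wait()                          # ensure the mutex is really held first
t22 = threading.Thread(target=fts_optimize_thread)
t22.start()
t22.join(); t28.join()
print(remove_acked.is_set())  # -> False: the ack was never posted
```

In the server there are no timeouts, so both threads stay in this state indefinitely, which matches the 650s+ ALTER TABLE statements in the processlist.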
Matthias Leich added a comment (edited):
When testing with RQG on MariaDB 10.1 + the patch mentioned above I got the
usual fraction of known open bugs.