|
Got one as well:
Thread 3 (Thread 0x7fc2dd490700 (LWP 20035)):
|
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
|
#1 0x0000000000eed6fd in safe_cond_timedwait (cond=0x7fc30c1ebdf8, mp=0x7fc30c1ebd10, abstime=0x7fc2dd48e4b0, file=0xf8bbe0 "10.1/include/mysql/psi/mysql_thread.h", line=1202) at 10.1/mysys/thr_mutex.c:547
|
#2 0x00000000005dcfca in inline_mysql_cond_timedwait (that=0x7fc30c1ebdf8, mutex=0x7fc30c1ebd10, abstime=0x7fc2dd48e4b0, src_file=0xf8c2f0 "10.1/sql/slave.cc", src_line=791) at 10.1/include/mysql/psi/mysql_thread.h:1202
|
#3 0x00000000005debb2 in terminate_slave_thread (thd=0x7fc2dc0e3070, term_lock=0x7fc30c1ebd10, term_cond=0x7fc30c1ebdf8, slave_running=0x7fc30c1ec7ec, skip_lock=true) at 10.1/sql/slave.cc:791
|
#4 0x00000000005de554 in terminate_slave_threads (mi=0x7fc30c1e8000, thread_mask=3, skip_lock=true) at 10.1/sql/slave.cc:636
|
#5 0x00000000006b8fb7 in stop_slave (thd=0x7fc2eecf5070, mi=0x7fc30c1e8000, net_report=true) at 10.1/sql/sql_repl.cc:3186
|
#6 0x0000000000686841 in mysql_execute_command (thd=0x7fc2eecf5070) at 10.1/sql/sql_parse.cc:3530
|
#7 0x000000000069218b in mysql_parse (thd=0x7fc2eecf5070, rawbuf=0x7fc2e6ec5088 "STOP SLAVE SQL_THREAD, IO_THREAD", length=32, parser_state=0x7fc2dd48f110) at 10.1/sql/sql_parse.cc:7165
|
#8 0x0000000000681199 in dispatch_command (command=COM_QUERY, thd=0x7fc2eecf5070, packet=0x7fc2eec92071 "STOP SLAVE SQL_THREAD, IO_THREAD", packet_length=32) at 10.1/sql/sql_parse.cc:1462
|
#9 0x000000000067ff63 in do_command (thd=0x7fc2eecf5070) at 10.1/sql/sql_parse.cc:1090
|
#10 0x00000000007ad7a0 in do_handle_one_connection (thd_arg=0x7fc2eecf5070) at 10.1/sql/sql_connect.cc:1347
|
#11 0x00000000007ad4e5 in handle_one_connection (arg=0x7fc2eecf5070) at 10.1/sql/sql_connect.cc:1258
|
#12 0x00007fc30dab6e9a in start_thread (arg=0x7fc2dd490700) at pthread_create.c:308
|
#13 0x00007fc30d1e78bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
|
#14 0x0000000000000000 in ?? ()
|
Thread 5 (Thread 0x7fc2dde8b700 (LWP 20028)):
|
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:215
|
#1 0x0000000000eed6fd in safe_cond_timedwait (cond=0x7fc2eefc8218, mp=0x7fc2eefc8168, abstime=0x7fc2dde87aa0, file=0xfd2760 "10.1/include/mysql/psi/mysql_thread.h", line=1202) at 10.1/mysys/thr_mutex.c:547
|
#2 0x00000000007b6784 in inline_mysql_cond_timedwait (that=0x7fc2eefc8218, mutex=0x7fc2eefc8168, abstime=0x7fc2dde87aa0, src_file=0xfd29c8 "10.1/sql/mdl.cc", src_line=1066) at 10.1/include/mysql/psi/mysql_thread.h:1202
|
#3 0x00000000007b7902 in MDL_wait::timed_wait (this=0x7fc2eefc8168, owner=0x7fc2eefc8120, abs_timeout=0x7fc2dde87aa0, set_status_on_timeout=false, wait_state_name=0x170e6f0) at 10.1/sql/mdl.cc:1066
|
#4 0x00000000007b92ab in MDL_context::acquire_lock (this=0x7fc2eefc8168, mdl_request=0x7fc2dde87b70, lock_wait_timeout=31536000) at 10.1/sql/mdl.cc:2039
|
#5 0x00000000007b9a2a in MDL_context::upgrade_shared_lock (this=0x7fc2eefc8168, mdl_ticket=0x7fc30c01f2c0, new_type=MDL_EXCLUSIVE, lock_wait_timeout=31536000) at 10.1/sql/mdl.cc:2232
|
#6 0x000000000073f186 in mysql_inplace_alter_table (thd=0x7fc2eefc8070, table_list=0x7fc2eefce128, table=0x7fc3072dcc70, altered_table=0x7fc2eefd5070, ha_alter_info=0x7fc2dde87eb0, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7fc2dde89460, alter_ctx=0x7fc2dde87fd0) at 10.1/sql/sql_table.cc:6944
|
#7 0x0000000000743c9b in mysql_alter_table (thd=0x7fc2eefc8070, new_db=0x7fc2eefce708 "test", new_name=0x0, create_info=0x7fc2dde896a0, table_list=0x7fc2eefce128, alter_info=0x7fc2dde897d0, order_num=0, order=0x0, ignore=false) at 10.1/sql/sql_table.cc:8821
|
#8 0x00000000007b2325 in Sql_cmd_alter_table::execute (this=0x7fc2eefce8a8, thd=0x7fc2eefc8070) at 10.1/sql/sql_alter.cc:325
|
#9 0x000000000068de51 in mysql_execute_command (thd=0x7fc2eefc8070) at 10.1/sql/sql_parse.cc:5558
|
#10 0x000000000069218b in mysql_parse (thd=0x7fc2eefc8070, rawbuf=0x7fc2e6d4d1b9 "ALTER TABLE table11_innodb ADD COLUMN `default` INTEGER DEFAULT 0", length=65, parser_state=0x7fc2dde8a9f0) at 10.1/sql/sql_parse.cc:7165
|
#11 0x000000000099f402 in Query_log_event::do_apply_event (this=0x7fc2e6c1f230, rgi=0x7fc2e6df0800, query_arg=0x7fc2e6d4d1b9 "ALTER TABLE table11_innodb ADD COLUMN `default` INTEGER DEFAULT 0", q_len_arg=65) at 10.1/sql/log_event.cc:4287
|
#12 0x000000000099e6ed in Query_log_event::do_apply_event (this=0x7fc2e6c1f230, rgi=0x7fc2e6df0800) at 10.1/sql/log_event.cc:4013
|
#13 0x00000000005f0654 in Log_event::apply_event (this=0x7fc2e6c1f230, rgi=0x7fc2e6df0800) at 10.1/sql/log_event.h:1347
|
#14 0x00000000005e64df in apply_event_and_update_pos (ev=0x7fc2e6c1f230, thd=0x7fc2eefc8070, rgi=0x7fc2e6df0800, rpt=0x7fc2dc008380) at 10.1/sql/slave.cc:3274
|
#15 0x00000000008179e7 in rpt_handle_event (qev=0x7fc2e6db7a70, rpt=0x7fc2dc008380) at 10.1/sql/rpl_parallel.cc:49
|
#16 0x0000000000819b6c in handle_rpl_parallel_thread (arg=0x7fc2dc008380) at 10.1/sql/rpl_parallel.cc:942
|
#17 0x00007fc30dab6e9a in start_thread (arg=0x7fc2dde8b700) at pthread_create.c:308
|
#18 0x00007fc30d1e78bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
|
#19 0x0000000000000000 in ?? ()
|
Thread 6 (Thread 0x7fc2ddf1d700 (LWP 20027)):
|
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
|
#1 0x0000000000eed3f3 in safe_cond_wait (cond=0x7fc2e6df10e8, mp=0x7fc2e6df1038, file=0xf9c780 "10.1/include/mysql/psi/mysql_thread.h", line=1165) at 10.1/mysys/thr_mutex.c:493
|
#2 0x000000000063f9ce in inline_mysql_cond_wait (that=0x7fc2e6df10e8, mutex=0x7fc2e6df1038, src_file=0xf9cfe8 "10.1/sql/sql_class.cc", src_line=6853) at 10.1/include/mysql/psi/mysql_thread.h:1165
|
#3 0x000000000065004e in wait_for_commit::wait_for_prior_commit2 (this=0x7fc2e6df1038, thd=0x7fc2e6c95070) at 10.1/sql/sql_class.cc:6853
|
#4 0x000000000062fbc3 in wait_for_commit::wait_for_prior_commit (this=0x7fc2e6df1038, thd=0x7fc2e6c95070) at 10.1/sql/sql_class.h:1780
|
#5 0x000000000062fed2 in THD::wait_for_prior_commit (this=0x7fc2e6c95070) at 10.1/sql/sql_class.h:3781
|
#6 0x00000000008af50c in ha_commit_one_phase (thd=0x7fc2e6c95070, all=true) at 10.1/sql/handler.cc:1536
|
#7 0x00000000008aee1d in ha_commit_trans (thd=0x7fc2e6c95070, all=true) at 10.1/sql/handler.cc:1406
|
#8 0x00000000007c14e9 in trans_commit (thd=0x7fc2e6c95070) at 10.1/sql/transaction.cc:236
|
#9 0x00000000009a699d in Xid_log_event::do_apply_event (this=0x7fc2dc024570, rgi=0x7fc2e6df1000) at 10.1/sql/log_event.cc:7393
|
#10 0x00000000005f0654 in Log_event::apply_event (this=0x7fc2dc024570, rgi=0x7fc2e6df1000) at 10.1/sql/log_event.h:1347
|
#11 0x00000000005e64df in apply_event_and_update_pos (ev=0x7fc2dc024570, thd=0x7fc2e6c95070, rgi=0x7fc2e6df1000, rpt=0x7fc2dc0081b0) at 10.1/sql/slave.cc:3274
|
#12 0x00000000008179e7 in rpt_handle_event (qev=0x7fc2e6db8970, rpt=0x7fc2dc0081b0) at 10.1/sql/rpl_parallel.cc:49
|
#13 0x0000000000819b6c in handle_rpl_parallel_thread (arg=0x7fc2dc0081b0) at 10.1/sql/rpl_parallel.cc:942
|
#14 0x00007fc30dab6e9a in start_thread (arg=0x7fc2ddf1d700) at pthread_create.c:308
|
#15 0x00007fc30d1e78bd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
|
#16 0x0000000000000000 in ?? ()
|
All threads' bt attached as all_threads_bt.
10.1, git commit 53382ac128.
Slave command line:
00:00:59 /home/elenst/git/10.1/sql/mysqld --no-defaults --basedir=/home/elenst/git/10.1 --datadir=/home/elenst/test_results/analyze_slave/data --lc-messages-dir=/home/elenst/git/10.1/sql/share --character-sets-dir=/home/elenst/git/10.1/sql/share/charsets --tmpdir=/home/elenst/test_results/analyze_slave/tmp --core-file --max-allowed-packet=128Mb --port=10732 --socket=/home/elenst/test_results/analyze_slave/mysql.sock --pid-file=/home/elenst/test_results/analyze_slave/mysql.pid --general-log --general-log-file=/home/elenst/test_results/analyze_slave/mysql.log --server_id=2 --report-host=127.0.0.1 --report_port=10732 --log_output=FILE --slow_query_log --log_bin_trust_function_creators=1 --query_cache_size=64M --slave-skip-errors=1054,1317,1049,1305,1539,1505 --slave-parallel-mode=optimistic --binlog_commit_wait_count=20 --binlog_commit_wait_usec=1000000 --slave-parallel-threads=20 --max-binlog-size=4096 --innodb --sql-mode=no_engine_substitution
|
End of slave error log:
150502 21:33:31 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position '0-1-65'
|
150502 21:33:31 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000004' at position 1755, relay log './mysql-relay-bin.000001' position: 4; GTID position '0-1-65'
|
150502 21:34:02 [Note] Error reading relay log event: slave SQL thread was killed
|
150502 21:34:02 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
|
150502 21:34:02 [Note] Slave I/O thread killed while reading event
|
150502 21:34:02 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000012', position 3799; GTID position 0-1-241
|
150502 21:34:22 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:10730',replication starts at GTID position '0-1-203'
|
150502 21:34:22 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000011' at position 534, relay log './mysql-relay-bin.000001' position: 4; GTID position '0-1-203'
|
150502 21:34:32 [Note] Error reading relay log event: slave SQL thread was killed
|
150503 5:35:08 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
|
150503 5:35:08 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000020' at position 4146; GTID position '0-1-399'
|
150503 5:35:08 [ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:10730' - retry-time: 1 retries: 86400 message: Can't connect to MySQL server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
|
150503 15:25:18 [Note] /home/elenst/git/10.1/sql/mysqld: Normal shutdown
|
|
150503 15:25:18 [Note] Event Scheduler: Purging the queue. 0 events
|
150503 15:25:19 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
|
150503 15:25:19 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.000020', position 4146; GTID position 0-1-399
|
Test command line:
perl /home/elenst/bzr/randgen-mariadb-patches/runall-new.pl --no-mask --seed=1430530643 --threads=16 --duration=400 --queries=100M --reporters=QueryTimeout,Backtrace,ErrorLog,Deadlock,ReplicationThreadRestarter --redefine=conf/mariadb/general-workarounds.yy --redefine=conf/mariadb/10.0-features-redefine.yy --mysqld=--log_output=FILE --mysqld=--slow_query_log --mysqld=--log_bin_trust_function_creators=1 --mysqld=--query_cache_size=64M --engine=InnoDB --views --grammar=conf/runtime/metadata_stability.yy --gendata=conf/runtime/metadata_stability.zz --rpl_mode=statement --mysqld=--slave-skip-errors=1054,1317,1049,1305,1539,1505 --mysqld=--slave-parallel-mode=optimistic --mysqld=--binlog_commit_wait_count=20 --mysqld=--binlog_commit_wait_usec=1000000 --mysqld=--slave-parallel-threads=20 --use-gtid=current_pos --mysqld=--max-binlog-size=4096 --mtr-build-thread=73 --basedir1=/home/elenst/git/10.1 --vardir1=/home/elenst/test_results/analyze
|
|
|
Daniel, I agree that it is odd that there are two threads in
handle_slave_sql(). This indicates that two SQL threads are running. I do
not see how that is possible without using multi-source replication?
There are also two sets of parallel replication worker threads.
Anyway, this case seems unrelated to show slave status. What we see is 10
worker threads idle, and 10 worker threads in "Waiting for prior transaction
to start commit before starting next transaction".
(In the second case, there is an additional set of 5 idle and 15 waiting
worker threads).
This indicates that the parallel replication worker threads have become
stuck, independent of anything else going on. This causes STOP SLAVE to
block waiting for the stuck worker threads, and in turn SHOW STATUS to
block, but that seems unrelated. All the worker threads are idle, not
executing any transactions, and all waiting (incorrectly).
There were several bugs fixes recently where this could happen. Basically,
some internal lists of group_commit_orderer objects gets corrupt, and a
wakeup is missed, causing all threads to get stuck waiting. These should all
be fixed in 10.0.18, which I believe is due for release shortly?
We can think about trying to track down why it hangs, but I would prefer to
first see if 10.0.18 fixes it. Tracking down these hangs can be very hard,
and would require at least very carefully collected binlog around the
point of hang, SHOW FULL PROCESSLIST, and error log, all painstakingly
correct, complete, and matching each other.
When this condition happens, it is possible to recover by using KILL queries
against the threads stuck in "Waiting for prior transaction to start commit
before starting next transaction". Note that it may be necessary to kill
multiple times, as they have to be killed in the correct order, which is
unknown to the user.
|