Description
Note: this JIRA was formerly titled rpl.rpl_parallel, binlog_encryption.rpl_parallel fails in buildbot with timeout in include.
A parallel replication worker thread can deadlock with another
connection running SHOW SLAVE STATUS. If the worker thread is killed
while it is in do_gco_wait(), it already holds LOCK_parallel_entry,
and during error reporting it tries to take err_lock. SHOW SLAVE
STATUS, however, takes these locks in the reverse order: it first
takes err_lock and then tries to take LOCK_parallel_entry. The two
threads deadlock when each has acquired its first lock and is waiting
for the second, which the other thread holds.
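The lock-order inversion can be illustrated with a minimal Python sketch. It is not MariaDB code: only the names LOCK_parallel_entry and err_lock come from the report, while demonstrate_inversion, acquire_in_order, worker_report_error, show_slave_status and run_ordered are hypothetical helpers, and the global order (LOCK_parallel_entry before err_lock) is an assumption made for the sketch. The first part uses acquire timeouts in place of a real hang to show that two threads taking the locks in opposite orders both get stuck. The second part shows one general remedy, taking both locks in a single fixed order; it makes no claim about how this bug was actually fixed.

```python
import threading
from contextlib import ExitStack

# Stand-ins named after the two mutexes in the report (hypothetical model,
# not MariaDB's actual mutex objects).
LOCK_parallel_entry = threading.Lock()
err_lock = threading.Lock()

# Assumed global lock order for this sketch: LOCK_parallel_entry first.
LOCK_ORDER = [LOCK_parallel_entry, err_lock]


def demonstrate_inversion(timeout=0.2):
    """Two threads take two locks in opposite orders (hypothetical helper).

    Each thread keeps its first lock until both have tried for their
    second, so neither attempt can succeed. Acquire timeouts make the
    deadlock observable instead of hanging forever.
    Returns (worker_got_second, status_got_second).
    """
    a, b = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)
    tried_second = threading.Barrier(2)
    got = {}

    def body(name, first, second):
        with first:
            holding_first.wait()  # both threads now hold their first lock
            ok = second.acquire(timeout=timeout)
            if ok:
                second.release()
            got[name] = ok
            tried_second.wait()  # hold the first lock until both have tried

    # "worker" takes a then b; "status" takes b then a (the inverted order).
    threads = [threading.Thread(target=body, args=("worker", a, b)),
               threading.Thread(target=body, args=("status", b, a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got["worker"], got["status"]


def acquire_in_order(stack, *locks):
    """Enter the given locks in LOCK_ORDER, whatever order the caller lists."""
    for lock in sorted(locks, key=LOCK_ORDER.index):
        stack.enter_context(lock)


def worker_report_error(log):
    # Models a killed worker that needs both locks to record its error.
    with ExitStack() as stack:
        acquire_in_order(stack, LOCK_parallel_entry, err_lock)
        log.append("worker")


def show_slave_status(log):
    # Lists err_lock first, like the order described in the report, but
    # acquire_in_order still takes LOCK_parallel_entry first.
    with ExitStack() as stack:
        acquire_in_order(stack, err_lock, LOCK_parallel_entry)
        log.append("status")


def run_ordered(rounds=1000):
    """Run both functions concurrently; return (stuck_threads, entries)."""
    log = []

    def loop(fn):
        for _ in range(rounds):
            fn(log)

    threads = [threading.Thread(target=loop, args=(fn,))
               for fn in (worker_report_error, show_slave_status)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    stuck = sum(t.is_alive() for t in threads)
    return stuck, len(log)
```

demonstrate_inversion() returns (False, False): neither thread can obtain its second lock while the other holds it. With the fixed ordering, run_ordered() finishes with no stuck threads. Another way to break the cycle is to release the first lock before taking the second, so the two locks are never held together.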
This deadlock led to test failures in rpl.rpl_parallel and binlog_encryption.rpl_parallel, for example:
http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/9240/steps/test/logs/stdio
rpl.rpl_parallel 'mix,xtradb' w2 [ fail ]
Test ended at 2016-08-23 22:06:29

CURRENT_TEST: rpl.rpl_parallel
mysqltest: In included file "./include/wait_for_slave_param.inc":
included from ./include/wait_for_slave_sql_to_start.inc at line 32:
included from ./include/wait_for_slave_to_start.inc at line 27:
included from ./include/start_slave.inc at line 35:
included from /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/suite/rpl/t/rpl_parallel.test at line 355:
At line 115: Timeout in include/wait_for_slave_param.inc

The result from queries just before the failure was:
< snip >
'group_commit_waiting_for_prior SIGNAL slave_queued3',
''))
master-bin.000002 1417 Xid 1 1444 COMMIT /* xid=370 */
master-bin.000002 1444 Gtid 1 1482 BEGIN GTID 0-1-15
master-bin.000002 1482 Query 1 1571 use `test`; INSERT INTO t2 VALUES (20)
master-bin.000002 1571 Query 1 1660 use `test`; INSERT INTO t1 VALUES (20)
master-bin.000002 1660 Query 1 1749 use `test`; INSERT INTO t2 VALUES (21)
master-bin.000002 1749 Query 1 1842 use `test`; INSERT INTO t3 VALUES (20, 20)
master-bin.000002 1842 Xid 1 1869 COMMIT /* xid=433 */
master-bin.000002 1869 Gtid 1 1907 BEGIN GTID 0-1-16
master-bin.000002 1907 Query 1 1999 use `test`; INSERT INTO t3 VALUES(21, 21)
master-bin.000002 1999 Xid 1 2026 COMMIT /* xid=438 */
master-bin.000002 2026 Gtid 1 2064 BEGIN GTID 0-1-17
master-bin.000002 2064 Query 1 2156 use `test`; INSERT INTO t3 VALUES(22, 22)
master-bin.000002 2156 Xid 1 2183 COMMIT /* xid=439 */

**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name Pos Event_type Server_id End_log_pos Info

More results from queries before failure can be found in /mnt/data/buildot/maria-slave/work-opensuse-amd64/build/mysql-test/var/2/log/rpl_parallel.log
The failure does not happen often, but it recurs regularly.
Issue Links
- is duplicated by
  - MDEV-14277 binlog_encryption.rpl_parallel failed in buildbot, timeout, assertion `count > 0' failed (Closed)
  - MDEV-25450 rpl.rpl_parallel_gco_wait_kill failed in bb with timeout (Closed)
  - MDEV-31894 Optimize Check for Parallel Replication Worker Thread Idleness (Closed)
- relates to
  - MDEV-24086 binlog_encryption.rpl_parallel_stop_on_con_kill failed in buildbot with timeout in wait_for_slave_param (Open)
  - MDEV-7069 Fix buildbot failures in main server trees (Stalled)