[MDEV-7236] rpl.rpl_gtid_basic failed in buildbot with wait_condition timeout Created: 2014-11-30  Updated: 2014-12-02  Resolved: 2014-12-02

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0
Fix Version/s: 10.0.16

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: buildbot, tests

Issue Links:
Relates
relates to MDEV-7069 Fix buildbot failures in main server ... Stalled

 Description   

Happened once so far (according to the cross-reference report):
http://buildbot.askmonty.org/buildbot/builders/bld-dan-release/builds/1867/steps/test/logs/stdio

rpl.rpl_gtid_basic 'innodb_plugin,stmt'  w3 [ fail ]
        Test ended at 2014-11-28 19:50:11
 
CURRENT_TEST: rpl.rpl_gtid_basic
mysqltest: In included file "./include/wait_for_slave_param.inc": 
included from ./include/wait_for_slave_sql_error.inc at line 41:
included from /opt/buildbot-slave/mariadb/dan_demeter2/build/mysql-test/suite/rpl/t/rpl_gtid_basic.test at line 471:
At line 115: Timeout in include/wait_for_slave_param.inc
 
The result from queries just before the failure was:
< snip >
master-bin.000001	2984	Query	1	3073	use `test`; INSERT INTO t1 VALUES (17)
master-bin.000001	3073	Xid	1	3100	COMMIT /* xid=399 */
master-bin.000001	3100	Gtid	1	3138	BEGIN GTID 2-1-1031
master-bin.000001	3138	Query	1	3227	use `test`; INSERT INTO t1 VALUES (18)
master-bin.000001	3227	Xid	1	3254	COMMIT /* xid=400 */
master-bin.000001	3254	Gtid	1	3292	BEGIN GTID 2-1-1032
master-bin.000001	3292	Query	1	3381	use `test`; INSERT INTO t1 VALUES (19)
master-bin.000001	3381	Xid	1	3408	COMMIT /* xid=401 */
master-bin.000001	3408	Gtid	1	3446	GTID 3-1-100
master-bin.000001	3446	Query	1	3544	use `test`; CREATE TABLE t2 (a INT PRIMARY KEY)
master-bin.000001	3544	Gtid	1	3582	GTID 3-1-101
master-bin.000001	3582	Query	1	3686	use `test`; DROP TABLE `t2` /* generated by server */
master-bin.000001	3686	Gtid	1	3724	BEGIN GTID 2-1-1040
master-bin.000001	3724	Query	1	3813	use `test`; INSERT INTO t1 VALUES (20)
master-bin.000001	3813	Xid	1	3840	COMMIT /* xid=421 */
 
**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'No such row'
SHOW RELAYLOG EVENTS IN 'No such row';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info



 Comments   
Comment by Kristian Nielsen [ 2014-12-02 ]

So I checked the error log from the failing test.

What we see is that START SLAVE starts the SQL thread, but the I/O thread is
not started. This causes the wait to time out, as no events arrive from the
master.

I notice that just before this point, the test uses START SLAVE UNTIL, then
waits for the SQL thread to stop. But there is no wait for the IO thread to
stop. It seems plausible that the IO thread is still shutting down due to
UNTIL when the test reaches START SLAVE. Thus, START SLAVE does not start the
IO thread (it is not running), and a split-second after, the IO thread then
stops due to the UNTIL condition. This would leave to a failure like what is
seen.

EDIT: confirmed, I was able to reproduce with a sleep just before the I/O
thread marks itself as stopped.

Comment by Kristian Nielsen [ 2014-12-02 ]

Pushed to 10.0.16:

http://lists.askmonty.org/pipermail/commits/2014-December/007140.html

Generated at Thu Feb 08 07:18:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.