[MDEV-10632] rpl.rpl_parallel fails in buildbot, Failed to sync with master Created: 2016-08-21  Updated: 2017-02-21  Resolved: 2017-02-21

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
blocks MDEV-10550 Parallel replication can't sync with ... Closed
Sprint: 5.5.55

 Description   

http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/95/steps/test/logs/stdio

rpl.rpl_parallel 'innodb_plugin,mix'     w1 [ fail ]
        Test ended at 2016-08-14 05:19:15
 
CURRENT_TEST: rpl.rpl_parallel
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /var/lib/buildbot/maria-slave/p8-trusty-bintar-debug/build/mysql-test/suite/rpl/t/rpl_parallel.test at line 2371:
At line 44: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
SET @old_dbug= @@SESSION.debug_dbug;
SET @commit_id= 20000;
SET SESSION debug_dbug="+d,binlog_force_commit_id";
SET SESSION debug_dbug=@old_dbug;
SELECT * FROM t7 ORDER BY a;
a	b
1	1
2	2
3	86
4	4
5	5
100	5
101	1
102	2
103	3
104	4
include/save_master_gtid.inc
include/start_slave.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('1-1-3,0-1-1454,3-1-1,2-1-2', 120), current slave GTID position is: 1-1-3,0-1-1254,3-1-1,2-1-2.



 Comments   
Comment by Elena Stepanova [ 2017-01-03 ]

This failure was observed on two builders:

  • p8-trusty-bintar-debug till mid August 2016, about the time when the extremely slow slave p8-trusty-bb was replaced by a less slow one, power8-vlp04;
  • currently on xenial-amd64-valgrind – a valgrind builder which runs tests with a high parallel value (--parallel=20 at the moment).

The problem is apparently a simple timing issue: on a really slow builder, the default 120-second timeout in sync_with_master_gtid.inc was not enough for the slave to catch up with the master. Since we no longer observe it on non-valgrind builders, let's assume for now that p8 was indeed extremely slow and ignore that case; for valgrind runs, it makes sense to increase the timeout, as has already been done in some other places in MTR.

https://github.com/MariaDB/server/commit/80a4525b3a9f234043419ce2217880516fd3195b
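
For reference, the two GTID positions in the timeout message from the description can be compared per replication domain to see where the slave was behind. The following is only a minimal sketch in Python, not part of MTR or the fix; the helper names parse_gtid_pos and gtid_lag are hypothetical, and it assumes each position is a comma-separated list of domain-server-seq triples with at most one entry per domain.

```python
def parse_gtid_pos(pos):
    """Parse 'domain-server-seq,...' into {domain_id: (server_id, seq_no)}."""
    result = {}
    for gtid in pos.split(","):
        domain, server, seq = (int(part) for part in gtid.strip().split("-"))
        result[domain] = (server, seq)
    return result


def gtid_lag(target, current):
    """Return {domain_id: seq_no difference} for domains where current is behind target."""
    target_pos = parse_gtid_pos(target)
    current_pos = parse_gtid_pos(current)
    lag = {}
    for domain, (_, target_seq) in target_pos.items():
        # A domain missing from the current position is treated as seq_no 0 (assumption).
        current_seq = current_pos.get(domain, (None, 0))[1]
        if current_seq < target_seq:
            lag[domain] = target_seq - current_seq
    return lag


# Positions copied from the timeout message in the description.
target = "1-1-3,0-1-1454,3-1-1,2-1-2"
current = "1-1-3,0-1-1254,3-1-1,2-1-2"
print(gtid_lag(target, current))  # {0: 200}
```

In this log only domain 0 differs (sequence number 1254 against 1454), which fits the picture of a slave that was still applying events when the 120-second wait expired rather than one that was stuck at a different position.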

Comment by Elena Stepanova [ 2017-02-21 ]

It happened again recently (Feb 17, 2017) on p8-trusty-bintar-debug:
http://askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/560

Later, the P8 builders were switched to using --mem. After that, the average execution time for this test on this builder went down from ~150 sec to ~25 sec, so hopefully the timing issue is solved now (and, as mentioned above, the timeout was increased for valgrind builds).

Generated at Thu Feb 08 07:43:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.