[MDEV-7430] rpl.rpl_gtid_crash still fails in buildbot Created: 2015-01-11  Updated: 2015-01-15  Resolved: 2015-01-15

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0
Fix Version/s: 10.0.16

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: buildbot, tests

Issue Links:
Relates
relates to MDEV-7069 Fix buildbot failures in main server ... Stalled

 Description   

http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/97/steps/test/logs/stdio

rpl.rpl_gtid_crash 'mix,xtradb'          w4 [ fail ]
        Test ended at 2015-01-06 09:50:42
 
CURRENT_TEST: rpl.rpl_gtid_crash
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /var/lib/buildbot/maria-slave/power8-vlp04-bintar-debug/build/mysql-test/suite/rpl/t/rpl_gtid_crash.test at line 78:
At line 44: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
call mtr.add_suppression("InnoDB: Warning: database page corruption or a failed");
flush tables;
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1, 0);
include/stop_slave.inc
CHANGE MASTER TO master_host = '127.0.0.1', master_port = MASTER_PORT,
MASTER_USE_GTID=CURRENT_POS;
INSERT INTO t1 VALUES (2,1);
INSERT INTO t1 VALUES (3,1);
include/start_slave.inc
include/save_master_gtid.inc
SET SESSION debug_dbug="+d,crash_dispatch_command_before";
SELECT 1;
Got one of the listed errors
include/sync_with_master_gtid.inc
INSERT INTO t1 VALUES (1000, 3);
include/save_master_gtid.inc
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-211', 120), current slave GTID position is: 0-1-210.

The failure above is on P8. I'm not sure yet whether it is specific to P8.
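The timeout message at the end of the log compares two GTIDs. A MariaDB GTID is written as domain_id-server_id-seq_no, so the slave stopped exactly one event group short of the master in domain 0 (seq_no 210 versus 211, both from server 1). That missing group is presumably the INSERT INTO t1 VALUES (1000, 3) issued after the master crash. The sketch below only illustrates that reading of the message. Gtid, parse_gtid, take_number, take_dash and groups_behind are hypothetical helpers, not server code, and the "groups behind" count assumes a single master assigning consecutive seq_no values.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// A MariaDB GTID is written as domain_id-server_id-seq_no, e.g. "0-1-210".
struct Gtid {
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// Hypothetical helper: parses an unsigned number at the front of `s` and
// removes it from `s`; returns nullopt if no digits are found there.
template <typename T>
std::optional<T> take_number(std::string_view &s) {
  T value{};
  auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), value);
  if (ec != std::errc{})
    return std::nullopt;
  s.remove_prefix(static_cast<std::size_t>(ptr - s.data()));
  return value;
}

// Hypothetical helper: consumes a single '-' separator.
bool take_dash(std::string_view &s) {
  if (s.empty() || s.front() != '-')
    return false;
  s.remove_prefix(1);
  return true;
}

// Hypothetical parser for the "D-S-N" text form; rejects trailing garbage.
std::optional<Gtid> parse_gtid(std::string_view s) {
  auto domain = take_number<std::uint32_t>(s);
  if (!domain || !take_dash(s))
    return std::nullopt;
  auto server = take_number<std::uint32_t>(s);
  if (!server || !take_dash(s))
    return std::nullopt;
  auto seq = take_number<std::uint64_t>(s);
  if (!seq || !s.empty())
    return std::nullopt;
  return Gtid{*domain, *server, *seq};
}

// Event groups the slave still lacks in one replication domain, assuming a
// single master that assigns consecutive seq_no values. Returns nullopt if
// the two GTIDs belong to different domains.
std::optional<std::uint64_t> groups_behind(const Gtid &wanted,
                                           const Gtid &current) {
  if (wanted.domain_id != current.domain_id)
    return std::nullopt;
  if (current.seq_no >= wanted.seq_no)
    return std::uint64_t{0};
  return wanted.seq_no - current.seq_no;
}
```

Feeding it the two positions from the timeout message, 0-1-211 (waited for) and 0-1-210 (reached), gives a gap of one event group in domain 0, which fits a slave whose IO thread stopped fetching events after the master restart.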



 Comments   
Comment by Kristian Nielsen [ 2015-01-15 ]

I think it might not be P8-specific, as this failure on fulltest2 (which is x86/amd64) looks identical:

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/3037/steps/test_6/logs/stdio

The failure does seem quite rare, though.

Comment by Kristian Nielsen [ 2015-01-15 ]

Failure can be reproduced with this sleep in the code:

=== modified file 'sql/mysqld.cc'
--- sql/mysqld.cc	2014-11-18 21:25:47 +0000
+++ sql/mysqld.cc	2015-01-15 14:32:20 +0000
@@ -5212,6 +5212,7 @@ int mysqld_main(int argc, char **argv)
   }
 #endif
 
+fprintf(stderr, "XXX2 delay startup...\n"); my_sleep(11000000);
   orig_argc= argc;
   orig_argv= argv;
   my_getopt_use_args_separator= TRUE;

The problem seems to be simply that mysql-test-run.pl configures the slave to
give up reconnecting after only 9 seconds (10 attempts with a 1-second sleep
in between). That is apparently too short in rare cases in our buildbot setup.

(The logs from the failures in buildbot confirm that the slave IO thread exits
after 9 seconds).
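The arithmetic behind the 9-second figure can be sketched with a simplified, hypothetical model. ReconnectPolicy, reconnect_window_s, reconnect_succeeds and attempts_needed are illustrative names, not mysql-test-run.pl or server code. The model assumes the first attempt happens immediately and each failed attempt is followed by a fixed sleep, so N attempts span (N - 1) * delay seconds. It ignores how long each connection attempt itself takes.

```cpp
#include <cassert>

// Hypothetical model of the slave's reconnect policy, for illustration only:
// attempt 1 happens at t = 0 and every failed attempt is followed by a sleep
// of retry_delay_s seconds, up to max_attempts attempts in total.
struct ReconnectPolicy {
  unsigned max_attempts;
  unsigned retry_delay_s;
};

// Seconds between the first and the last attempt, i.e. how long the slave
// keeps trying before it gives up: (max_attempts - 1) * retry_delay_s.
constexpr unsigned reconnect_window_s(ReconnectPolicy p) {
  return p.max_attempts == 0 ? 0 : (p.max_attempts - 1) * p.retry_delay_s;
}

// True if the last attempt happens no earlier than master_ready_s, the time
// (measured from the first attempt) at which the master accepts connections
// again. Ignores the duration of each individual connection attempt.
constexpr bool reconnect_succeeds(ReconnectPolicy p, unsigned master_ready_s) {
  return p.max_attempts > 0 && reconnect_window_s(p) >= master_ready_s;
}

// Smallest number of attempts that still reaches a master which becomes
// ready after master_ready_s seconds, for a given delay between attempts.
constexpr unsigned attempts_needed(unsigned retry_delay_s,
                                   unsigned master_ready_s) {
  assert(retry_delay_s > 0 && "retry delay must be positive");
  return (master_ready_s + retry_delay_s - 1) / retry_delay_s + 1;
}
```

With 10 attempts and a 1-second delay, reconnect_window_s returns 9. A master that needs about 11 seconds to come back (as with the injected sleep above, assuming my_sleep() counts in microseconds) is therefore never reached. Under this model at least 12 attempts would be needed, though the actual fix may address it differently.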

Comment by Kristian Nielsen [ 2015-01-15 ]

Pushed to 10.0.16:

http://lists.askmonty.org/pipermail/commits/2015-January/007270.html

Generated at Thu Feb 08 07:19:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.