[MDEV-17109] rpl.rpl_parallel_retry fails in buildbot with timeout in wait_condition.inc Created: 2018-08-31  Updated: 2021-01-12  Resolved: 2021-01-12

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Andrei Elkin
Resolution: Fixed Votes: 1
Labels: None


 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos6-amd64-debug/builds/776

rpl.rpl_parallel_retry 'innodb,mix'      w1 [ fail ]
        Test ended at 2018-08-30 11:51:20
 
CURRENT_TEST: rpl.rpl_parallel_retry
--- /usr/local/mariadb-10.2.18-linux-x86_64/mysql-test/suite/rpl/r/rpl_parallel_retry.result	2018-08-30 10:54:56.000000000 +0300
+++ /usr/local/mariadb-10.2.18-linux-x86_64/mysql-test/suite/rpl/r/rpl_parallel_retry.reject	2018-08-30 11:51:19.584900740 +0300
@@ -384,6 +384,24 @@
 SET @@DEBUG_SYNC='now SIGNAL proceed_by_1001';
 # observe how it all ends
 # Wait for the workers to go home and check the result of applying
+Timeout in wait_condition.inc for SELECT count(*) = 0 FROM information_schema.processlist WHERE command = 'Slave_worker'
+Id	User	Host	db	Command	Time	State	Info	Progress
+1	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
+2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+4	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
+5	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+1273	root	localhost:39406	test	Query	0	init	show full processlist	0.000
+1274	root	localhost:39407	test	Sleep	37		NULL	0.000
+1341	root	localhost:39448	test	Sleep	31		NULL	0.000
+1342	root	localhost:39449	test	Sleep	31		NULL	0.000
+1343	system user		NULL	Slave_IO	36	Waiting for master to send event	NULL	0.000
+1346	system user		NULL	Slave_worker	31	Waiting for work from SQL thread	NULL	0.000
+1345	system user		NULL	Slave_worker	36	Waiting for work from SQL thread	NULL	0.000
+1347	system user		NULL	Slave_worker	36	Waiting for work from SQL thread	NULL	0.000
+1348	system user		NULL	Slave_worker	31	debug sync point: now	NULL	0.000
+1349	system user		NULL	Slave_worker	36	Waiting for work from SQL thread	NULL	0.000
+1344	system user		NULL	Slave_SQL	36	Reading event from the relay log	NULL	0.000
 # which is OK
 connection server_2;
 include/stop_slave.inc
 
mysqltest: Result length mismatch



 Comments   
Comment by Marko Mäkelä [ 2019-09-04 ]

Recent occurrence on 10.2: http://buildbot.askmonty.org/buildbot/builders/kvm-asan/builds/1979/steps/mtr_nm/logs/stdio

Comment by Aleksey Midenkov [ 2019-11-07 ]

http://buildbot.askmonty.org/buildbot/reports/cross_reference#branch=&revision=&platform=&dt=&bbnum=&typ=&info=&fail_name=^rpl.rpl_parallel_retry&fail_variant=&fail_info_short=&fail_info_full=&limit=50

Comment by Alice Sherepa [ 2020-08-31 ]

recent occurrence on 10.5 http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/4789/steps/mtr/logs/stdio

Comment by Michael Widenius [ 2020-09-15 ]

I have pushed a proposed fix to bb-10.4-monty....

Comment by Michael Widenius [ 2020-09-25 ]

The fix seams to work; Have not got the problem in any of my late pushes.

Comment by Marko Mäkelä [ 2020-11-18 ]

The test still fails, also for 10.4 and 10.5 branches where the attempted fix is present. (It also fails on 10.2 and 10.3.)

Here are some recent 10.4 and 10.5 failures:
10.4 f0c990379565748c7faa6a068b992f61

CURRENT_TEST: rpl.rpl_parallel_retry
--- /usr/share/mysql-test/suite/rpl/r/rpl_parallel_retry.result	2020-11-17 12:02:19.000000000 +0000
+++ /dev/shm/var/1/log/rpl_parallel_retry.reject	2020-11-17 14:56:47.139770927 +0000
@@ -384,6 +384,24 @@
 SET @@DEBUG_SYNC='now SIGNAL proceed_by_1001';
 # observe how it all ends
 # Wait for the workers to go home and check the result of applying
+Timeout in wait_condition.inc for SELECT count(*) = 0 FROM information_schema.processlist WHERE command = 'Slave_worker'
+Id	User	Host	db	Command	Time	State	Info	Progress
+1	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+2	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+3	system user		NULL	Daemon	NULL	InnoDB purge worker	NULL	0.000
+4	system user		NULL	Daemon	NULL	InnoDB purge coordinator	NULL	0.000
+5	system user		NULL	Daemon	NULL	InnoDB shutdown handler	NULL	0.000
+813	root	localhost:45350	test	Query	0	Init	show full processlist	0.000
+814	root	localhost:45352	test	Sleep	40		NULL	0.000
+881	root	localhost:45412	test	Sleep	32		NULL	0.000
+882	root	localhost:45414	test	Sleep	32		NULL	0.000
+883	system user		NULL	Slave_IO	38	Waiting for master to send event	NULL	0.000
+885	system user		NULL	Slave_worker	38	Waiting for work from SQL thread	NULL	0.000
+886	system user		NULL	Slave_worker	38	Waiting for work from SQL thread	NULL	0.000
+888	system user		NULL	Slave_worker	35	debug sync point: now	NULL	0.000
+889	system user		NULL	Slave_worker	38	Waiting for work from SQL thread	NULL	0.000
+887	system user		NULL	Slave_worker	32	Waiting for work from SQL thread	NULL	0.000
+884	system user		NULL	Slave_SQL	38	Reading event from the relay log	NULL	0.000
 # which is OK
 connection server_2;
 include/stop_slave.inc

10.5 0fc0eb1e5b9015e51daef9709937be69

rpl.rpl_parallel_retry 'innodb,mix'      w3 [ fail ]
        Test ended at 2020-11-17 00:40:07
 
CURRENT_TEST: rpl.rpl_parallel_retry
--- /usr/share/mysql-test/suite/rpl/r/rpl_parallel_retry.result	2020-11-16 20:45:25.000000000 +0000
+++ /dev/shm/var/3/log/rpl_parallel_retry.reject	2020-11-17 00:40:07.210954097 +0000
@@ -384,6 +384,19 @@
 SET @@DEBUG_SYNC='now SIGNAL proceed_by_1001';
 # observe how it all ends
 # Wait for the workers to go home and check the result of applying
+Timeout in wait_condition.inc for SELECT count(*) = 0 FROM information_schema.processlist WHERE command = 'Slave_worker'
+Id	User	Host	db	Command	Time	State	Info	Progress
+711	root	localhost:38302	test	Query	0	starting	show full processlist	0.000
+712	root	localhost:38304	test	Sleep	37		NULL	0.000
+779	root	localhost:38410	test	Sleep	30		NULL	0.000
+780	root	localhost:38412	test	Sleep	30		NULL	0.000
+781	system user		NULL	Slave_IO	36	Waiting for master to send event	NULL	0.000
+784	system user		NULL	Slave_worker	36	Waiting for work from SQL thread	NULL	0.000
+785	system user		NULL	Slave_worker	36	Waiting for work from SQL thread	NULL	0.000
+786	system user		NULL	Slave_worker	30	debug sync point: now	NULL	0.000
+783	system user		NULL	Slave_worker	36	Waiting for work from SQL thread	NULL	0.000
+787	system user		NULL	Slave_worker	30	Waiting for work from SQL thread	NULL	0.000
+782	system user		NULL	Slave_SQL	36	Reading event from the relay log	NULL	0.000
 # which is OK
 connection server_2;
 include/stop_slave.inc
 
mysqltest: Result length mismatch

Comment by Alice Sherepa [ 2021-01-12 ]

fixed by 4e43e2f92d1b418dcc8 commit

Generated at Thu Feb 08 08:33:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.