[MDEV-10042] multi_source.info_logs sporadically fails in buildbot, wrong result Created: 2016-05-07  Updated: 2020-11-12  Resolved: 2020-11-12

Status: Closed
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Sujatha Sivakumar (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-7069 Fix buildbot failures in main server ... Stalled
Duplicate
is duplicated by MDEV-14478 multi_source.info_logs failed in bui... Closed
Relates
relates to MDEV-4633 multi_source.simple test fails sporad... Closed
relates to MDEV-10690 multi_source.reset_slave failed in bu... Closed
relates to MDEV-21290 multi_source.info_logs test fails Confirmed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest/builds/5931/steps/test_7/logs/stdio

CURRENT_TEST: multi_source.info_logs
--- /mnt/buildbot/build/mariadb-10.1.14/mysql-test/suite/multi_source/info_logs.result	2016-05-06 14:03:15.000000000 +0300
+++ /mnt/buildbot/build/mariadb-10.1.14/mysql-test/suite/multi_source/info_logs.reject	2016-05-06 19:49:25.000000000 +0300
@@ -86,7 +86,7 @@
 show all slaves status;
 Connection_name	Slave_SQL_State	Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error	Replicate_Ignore_Server_Ids	Master_Server_Id	Master_SSL_Crl	Master_SSL_Crlpath	Using_Gtid	Gtid_IO_Pos	Replicate_Do_Domain_Ids	Replicate_Ignore_Domain_Ids	Parallel_Mode	Retried_transactions	Max_relay_log_size	Executed_log_entries	Slave_received_heartbeats	Slave_heartbeat_period	Gtid_Slave_Pos
 	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space1>	None		0	No						0	No	0		0			1			No				conservative	0	1073741824	7	0	60.000	
-MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space2>	None		0	No						0	No	0		0			2			No				conservative	0	1073741824	7	0	60.000	
+MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	885	None		0	No						0	No	0		0			2			No				conservative	0	1073741824	7	0	60.000	
 include/wait_for_slave_to_start.inc
 set default_master_connection = 'MASTER 2.2';
 include/wait_for_slave_to_start.inc
 
mysqltest: Result length mismatch

It happens rather rarely.



 Comments   
Comment by Elena Stepanova [ 2016-06-09 ]

It happens rarely, according to cross-reference roughly once in a month. As of today, the last occasion was on May 6, 2016. Before that, April 30, March 13...

Debug and non-debug, ps-protocol and non-ps-protocol, 32-bit and 64-bit.

Does not depend on previous tests (can be first test after restart).

Comment by Elena Stepanova [ 2016-06-09 ]

Reproduced it locally by running with

perl ./mtr multi_source.info_logs --repeat=3000  --mem --force-restart

on 10.1 debug tree.

The failure happened upon 1764th execution.

Comment by Elena Stepanova [ 2017-10-29 ]

10.2 occurrence: http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-ppc64le/builds/1536

Comment by Marko Mäkelä [ 2019-09-04 ]

10.3 occurrence: http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-xenial-x86/builds/2286/steps/mtr/logs/stdio

CURRENT_TEST: multi_source.info_logs
--- /usr/local/mariadb-10.3.18-linux-i686/mysql-test/suite/multi_source/info_logs.result	2019-09-03 14:50:51.000000000 -0400
+++ /usr/local/mariadb-10.3.18-linux-i686/mysql-test/suite/multi_source/info_logs.reject	2019-09-03 16:26:21.925372724 -0400
@@ -91,7 +91,7 @@
 show all slaves status;
 Connection_name	Slave_SQL_State	Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error	Replicate_Ignore_Server_Ids	Master_Server_Id	Master_SSL_Crl	Master_SSL_Crlpath	Using_Gtid	Gtid_IO_Pos	Replicate_Do_Domain_Ids	Replicate_Ignore_Domain_Ids	Parallel_Mode	SQL_Delay	SQL_Remaining_Delay	Slave_SQL_Running_State	Slave_DDL_Groups	Slave_Non_Transactional_Groups	Slave_Transactional_Groups	Retried_transactions	Max_relay_log_size	Executed_log_entries	Slave_received_heartbeats	Slave_heartbeat_period	Gtid_Slave_Pos
 	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space1>	None		0	No						0	No	0		0			1			No				conservative	0	NULL	Slave has read all relay log; waiting for the slave I/O thread to update it	0	0	0	0	1073741824	7	0	60.000	
-MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space2>	None		0	No						0	No	0		0			2			No				conservative	0	NULL	Slave has read all relay log; waiting for the slave I/O thread to update it	0	0	0	0	1073741824	7	0	60.000	
+MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	918	None		0	No						0	No	0		0			2			No				conservative	0	NULL	Slave has read all relay log; waiting for the slave I/O thread to update it	0	0	0	0	1073741824	7	0	60.000	
 include/wait_for_slave_to_start.inc
 set default_master_connection = 'MASTER 2.2';
 include/wait_for_slave_to_start.inc

As far as I can tell, the only change is that <relay_log_space2> was replaced with 918.
Side note: such multi-column output would be easier to compare with vertical_results.

Comment by Otto Kekäläinen [ 2020-04-09 ]

I saw a similar failure in http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-x86/builds/16256/steps/mtr/logs/stdio

multi_source.info_logs                   w4 [ fail ]
        Test ended at 2020-04-08 18:47:06
 
CURRENT_TEST: multi_source.info_logs
--- /usr/share/mysql/mysql-test/suite/multi_source/info_logs.result	2020-04-08 16:52:03.000000000 -0400
+++ /dev/shm/var/4/log/info_logs.reject	2020-04-08 18:47:06.554358650 -0400
@@ -90,7 +90,7 @@
 #
 show all slaves status;
 Connection_name	Slave_SQL_State	Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error	Replicate_Ignore_Server_Ids	Master_Server_Id	Master_SSL_Crl	Master_SSL_Crlpath	Using_Gtid	Gtid_IO_Pos	Replicate_Do_Domain_Ids	Replicate_Ignore_Domain_Ids	Parallel_Mode	SQL_Delay	SQL_Remaining_Delay	Slave_SQL_Running_State	Slave_DDL_Groups	Slave_Non_Transactional_Groups	Slave_Transactional_Groups	Retried_transactions	Max_relay_log_size	Executed_log_entries	Slave_received_heartbeats	Slave_heartbeat_period	Gtid_Slave_Pos
-	Slave has read all relay log; waiting for more updates	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space1>	None		0	No						0	No	0		0			1			No				optimistic	0	NULL	Slave has read all relay log; waiting for more updates	0	0	0	0	1073741824	7	0	60.000	
+	Slave has read all relay log; waiting for more updates	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	884	None		0	No						0	No	0		0			1			No				optimistic	0	NULL	Slave has read all relay log; waiting for more updates	0	0	0	0	1073741824	7	0	60.000	
 MASTER 2.2	Slave has read all relay log; waiting for more updates	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space2>	None		0	No						0	No	0		0			2			No				optimistic	0	NULL	Slave has read all relay log; waiting for more updates	0	0	0	0	1073741824	7	0	60.000	
 include/wait_for_slave_to_start.inc
 set default_master_connection = 'MASTER 2.2';
 
mysqltest: Result length mismatch

Comment by Sujatha Sivakumar (Inactive) [ 2020-08-28 ]

This issue is duplicate of MDEV-4633

Writes to 'rli->log_space_total' needs to be synchronized. All modifications to 'rli->log_space_total' should be performed by holding 'rli->log_space_lock'. But IO thread doesn't acquire this mutex while writing events to relay log.
Hence there is a race condition between SQL THREAD and IO THREAD, when both try to modify the same variable. SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously . Hence test occasionally fails with result mismatch.

Comment by Sujatha Sivakumar (Inactive) [ 2020-11-12 ]

Fix implemented as part of MDEV-4633

Generated at Thu Feb 08 07:39:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.