Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10042

multi_source.info_logs sporadically fails in buildbot, wrong result

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest/builds/5931/steps/test_7/logs/stdio

      CURRENT_TEST: multi_source.info_logs
      --- /mnt/buildbot/build/mariadb-10.1.14/mysql-test/suite/multi_source/info_logs.result	2016-05-06 14:03:15.000000000 +0300
      +++ /mnt/buildbot/build/mariadb-10.1.14/mysql-test/suite/multi_source/info_logs.reject	2016-05-06 19:49:25.000000000 +0300
      @@ -86,7 +86,7 @@
       show all slaves status;
       Connection_name	Slave_SQL_State	Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error	Replicate_Ignore_Server_Ids	Master_Server_Id	Master_SSL_Crl	Master_SSL_Crlpath	Using_Gtid	Gtid_IO_Pos	Replicate_Do_Domain_Ids	Replicate_Ignore_Domain_Ids	Parallel_Mode	Retried_transactions	Max_relay_log_size	Executed_log_entries	Slave_received_heartbeats	Slave_heartbeat_period	Gtid_Slave_Pos
       	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space1>	None		0	No						0	No	0		0			1			No				conservative	0	1073741824	7	0	60.000	
      -MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space2>	None		0	No						0	No	0		0			2			No				conservative	0	1073741824	7	0	60.000	
      +MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	885	None		0	No						0	No	0		0			2			No				conservative	0	1073741824	7	0	60.000	
       include/wait_for_slave_to_start.inc
       set default_master_connection = 'MASTER 2.2';
       include/wait_for_slave_to_start.inc
       
      mysqltest: Result length mismatch
      

      It happens rather rarely.

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - - edited

            It happens rarely, according to cross-reference roughly once in a month. As of today, the last occasion was on May 6, 2016. Before that, April 30, March 13...

            Debug and non-debug, ps-protocol and non-ps-protocol, 32-bit and 64-bit.

            Does not depend on previous tests (can be first test after restart).

            elenst Elena Stepanova added a comment - - edited It happens rarely, according to cross-reference roughly once in a month. As of today, the last occasion was on May 6, 2016. Before that, April 30, March 13... Debug and non-debug, ps-protocol and non-ps-protocol, 32-bit and 64-bit. Does not depend on previous tests (can be first test after restart).

            Reproduced it locally by running with

            perl ./mtr multi_source.info_logs --repeat=3000  --mem --force-restart
            

            on 10.1 debug tree.

            The failure happened upon 1764th execution.

            elenst Elena Stepanova added a comment - Reproduced it locally by running with perl ./mtr multi_source.info_logs --repeat=3000 --mem --force-restart on 10.1 debug tree. The failure happened upon 1764th execution.
            elenst Elena Stepanova added a comment - 10.2 occurrence: http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-ppc64le/builds/1536
            marko Marko Mäkelä added a comment - - edited

            10.3 occurrence: http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-xenial-x86/builds/2286/steps/mtr/logs/stdio

            CURRENT_TEST: multi_source.info_logs
            --- /usr/local/mariadb-10.3.18-linux-i686/mysql-test/suite/multi_source/info_logs.result	2019-09-03 14:50:51.000000000 -0400
            +++ /usr/local/mariadb-10.3.18-linux-i686/mysql-test/suite/multi_source/info_logs.reject	2019-09-03 16:26:21.925372724 -0400
            @@ -91,7 +91,7 @@
             show all slaves status;
             Connection_name	Slave_SQL_State	Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error	Replicate_Ignore_Server_Ids	Master_Server_Id	Master_SSL_Crl	Master_SSL_Crlpath	Using_Gtid	Gtid_IO_Pos	Replicate_Do_Domain_Ids	Replicate_Ignore_Domain_Ids	Parallel_Mode	SQL_Delay	SQL_Remaining_Delay	Slave_SQL_Running_State	Slave_DDL_Groups	Slave_Non_Transactional_Groups	Slave_Transactional_Groups	Retried_transactions	Max_relay_log_size	Executed_log_entries	Slave_received_heartbeats	Slave_heartbeat_period	Gtid_Slave_Pos
             	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space1>	None		0	No						0	No	0		0			1			No				conservative	0	NULL	Slave has read all relay log; waiting for the slave I/O thread to update it	0	0	0	0	1073741824	7	0	60.000	
            -MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space2>	None		0	No						0	No	0		0			2			No				conservative	0	NULL	Slave has read all relay log; waiting for the slave I/O thread to update it	0	0	0	0	1073741824	7	0	60.000	
            +MASTER 2.2	Slave has read all relay log; waiting for the slave I/O thread to update it	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	918	None		0	No						0	No	0		0			2			No				conservative	0	NULL	Slave has read all relay log; waiting for the slave I/O thread to update it	0	0	0	0	1073741824	7	0	60.000	
             include/wait_for_slave_to_start.inc
             set default_master_connection = 'MASTER 2.2';
             include/wait_for_slave_to_start.inc
            

            As far as I can tell, the only change is that <relay_log_space2> was replaced with 918.
            Side note: such multi-column output would be easier to compare with vertical_results.

            marko Marko Mäkelä added a comment - - edited 10.3 occurrence: http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-xenial-x86/builds/2286/steps/mtr/logs/stdio CURRENT_TEST: multi_source.info_logs --- /usr/local/mariadb-10.3.18-linux-i686/mysql-test/suite/multi_source/info_logs.result 2019-09-03 14:50:51.000000000 -0400 +++ /usr/local/mariadb-10.3.18-linux-i686/mysql-test/suite/multi_source/info_logs.reject 2019-09-03 16:26:21.925372724 -0400 @@ -91,7 +91,7 @@ show all slaves status; Connection_name Slave_SQL_State Slave_IO_State Master_Host Master_User Master_Port Connect_Retry Master_Log_File Read_Master_Log_Pos Relay_Log_File Relay_Log_Pos Relay_Master_Log_File Slave_IO_Running Slave_SQL_Running Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table Last_Errno Last_Error Skip_Counter Exec_Master_Log_Pos Relay_Log_Space Until_Condition Until_Log_File Until_Log_Pos Master_SSL_Allowed Master_SSL_CA_File Master_SSL_CA_Path Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key Seconds_Behind_Master Master_SSL_Verify_Server_Cert Last_IO_Errno Last_IO_Error Last_SQL_Errno Last_SQL_Error Replicate_Ignore_Server_Ids Master_Server_Id Master_SSL_Crl Master_SSL_Crlpath Using_Gtid Gtid_IO_Pos Replicate_Do_Domain_Ids Replicate_Ignore_Domain_Ids Parallel_Mode SQL_Delay SQL_Remaining_Delay Slave_SQL_Running_State Slave_DDL_Groups Slave_Non_Transactional_Groups Slave_Transactional_Groups Retried_transactions Max_relay_log_size Executed_log_entries Slave_received_heartbeats Slave_heartbeat_period Gtid_Slave_Pos Slave has read all relay log; waiting for the slave I/O thread to update it Waiting for master to send event 127.0.0.1 root MYPORT_1 60 master-bin.000001 <read_master_log_pos> relay.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> <relay_log_space1> None 0 No 0 No 0 0 1 No conservative 0 NULL Slave has read all relay log; waiting for the slave I/O thread to update it 0 0 0 0 1073741824 7 0 60.000 -MASTER 2.2 Slave has read all relay log; waiting for the slave I/O thread to update it Waiting for master to send event 127.0.0.1 root MYPORT_2 60 master-bin.000001 <read_master_log_pos> relay-master@00202@002e2.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> <relay_log_space2> None 0 No 0 No 0 0 2 No conservative 0 NULL Slave has read all relay log; waiting for the slave I/O thread to update it 0 0 0 0 1073741824 7 0 60.000 +MASTER 2.2 Slave has read all relay log; waiting for the slave I/O thread to update it Waiting for master to send event 127.0.0.1 root MYPORT_2 60 master-bin.000001 <read_master_log_pos> relay-master@00202@002e2.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> 918 None 0 No 0 No 0 0 2 No conservative 0 NULL Slave has read all relay log; waiting for the slave I/O thread to update it 0 0 0 0 1073741824 7 0 60.000 include/wait_for_slave_to_start.inc set default_master_connection = 'MASTER 2.2'; include/wait_for_slave_to_start.inc As far as I can tell, the only change is that <relay_log_space2> was replaced with 918 . Side note: such multi-column output would be easier to compare with vertical_results .

            I saw a similar failure in http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-x86/builds/16256/steps/mtr/logs/stdio

            multi_source.info_logs                   w4 [ fail ]
                    Test ended at 2020-04-08 18:47:06
             
            CURRENT_TEST: multi_source.info_logs
            --- /usr/share/mysql/mysql-test/suite/multi_source/info_logs.result	2020-04-08 16:52:03.000000000 -0400
            +++ /dev/shm/var/4/log/info_logs.reject	2020-04-08 18:47:06.554358650 -0400
            @@ -90,7 +90,7 @@
             #
             show all slaves status;
             Connection_name	Slave_SQL_State	Slave_IO_State	Master_Host	Master_User	Master_Port	Connect_Retry	Master_Log_File	Read_Master_Log_Pos	Relay_Log_File	Relay_Log_Pos	Relay_Master_Log_File	Slave_IO_Running	Slave_SQL_Running	Replicate_Do_DB	Replicate_Ignore_DB	Replicate_Do_Table	Replicate_Ignore_Table	Replicate_Wild_Do_Table	Replicate_Wild_Ignore_Table	Last_Errno	Last_Error	Skip_Counter	Exec_Master_Log_Pos	Relay_Log_Space	Until_Condition	Until_Log_File	Until_Log_Pos	Master_SSL_Allowed	Master_SSL_CA_File	Master_SSL_CA_Path	Master_SSL_Cert	Master_SSL_Cipher	Master_SSL_Key	Seconds_Behind_Master	Master_SSL_Verify_Server_Cert	Last_IO_Errno	Last_IO_Error	Last_SQL_Errno	Last_SQL_Error	Replicate_Ignore_Server_Ids	Master_Server_Id	Master_SSL_Crl	Master_SSL_Crlpath	Using_Gtid	Gtid_IO_Pos	Replicate_Do_Domain_Ids	Replicate_Ignore_Domain_Ids	Parallel_Mode	SQL_Delay	SQL_Remaining_Delay	Slave_SQL_Running_State	Slave_DDL_Groups	Slave_Non_Transactional_Groups	Slave_Transactional_Groups	Retried_transactions	Max_relay_log_size	Executed_log_entries	Slave_received_heartbeats	Slave_heartbeat_period	Gtid_Slave_Pos
            -	Slave has read all relay log; waiting for more updates	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space1>	None		0	No						0	No	0		0			1			No				optimistic	0	NULL	Slave has read all relay log; waiting for more updates	0	0	0	0	1073741824	7	0	60.000	
            +	Slave has read all relay log; waiting for more updates	Waiting for master to send event	127.0.0.1	root	MYPORT_1	60	master-bin.000001	<read_master_log_pos>	relay.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	884	None		0	No						0	No	0		0			1			No				optimistic	0	NULL	Slave has read all relay log; waiting for more updates	0	0	0	0	1073741824	7	0	60.000	
             MASTER 2.2	Slave has read all relay log; waiting for more updates	Waiting for master to send event	127.0.0.1	root	MYPORT_2	60	master-bin.000001	<read_master_log_pos>	relay-master@00202@002e2.000002	<relay_log_pos>	master-bin.000001	Yes	Yes							0		0	<read_master_log_pos>	<relay_log_space2>	None		0	No						0	No	0		0			2			No				optimistic	0	NULL	Slave has read all relay log; waiting for more updates	0	0	0	0	1073741824	7	0	60.000	
             include/wait_for_slave_to_start.inc
             set default_master_connection = 'MASTER 2.2';
             
            mysqltest: Result length mismatch
            

            otto Otto Kekäläinen added a comment - I saw a similar failure in http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-x86/builds/16256/steps/mtr/logs/stdio multi_source.info_logs w4 [ fail ] Test ended at 2020-04-08 18:47:06   CURRENT_TEST: multi_source.info_logs --- /usr/share/mysql/mysql-test/suite/multi_source/info_logs.result 2020-04-08 16:52:03.000000000 -0400 +++ /dev/shm/var/4/log/info_logs.reject 2020-04-08 18:47:06.554358650 -0400 @@ -90,7 +90,7 @@ # show all slaves status; Connection_name Slave_SQL_State Slave_IO_State Master_Host Master_User Master_Port Connect_Retry Master_Log_File Read_Master_Log_Pos Relay_Log_File Relay_Log_Pos Relay_Master_Log_File Slave_IO_Running Slave_SQL_Running Replicate_Do_DB Replicate_Ignore_DB Replicate_Do_Table Replicate_Ignore_Table Replicate_Wild_Do_Table Replicate_Wild_Ignore_Table Last_Errno Last_Error Skip_Counter Exec_Master_Log_Pos Relay_Log_Space Until_Condition Until_Log_File Until_Log_Pos Master_SSL_Allowed Master_SSL_CA_File Master_SSL_CA_Path Master_SSL_Cert Master_SSL_Cipher Master_SSL_Key Seconds_Behind_Master Master_SSL_Verify_Server_Cert Last_IO_Errno Last_IO_Error Last_SQL_Errno Last_SQL_Error Replicate_Ignore_Server_Ids Master_Server_Id Master_SSL_Crl Master_SSL_Crlpath Using_Gtid Gtid_IO_Pos Replicate_Do_Domain_Ids Replicate_Ignore_Domain_Ids Parallel_Mode SQL_Delay SQL_Remaining_Delay Slave_SQL_Running_State Slave_DDL_Groups Slave_Non_Transactional_Groups Slave_Transactional_Groups Retried_transactions Max_relay_log_size Executed_log_entries Slave_received_heartbeats Slave_heartbeat_period Gtid_Slave_Pos - Slave has read all relay log; waiting for more updates Waiting for master to send event 127.0.0.1 root MYPORT_1 60 master-bin.000001 <read_master_log_pos> relay.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> <relay_log_space1> None 0 No 0 No 0 0 1 No optimistic 0 NULL Slave has read all relay log; waiting for more updates 0 0 0 0 1073741824 7 0 60.000 + Slave has read all relay log; waiting for more updates Waiting for master to send event 127.0.0.1 root MYPORT_1 60 master-bin.000001 <read_master_log_pos> relay.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> 884 None 0 No 0 No 0 0 1 No optimistic 0 NULL Slave has read all relay log; waiting for more updates 0 0 0 0 1073741824 7 0 60.000 MASTER 2.2 Slave has read all relay log; waiting for more updates Waiting for master to send event 127.0.0.1 root MYPORT_2 60 master-bin.000001 <read_master_log_pos> relay-master@00202@002e2.000002 <relay_log_pos> master-bin.000001 Yes Yes 0 0 <read_master_log_pos> <relay_log_space2> None 0 No 0 No 0 0 2 No optimistic 0 NULL Slave has read all relay log; waiting for more updates 0 0 0 0 1073741824 7 0 60.000 include/wait_for_slave_to_start.inc set default_master_connection = 'MASTER 2.2';   mysqltest: Result length mismatch

            This issue is duplicate of MDEV-4633

            Writes to 'rli->log_space_total' needs to be synchronized. All modifications to 'rli->log_space_total' should be performed by holding 'rli->log_space_lock'. But IO thread doesn't acquire this mutex while writing events to relay log.
            Hence there is a race condition between SQL THREAD and IO THREAD, when both try to modify the same variable. SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously . Hence test occasionally fails with result mismatch.

            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment - This issue is duplicate of MDEV-4633 Writes to 'rli->log_space_total' needs to be synchronized. All modifications to 'rli->log_space_total' should be performed by holding 'rli->log_space_lock'. But IO thread doesn't acquire this mutex while writing events to relay log. Hence there is a race condition between SQL THREAD and IO THREAD, when both try to modify the same variable. SQL_THREAD is trying to decrement 'rli->log_space_total' in 'purge_first_log' and IO_THREAD is trying to increment the 'rli->log_space_total' in 'queue_event' simultaneously . Hence test occasionally fails with result mismatch.

            Fix implemented as part of MDEV-4633

            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment - Fix implemented as part of MDEV-4633

            People

              sujatha.sivakumar Sujatha Sivakumar (Inactive)
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.