[MDEV-8154] rpl.show_status_stop_slave_race-7126 sporadically causes internal check failure Created: 2015-05-13  Updated: 2015-09-28  Resolved: 2015-05-13

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0
Fix Version/s: 5.5.46, 10.0.20

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None


 Description   

Reported by AL13N on IRC.

I am also getting the failure (sporadically, but often enough) on a release build, but not on a debug build.

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
rpl.show_status_stop_slave_race-7126 'mix' [ pass ]   2121
 
MTR's internal check of the test case 'rpl.show_status_stop_slave_race-7126' failed.
This means that the test case does not preserve the state that existed
before the test case was executed.  Most likely the test case did not
do a proper clean-up. It could also be caused by the previous test run
by this thread, if the server wasn't restarted.
This is the diff of the states of the servers before and after the
test case was executed:
mysqltest: Logging to '/data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.log'.
mysqltest: Results saved in '/data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.result'.
mysqltest: Connecting to server localhost:16001 (socket /data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/mysqld.2.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from './include/check-testcase.test' ...
mysqltest: ... Done processing test commands.
--- /data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.result	2015-05-13 02:02:04.000000000 +0300
+++ /data/releases/mariadb-10.0.18-linux-x86_64/mysql-test/var/tmp/check-mysqld_2.reject	2015-05-13 02:02:06.000000000 +0300
@@ -32,8 +32,8 @@
 Master_SSL_Key	
 Seconds_Behind_Master	NULL
 Master_SSL_Verify_Server_Cert	No
-Last_IO_Errno	0
-Last_IO_Error	
+Last_IO_Errno	2003
+Last_IO_Error	error connecting to master 'root@127.0.0.1:16000' - retry-time: 1  retries: 10  message: Can't connect to MySQL server on '127.0.0.1' (4 "Interrupted system call")
 Last_SQL_Errno	0
 Last_SQL_Error	
 Replicate_Ignore_Server_Ids	
 
mysqltest: Result length mismatch

Here is what I think is happening:

The test runs "START SLAVE; STOP SLAVE" multiple times concurrently in two connections; so sometimes STOP from one connection interrupts starting from another connection, and the slave IO thread stops with the 'interrupted system call' error. If it happens to be the last state at the end of the concurrent flow, the test ends with this status because it doesn't have replication cleanup at the end.

The change below seems to fix the problem, but I need a confirmation that the "Interrupted system call" status is really allowed here.

diff --git a/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result b/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result
index e71bb2e..c2a0498 100644
--- a/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result
+++ b/mysql-test/suite/rpl/r/show_status_stop_slave_race-7126.result
@@ -1,2 +1,4 @@
 include/master-slave.inc
 [connection master]
+start slave;
+include/rpl_end.inc
diff --git a/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test b/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test
index a79a188..38759c9 100644
--- a/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test
+++ b/mysql-test/suite/rpl/t/show_status_stop_slave_race-7126.test
@@ -10,3 +10,10 @@
 --exec $MYSQL_SLAP --silent --socket=$SLAVE_MYSOCK -q "START SLAVE; STOP SLAVE; SHOW GLOBAL STATUS" -c 2 --number-of-queries=100 --create-s
 
 # All done.
+
+--connection slave
+start slave;
+
+--connection master
+--source include/rpl_end.inc
+



 Comments   
Comment by Elena Stepanova [ 2015-05-13 ]

knielsen, could you please take a look to confirm (or not confirm) that it's a test problem only?

Comment by Kristian Nielsen [ 2015-05-13 ]

It should be ok. "Interrupted system call" is how STOP SLAVE is able to
cause the IO thread to get out of waiting on the network connection to the
master.

Comment by Elena Stepanova [ 2015-05-13 ]

Pushed into 10.0:
https://github.com/MariaDB/server/commit/0b4231e9f19308c6b8b57e07e52c1cf711691ebd

Comment by Elena Stepanova [ 2015-09-28 ]

Also applied to 5.5

Generated at Thu Feb 08 07:25:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.