[MDEV-11861] Some replication tests quit dirty when skipped from inside the test Created: 2017-01-21  Updated: 2017-09-22  Resolved: 2017-09-22

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.0, 10.1, 10.2
Fix Version/s: 10.0.33, 10.1.27, 10.2.9

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None


 Description   

Some replication tests first establish and start replication, then find out that server options are not suitable and skip the rest of execution, thus leaving the replication running. It affects the next test started on the same servers when it attempts to configure replication and fails because it's not possible to do with replication already running.

Example:

perl ./mtr rpl.rpl_innodb_bug68220,row,xtradb rpl.rpl_not_null_innodb,row,xtradb --mysqld=--binlog-format=mixed
...
rpl.rpl_innodb_bug68220 'row,xtradb'     [ skipped ]  Not ROW format
rpl.rpl_not_null_innodb 'row,xtradb'     [ fail ]
        Test ended at 2017-01-21 16:38:53
 
CURRENT_TEST: rpl.rpl_not_null_innodb
mysqltest: In included file "./include/rpl_init.inc": 
included from ./include/master-slave.inc at line 38:
included from /data/src/10.1-bug/mysql-test/suite/rpl/t/rpl_not_null_innodb.test at line 15:
At line 165: query 'SET GLOBAL gtid_slave_pos= ""' failed: 1198: This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first
 
The result from queries just before the failure was:
include/master-slave.inc

The test rpl_innodb_bug68220.test starts this way:

--source include/have_innodb.inc
--source include/master-slave.inc
--source include/have_binlog_format_row.inc

An easy fix is to re-order include files, so that binlog format is checked before replication is configured:

--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
--source include/master-slave.inc

perl ./mtr rpl.rpl_innodb_bug68220,row,xtradb rpl.rpl_not_null_innodb,row,xtradb --mysqld=--binlog-format=mixed
...
==============================================================================
 
TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
 
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
rpl.rpl_innodb_bug68220 'row,xtradb'     [ skipped ]  Not ROW format
rpl.rpl_not_null_innodb 'row,xtradb'     [ skipped ]  Not ROW format
--------------------------------------------------------------------------
The servers were restarted 1 times
Spent 0.000 of 11 seconds executing testcases
 
Completed: All 0 tests were successful.
 
2 tests were skipped, 2 by the test itself.

A smart way, however, is to find out why MTR doesn't check the environment after a skipped test and doesn't clean-up, as it would have done if the test failed rather than was skipped.



 Comments   
Comment by Elena Stepanova [ 2017-08-10 ]

perl ./mtr rpl.rpl_slave_load_remove_tmpfile,innodb_plugin,stmt rpl.rpl_stm_implicit_commit_binlog,innodb_plugin,stmt --noreorder --mem

rpl.rpl_slave_load_remove_tmpfile 'innodb_plugin,stmt' [ skipped ]  Test does not work with var being softlink
rpl.rpl_stm_implicit_commit_binlog 'innodb_plugin,stmt' [ fail ]
        Test ended at 2017-08-10 18:39:06
 
CURRENT_TEST: rpl.rpl_stm_implicit_commit_binlog
mysqltest: In included file "./include/rpl_init.inc": 
included from ./include/master-slave.inc at line 38:
included from /data/src/10.1-bug/mysql-test/suite/rpl/t/rpl_stm_implicit_commit_binlog.test at line 7:
At line 165: query 'SET GLOBAL gtid_slave_pos= ""' failed: 1198: This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first
 
The result from queries just before the failure was:
include/master-slave.inc
 
 - saving '/data/src/10.1-bug/mysql-test/var/log/rpl.rpl_stm_implicit_commit_binlog-innodb_plugin,stmt/' to '/data/src/10.1-bug/mysql-test/var/log/rpl.rpl_stm_implicit_commit_binlog-innodb_plugin,stmt/'
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.000 of 6 seconds executing testcases
 
Failure: Failed 1/1 tests, 0.00% were successful.

Comment by Kristian Nielsen [ 2017-08-11 ]

Right, currently master_slave.inc (or rpl_init.inc) must come after any have_xxx.inc checks that can cause the test case to be skipped.

The cleanup-after-failure involves restarting (and re-installing I think) all mysqld's, that would be too expensive to do for every skipped test.

It used to be that some have_xxx tests were interpreted before even running mysqltest. Then the test case never starts, and master-slave.inc cannot run. I prefered that as it is much faster, but I wonder if perhaps it is not like that anymore, since have_binlog_format_row can trigger this. Maybe that's also why this problem becomes worse?

It is in any case kind of wrong not to do skip tests before doing real work like starting slaves. Maybe just fix all of them at once? Using something like this to find them:

find mysql-test -type f -print0|xargs -0 grep -A10 -E 'master-slave\.inc|rpl_init\.inc' | grep -E 'source include/not_|source include/have_'

There are quite a few... but shouldn't be hard to just do all the reorderings once and for all?

Comment by Elena Stepanova [ 2017-08-11 ]

It used to be that some have_xxx tests were interpreted before even running mysqltest. Then the test case never starts, and master-slave.inc cannot run.

Probably some are still do, but it is not always possible. More often than not these files check the variables of the currently running server – so, the server must be started before the check is performed.

It is in any case kind of wrong not to do skip tests before doing real work like starting slaves

I agree that we should try to do it whenever possible, but again, I'm afraid it's not always possible. I was going to do it anyway, whenever possible, and see in how many cases it's not.

shouldn't be hard to just do all the reorderings once and for all?

Now, that's certainly impossible. We can fix existing tests, but there should certainly be new ones.

Comment by Kristian Nielsen [ 2017-08-11 ]

Now, that's certainly impossible. We can fix existing tests, but there should certainly be new ones.

Hehe, yes, that's true.

Maybe it would be best if check_testcase was also run for skipped tests?
From the code it looks like the "before" check_testase is run anyway.
I am thinking of this code:

      if ( $res == 0 ) {
	my $check_res;
	if ( $opt_check_testcases and
	     $check_res= check_testcase($tinfo, "after"))
	{
  ...
      elsif ( $res == 62 ) {
	# Testcase itself tell us to skip this one

Maybe add in the $res==62 case a call to check_testcase() similar to the $res==0 case?

Might make tests take longer, but perhaps not too much, that can be measured. And random failures that depent on test order are troublesome. Anyway, just a suggestion.

Generated at Thu Feb 08 07:53:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.