Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11861

Some replication tests quit dirty when skipped from inside the test

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • 10.0.33, 10.1.27, 10.2.9
    • Tests
    • None

    Description

      Some replication tests first establish and start replication, then find out that server options are not suitable and skip the rest of execution, thus leaving the replication running. It affects the next test started on the same servers when it attempts to configure replication and fails because it's not possible to do with replication already running.

      Example:

      perl ./mtr rpl.rpl_innodb_bug68220,row,xtradb rpl.rpl_not_null_innodb,row,xtradb --mysqld=--binlog-format=mixed
      ...
      rpl.rpl_innodb_bug68220 'row,xtradb'     [ skipped ]  Not ROW format
      rpl.rpl_not_null_innodb 'row,xtradb'     [ fail ]
              Test ended at 2017-01-21 16:38:53
       
      CURRENT_TEST: rpl.rpl_not_null_innodb
      mysqltest: In included file "./include/rpl_init.inc": 
      included from ./include/master-slave.inc at line 38:
      included from /data/src/10.1-bug/mysql-test/suite/rpl/t/rpl_not_null_innodb.test at line 15:
      At line 165: query 'SET GLOBAL gtid_slave_pos= ""' failed: 1198: This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first
       
      The result from queries just before the failure was:
      include/master-slave.inc
      

      The test rpl_innodb_bug68220.test starts this way:

      --source include/have_innodb.inc
      --source include/master-slave.inc
      --source include/have_binlog_format_row.inc
      

      An easy fix is to re-order include files, so that binlog format is checked before replication is configured:

      --source include/have_innodb.inc
      --source include/have_binlog_format_row.inc
      --source include/master-slave.inc
      

      perl ./mtr rpl.rpl_innodb_bug68220,row,xtradb rpl.rpl_not_null_innodb,row,xtradb --mysqld=--binlog-format=mixed
      ...
      ==============================================================================
       
      TEST                                      RESULT   TIME (ms) or COMMENT
      --------------------------------------------------------------------------
       
      worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
      rpl.rpl_innodb_bug68220 'row,xtradb'     [ skipped ]  Not ROW format
      rpl.rpl_not_null_innodb 'row,xtradb'     [ skipped ]  Not ROW format
      --------------------------------------------------------------------------
      The servers were restarted 1 times
      Spent 0.000 of 11 seconds executing testcases
       
      Completed: All 0 tests were successful.
       
      2 tests were skipped, 2 by the test itself.
      

      A smart way, however, is to find out why MTR doesn't check the environment after a skipped test and doesn't clean-up, as it would have done if the test failed rather than was skipped.

      Attachments

        Activity

          elenst Elena Stepanova added a comment - - edited

          perl ./mtr rpl.rpl_slave_load_remove_tmpfile,innodb_plugin,stmt rpl.rpl_stm_implicit_commit_binlog,innodb_plugin,stmt --noreorder --mem
          

          rpl.rpl_slave_load_remove_tmpfile 'innodb_plugin,stmt' [ skipped ]  Test does not work with var being softlink
          rpl.rpl_stm_implicit_commit_binlog 'innodb_plugin,stmt' [ fail ]
                  Test ended at 2017-08-10 18:39:06
           
          CURRENT_TEST: rpl.rpl_stm_implicit_commit_binlog
          mysqltest: In included file "./include/rpl_init.inc": 
          included from ./include/master-slave.inc at line 38:
          included from /data/src/10.1-bug/mysql-test/suite/rpl/t/rpl_stm_implicit_commit_binlog.test at line 7:
          At line 165: query 'SET GLOBAL gtid_slave_pos= ""' failed: 1198: This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first
           
          The result from queries just before the failure was:
          include/master-slave.inc
           
           - saving '/data/src/10.1-bug/mysql-test/var/log/rpl.rpl_stm_implicit_commit_binlog-innodb_plugin,stmt/' to '/data/src/10.1-bug/mysql-test/var/log/rpl.rpl_stm_implicit_commit_binlog-innodb_plugin,stmt/'
          --------------------------------------------------------------------------
          The servers were restarted 0 times
          Spent 0.000 of 6 seconds executing testcases
           
          Failure: Failed 1/1 tests, 0.00% were successful.
          

          elenst Elena Stepanova added a comment - - edited perl ./mtr rpl.rpl_slave_load_remove_tmpfile,innodb_plugin,stmt rpl.rpl_stm_implicit_commit_binlog,innodb_plugin,stmt --noreorder --mem rpl.rpl_slave_load_remove_tmpfile 'innodb_plugin,stmt' [ skipped ] Test does not work with var being softlink rpl.rpl_stm_implicit_commit_binlog 'innodb_plugin,stmt' [ fail ] Test ended at 2017-08-10 18:39:06   CURRENT_TEST: rpl.rpl_stm_implicit_commit_binlog mysqltest: In included file "./include/rpl_init.inc": included from ./include/master-slave.inc at line 38: included from /data/src/10.1-bug/mysql-test/suite/rpl/t/rpl_stm_implicit_commit_binlog.test at line 7: At line 165: query 'SET GLOBAL gtid_slave_pos= ""' failed: 1198: This operation cannot be performed as you have a running slave ''; run STOP SLAVE '' first   The result from queries just before the failure was: include/master-slave.inc   - saving '/data/src/10.1-bug/mysql-test/var/log/rpl.rpl_stm_implicit_commit_binlog-innodb_plugin,stmt/' to '/data/src/10.1-bug/mysql-test/var/log/rpl.rpl_stm_implicit_commit_binlog-innodb_plugin,stmt/' -------------------------------------------------------------------------- The servers were restarted 0 times Spent 0.000 of 6 seconds executing testcases   Failure: Failed 1/1 tests, 0.00% were successful.

          Right, currently master_slave.inc (or rpl_init.inc) must come after any have_xxx.inc checks that can cause the test case to be skipped.

          The cleanup-after-failure involves restarting (and re-installing I think) all mysqld's, that would be too expensive to do for every skipped test.

          It used to be that some have_xxx tests were interpreted before even running mysqltest. Then the test case never starts, and master-slave.inc cannot run. I prefered that as it is much faster, but I wonder if perhaps it is not like that anymore, since have_binlog_format_row can trigger this. Maybe that's also why this problem becomes worse?

          It is in any case kind of wrong not to do skip tests before doing real work like starting slaves. Maybe just fix all of them at once? Using something like this to find them:

          find mysql-test -type f -print0|xargs -0 grep -A10 -E 'master-slave\.inc|rpl_init\.inc' | grep -E 'source include/not_|source include/have_'
          

          There are quite a few... but shouldn't be hard to just do all the reorderings once and for all?

          knielsen Kristian Nielsen added a comment - Right, currently master_slave.inc (or rpl_init.inc) must come after any have_xxx.inc checks that can cause the test case to be skipped. The cleanup-after-failure involves restarting (and re-installing I think) all mysqld's, that would be too expensive to do for every skipped test. It used to be that some have_xxx tests were interpreted before even running mysqltest. Then the test case never starts, and master-slave.inc cannot run. I prefered that as it is much faster, but I wonder if perhaps it is not like that anymore, since have_binlog_format_row can trigger this. Maybe that's also why this problem becomes worse? It is in any case kind of wrong not to do skip tests before doing real work like starting slaves. Maybe just fix all of them at once? Using something like this to find them: find mysql-test -type f -print0|xargs -0 grep -A10 -E 'master-slave\.inc|rpl_init\.inc' | grep -E 'source include/not_|source include/have_' There are quite a few... but shouldn't be hard to just do all the reorderings once and for all?

          It used to be that some have_xxx tests were interpreted before even running mysqltest. Then the test case never starts, and master-slave.inc cannot run.

          Probably some are still do, but it is not always possible. More often than not these files check the variables of the currently running server – so, the server must be started before the check is performed.

          It is in any case kind of wrong not to do skip tests before doing real work like starting slaves

          I agree that we should try to do it whenever possible, but again, I'm afraid it's not always possible. I was going to do it anyway, whenever possible, and see in how many cases it's not.

          shouldn't be hard to just do all the reorderings once and for all?

          Now, that's certainly impossible. We can fix existing tests, but there should certainly be new ones.

          elenst Elena Stepanova added a comment - It used to be that some have_xxx tests were interpreted before even running mysqltest. Then the test case never starts, and master-slave.inc cannot run. Probably some are still do, but it is not always possible. More often than not these files check the variables of the currently running server – so, the server must be started before the check is performed. It is in any case kind of wrong not to do skip tests before doing real work like starting slaves I agree that we should try to do it whenever possible, but again, I'm afraid it's not always possible. I was going to do it anyway, whenever possible, and see in how many cases it's not. shouldn't be hard to just do all the reorderings once and for all? Now, that's certainly impossible. We can fix existing tests, but there should certainly be new ones.

          Now, that's certainly impossible. We can fix existing tests, but there should certainly be new ones.

          Hehe, yes, that's true.

          Maybe it would be best if check_testcase was also run for skipped tests?
          From the code it looks like the "before" check_testase is run anyway.
          I am thinking of this code:

                if ( $res == 0 ) {
          	my $check_res;
          	if ( $opt_check_testcases and
          	     $check_res= check_testcase($tinfo, "after"))
          	{
            ...
                elsif ( $res == 62 ) {
          	# Testcase itself tell us to skip this one
          

          Maybe add in the $res==62 case a call to check_testcase() similar to the $res==0 case?

          Might make tests take longer, but perhaps not too much, that can be measured. And random failures that depent on test order are troublesome. Anyway, just a suggestion.

          knielsen Kristian Nielsen added a comment - Now, that's certainly impossible. We can fix existing tests, but there should certainly be new ones. Hehe, yes, that's true. Maybe it would be best if check_testcase was also run for skipped tests? From the code it looks like the "before" check_testase is run anyway. I am thinking of this code: if ( $res == 0 ) { my $check_res; if ( $opt_check_testcases and $check_res= check_testcase($tinfo, "after")) { ... elsif ( $res == 62 ) { # Testcase itself tell us to skip this one Maybe add in the $res==62 case a call to check_testcase() similar to the $res==0 case? Might make tests take longer, but perhaps not too much, that can be measured. And random failures that depent on test order are troublesome. Anyway, just a suggestion.

          People

            serg Sergei Golubchik
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.