Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14822

binlog.binlog_killed fails with wrong result

Details

    Description

      In travis, we build the server with the following config line:

      cmake . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_SSL=system -DWITH_ZLIB=system -DPLUGIN_AWS_KEY_MANAGEMENT=DYNAMIC -DAWS_SDK_EXTERNAL_PROJECT=ON -DWITHOUT_MROONGA_STORAGE_ENGINE=True
      

      Due to how travis environment is set up, the test fails when compiling using clang 5.0.1 or with GCC 6.3.0

      Locally, I was able the reproduce it with the following failure:

      +++ /home/vicentiu/Workspace/MariaDB-10.3/mysql-test/suite/binlog/r/binlog_killed.reject	2017-12-30 15:35:00.079888311 +0200
      @@ -264,12 +264,6 @@
       master-bin.000002	#	Intvar	#	#	INSERT_ID=7
       master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t2 VALUES (NULL, 1)
       master-bin.000002	#	Query	#	#	COMMIT
      -master-bin.000002	#	Gtid	#	#	BEGIN GTID #-#-#
      -master-bin.000002	#	Intvar	#	#	INSERT_ID=8
      -master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t1 VALUES (NULL, 1)
      -master-bin.000002	#	Intvar	#	#	INSERT_ID=9
      -master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t1 VALUES (NULL, 1)
      -master-bin.000002	#	Query	#	#	ROLLBACK
       connection default;
       disconnect con1;
       disconnect con2;
       
      mysqltest: Result length mismatch
      

      However, at times the failure is like this:

      57589899 +0200
      +++ /home/vicentiu/Workspace/MariaDB-10.3/mysql-test/suite/binlog/r/binlog_killed.reject	2017-12-30 15:36:12.180412517 +0200
      @@ -198,10 +198,6 @@
       master-bin.000002	#	Intvar	#	#	INSERT_ID=3
       master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t2 VALUES (NULL, 1)
       master-bin.000002	#	Query	#	#	COMMIT
      -master-bin.000002	#	Gtid	#	#	BEGIN GTID #-#-#
      -master-bin.000002	#	Intvar	#	#	INSERT_ID=4
      -master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t1 VALUES (NULL, 1)
      -master-bin.000002	#	Query	#	#	ROLLBACK
       disconnect con3;
       connect  con3, localhost, root,,;
       connection con3;
      @@ -264,12 +260,6 @@
       master-bin.000002	#	Intvar	#	#	INSERT_ID=7
       master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t2 VALUES (NULL, 1)
       master-bin.000002	#	Query	#	#	COMMIT
      -master-bin.000002	#	Gtid	#	#	BEGIN GTID #-#-#
      -master-bin.000002	#	Intvar	#	#	INSERT_ID=8
      -master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t1 VALUES (NULL, 1)
      -master-bin.000002	#	Intvar	#	#	INSERT_ID=9
      -master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t1 VALUES (NULL, 1)
      -master-bin.000002	#	Query	#	#	ROLLBACK
       connection default;
       disconnect con1;
       disconnect con2;
       
      mysqltest: Result length mismatch
      

      The common denominator it seems is that when running a transaction with a transactional storage engine (Innodb), in some cases we don't immediately flush to the binlog. If the kill query signal happens earlier, we don't write the transaction at all (including the ROLLBACK event). The problem is that the test doesn't always fail. With --repeat=100, I was able to get it to reproduce once every 3-4 runs.

      Ultimately, the binlog contents are valid either way, but the test needs to be stabilized in some way.

      The sequence of queries is roughly this:

      create table t1 (a int auto_increment, b int, PRIMARY KEY (a)) ENGINE=InnoDB;
      create table t2 (a int auto_increment, b int, PRIMARY KEY (a)) ENGINE=MyISAM;
      flush logs;
      connect  con3, localhost, root,,;
      connection con3;
      BEGIN;
      INSERT INTO t2 VALUES (NULL, 1);
      INSERT INTO t1 VALUES (NULL, 1);
      connection con3;
      KILL $id_of_con3;
      -- Since t2 is non-transactional, the binlog should have those rows, however t1 changes should be rolled back.
      

      Attachments

        Activity

          cvicentiu Vicențiu Ciorbaru added a comment - CC: Elkin monty

          Reverting c4581735d0210beba0733b30df8dd994786663fe partially seems to have solved the problem.

           
          diff --git a/sql/log.cc b/sql/log.cc
          index bdf0b6fdc59d..fc4b961e136b 100644
          --- a/sql/log.cc
          +++ b/sql/log.cc
          @@ -4178,6 +4178,7 @@ bool MYSQL_BIN_LOG::reset_logs(THD *thd, bool create_new_log,
               mysql_mutex_unlock(&LOCK_xid_list);
             }
           
          +  mysql_mutex_lock(&LOCK_thread_count);
             /* Save variables so that we can reopen the log */
             save_name=name;
             name=0;					// Protect against free
          @@ -4285,6 +4286,8 @@ bool MYSQL_BIN_LOG::reset_logs(THD *thd, bool create_new_log,
             if (error == 1)
               name= const_cast<char*>(save_name);
           
          +  mysql_mutex_unlock(&LOCK_thread_count);
          +
             if (!is_relay_log)
             {
               xid_count_per_binlog *b;
          

          cvicentiu VicenÈ›iu Ciorbaru added a comment - Reverting c4581735d0210beba0733b30df8dd994786663fe partially seems to have solved the problem.   diff --git a/sql/log.cc b/sql/log.cc index bdf0b6fdc59d..fc4b961e136b 100644 --- a/sql/log.cc +++ b/sql/log.cc @@ -4178,6 +4178,7 @@ bool MYSQL_BIN_LOG::reset_logs(THD *thd, bool create_new_log, mysql_mutex_unlock(&LOCK_xid_list); } + mysql_mutex_lock(&LOCK_thread_count); /* Save variables so that we can reopen the log */ save_name=name; name=0; // Protect against free @@ -4285,6 +4286,8 @@ bool MYSQL_BIN_LOG::reset_logs(THD *thd, bool create_new_log, if (error == 1) name= const_cast<char*>(save_name); + mysql_mutex_unlock(&LOCK_thread_count); + if (!is_relay_log) { xid_count_per_binlog *b;

          While trying to reproduce the problem on GCC 7.2.1, I've gotten the following failure:

          --- /home/vicentiu/Workspace/MariaDB-10.3/mysql-test/suite/binlog/r/binlog_killed.result	2017-12-16 18:24:27.257589899 +0200
          +++ /home/vicentiu/Workspace/MariaDB-10.3/mysql-test/suite/binlog/r/binlog_killed.reject	2017-12-30 19:20:03.784691846 +0200
          @@ -193,11 +193,11 @@
           master-bin.000002	#	Format_desc	#	#	SERVER_VERSION, BINLOG_VERSION
           master-bin.000002	#	Gtid_list	#	#	[#-#-#]
           master-bin.000002	#	Binlog_checkpoint	#	#	master-bin.000001
          -master-bin.000002	#	Binlog_checkpoint	#	#	master-bin.000002
           master-bin.000002	#	Gtid	#	#	BEGIN GTID #-#-#
           master-bin.000002	#	Intvar	#	#	INSERT_ID=3
           master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t2 VALUES (NULL, 1)
           master-bin.000002	#	Query	#	#	COMMIT
          +master-bin.000002	#	Binlog_checkpoint	#	#	master-bin.000002
           master-bin.000002	#	Gtid	#	#	BEGIN GTID #-#-#
           master-bin.000002	#	Intvar	#	#	INSERT_ID=4
           master-bin.000002	#	Query	#	#	use `test`; INSERT INTO t1 VALUES (NULL, 1)
          

          cvicentiu Vicențiu Ciorbaru added a comment - While trying to reproduce the problem on GCC 7.2.1, I've gotten the following failure: --- /home/vicentiu/Workspace/MariaDB-10.3/mysql-test/suite/binlog/r/binlog_killed.result 2017-12-16 18:24:27.257589899 +0200 +++ /home/vicentiu/Workspace/MariaDB-10.3/mysql-test/suite/binlog/r/binlog_killed.reject 2017-12-30 19:20:03.784691846 +0200 @@ -193,11 +193,11 @@ master-bin.000002 # Format_desc # # SERVER_VERSION, BINLOG_VERSION master-bin.000002 # Gtid_list # # [#-#-#] master-bin.000002 # Binlog_checkpoint # # master-bin.000001 -master-bin.000002 # Binlog_checkpoint # # master-bin.000002 master-bin.000002 # Gtid # # BEGIN GTID #-#-# master-bin.000002 # Intvar # # INSERT_ID=3 master-bin.000002 # Query # # use `test`; INSERT INTO t2 VALUES (NULL, 1) master-bin.000002 # Query # # COMMIT +master-bin.000002 # Binlog_checkpoint # # master-bin.000002 master-bin.000002 # Gtid # # BEGIN GTID #-#-# master-bin.000002 # Intvar # # INSERT_ID=4 master-bin.000002 # Query # # use `test`; INSERT INTO t1 VALUES (NULL, 1)

          Binlog checkpoints can happen any time. They are more or less random.

          To get rid of them, one should start scripts with 'reset master' or use the new mtr variable "--let $skip_checkpoint_events=1" which removes them from show_binlog_events

          monty Michael Widenius added a comment - Binlog checkpoints can happen any time. They are more or less random. To get rid of them, one should start scripts with 'reset master' or use the new mtr variable "--let $skip_checkpoint_events=1" which removes them from show_binlog_events

          Problem was timing between the thread that was killed and the thread that was
          reading the binary log.

          Updated the test to wait until the killed thread was properly terminated
          before checking what's in the binary log.

          To make check safe, I changed "threads_connected" to be updated after
          thd::cleanup() is done, to ensure that all binary logs updates are done
          before the variable is changed. This was mainly done to get the
          test deterministic and have now other real influence in how the server
          works.

          monty Michael Widenius added a comment - Problem was timing between the thread that was killed and the thread that was reading the binary log. Updated the test to wait until the killed thread was properly terminated before checking what's in the binary log. To make check safe, I changed "threads_connected" to be updated after thd::cleanup() is done, to ensure that all binary logs updates are done before the variable is changed. This was mainly done to get the test deterministic and have now other real influence in how the server works.

          People

            monty Michael Widenius
            cvicentiu Vicențiu Ciorbaru
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.