Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8133

ALTER TABLE can perform the operation but escape the binary log

Details

    • 10.0.22, 10.0.24

    Description

      If ALTER TABLE is interrupted, it can happen that it performs the operation, e.g. adds a column, but is not written to the binary log, which of course causes further discrepancy.

      The MTR test below is for 10.1 only, because it uses max_statement_time, but the problem is not specific to it, same happens with KILL QUERY (see the RQG test which can be used for 10.0 as well).

      The MTR test highly depends on the machine timing, so it might need some tuning, e.g. more or less data in the table, longer or shorter max_statement_time, etc.

      MTR test case

      --source include/have_innodb.inc
      --source include/have_binlog_format_statement.inc
       
      --disable_query_log
       
      set sql_log_bin = 0;
       
      create table t1 (pk int auto_increment primary key, i int, c varchar(8)) engine=InnoDB;
      insert into t1 (i,c) values (1,'foo'),(2,'bar'),(3,'qux'),(4,'foobar');
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
      insert into t1 select null, i, c from t1;
       
      set sql_log_bin = 1;
       
      let $col_count = 3;
      let $master_pos = query_get_value(show master status, Position, 1);
       
      let $timeout = 3;
       
      while ($timeout)
      {
        eval set max_statement_time = $timeout;
        let $run = 10;
        while ($run)
        {
          let $colname = c$timeout-$run;
          echo # Adding $colname;
          --error 0,ER_STATEMENT_TIMEOUT,ER_QUERY_INTERRUPTED
          eval alter table t1 add `$colname` int;
          let $new_col_count = `select count(*) from information_schema.columns where table_name = 't1'`;
          echo # Error code $mysql_errno, new column count $new_col_count;
          if ($new_col_count > $col_count) {
           let $new_master_pos = query_get_value(show master status, Position, 1);
           if ($new_master_pos == $master_pos) 
           {
             show binlog events;
             show create table t1;
             echo "Column $colname was added, but ALTER statement was not written to the binlog";
             die "Binlog problem";
           }
           let $master_pos = $new_master_pos;
          }
          let $col_count = $new_col_count;
          dec $run;
        }
        dec $timeout;
      }
       
      drop table t1;

      Example of the failure

      ...
      # Adding c2-9
      # Error code 0, new column count 15
      # Adding c2-8
      # Error code 1968, new column count 16
      Log_name	Pos	Event_type	Server_id	End_log_pos	Info
      master-bin.000001	4	Format_desc	1	248	Server ver: 10.1.5-MariaDB-wsrep-debug-log, Binlog ver: 4
      ...
      master-bin.000001	1745	Gtid	1	1783	GTID 0-1-12
      master-bin.000001	1783	Query	1	1875	use `test`; alter table t1 add `c2-9` int
      Table	Create Table
      t1	CREATE TABLE `t1` (
        `pk` int(11) NOT NULL AUTO_INCREMENT,
        `i` int(11) DEFAULT NULL,
        `c` varchar(8) DEFAULT NULL,
        `c3-10` int(11) DEFAULT NULL,
        `c3-9` int(11) DEFAULT NULL,
        `c3-8` int(11) DEFAULT NULL,
        `c3-7` int(11) DEFAULT NULL,
        `c3-6` int(11) DEFAULT NULL,
        `c3-5` int(11) DEFAULT NULL,
        `c3-4` int(11) DEFAULT NULL,
        `c3-3` int(11) DEFAULT NULL,
        `c3-2` int(11) DEFAULT NULL,
        `c3-1` int(11) DEFAULT NULL,
        `c2-10` int(11) DEFAULT NULL,
        `c2-9` int(11) DEFAULT NULL,
        `c2-8` int(11) DEFAULT NULL,
        PRIMARY KEY (`pk`)
      ) ENGINE=InnoDB AUTO_INCREMENT=8179 DEFAULT CHARSET=latin1
      "Column c2-8 was added, but ALTER statement was not written to the binlog"

      For running the RQG test, clone lp:~elenst/randgen/mariadb-patches (the main randgen tree might miss some changes or fixes in the required components).

      RQG grammar alter.yy

      query:
      	ALTER TABLE _table ADD _letter[invariant] INT ; ALTER TABLE _table DROP _letter[invariant] ;

      RQG command line

      perl ./runall-new.pl --threads=1 --duration=200 --queries=100M --reporters=QueryTimeout,BinlogConsistency --grammar=alter.yy --engine=InnoDB --mysqld=--log-bin --rows=100,100,100,100,100,100,100,100,1000 --querytimeout=1 --basedir=<basedir> --vardir=<vardir> 

      The failure can come in two flavors: either it complains about binlog replaying error due to an unknown column, or reports multiple differences like

      -INSERT INTO `BB` VALUES (100,6,NULL,'2002-04-16','2002-04-16','00:43:36','00:43:36','2002-05-14 10:52:52','2002-05-14 10:52:52',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);
      +INSERT INTO `BB` VALUES (100,6,NULL,'2002-04-16','2002-04-16','00:43:36','00:43:36','2002-05-14 10:52:52','2002-05-14 10:52:52',NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);

      where the only difference is the number of columns in the table.

      If neither happens, try to increase the number of threads on the command line.

      Attachments

        Activity

          Sorry, but I cannot repeat it with rqg. Tried to increase the number of threads — didn't help. I'd rather not use the first (mtr) test case, because I need to repeat it in 10.0.

          serg Sergei Golubchik added a comment - Sorry, but I cannot repeat it with rqg. Tried to increase the number of threads — didn't help. I'd rather not use the first (mtr) test case, because I need to repeat it in 10.0.
          elenst Elena Stepanova added a comment - - edited

          I have set it up on perro (the RQG test).

          • login as usual

            cd mdev8133
            . ./cmd

            The server is in ~/git/10.0-mdev8133

          elenst Elena Stepanova added a comment - - edited I have set it up on perro (the RQG test). login as usual cd mdev8133 . ./cmd The server is in ~/git/10.0-mdev8133

          I don't think this can be fixed completely. In-place ALTER cannot be undone and there are lots of actions (tables are opened, renamed, etc) that can fail after the in-place table modifications were done. After these failures we may end up with a half-done ALTER, for example, the table has the old name (if ALTER...RENAME) but a new structure.

          The only “fix” I can think of is to abort the replication. This can be easily achieved by binlogging ALTER together with the error code. There will be no error on the slave and the replication will stop. Not very intuitive behavior, though.

          serg Sergei Golubchik added a comment - I don't think this can be fixed completely. In-place ALTER cannot be undone and there are lots of actions (tables are opened, renamed, etc) that can fail after the in-place table modifications were done. After these failures we may end up with a half-done ALTER, for example, the table has the old name (if ALTER...RENAME) but a new structure. The only “fix” I can think of is to abort the replication. This can be easily achieved by binlogging ALTER together with the error code. There will be no error on the slave and the replication will stop. Not very intuitive behavior, though.

          Is it an upstream bug?

          serg Sergei Golubchik added a comment - Is it an upstream bug?

          I don't see any notes about it in my description or comments, apparently I didn't check. I will do it shortly.

          elenst Elena Stepanova added a comment - I don't see any notes about it in my description or comments, apparently I didn't check. I will do it shortly.

          Could not reproduce on MariaDB 5.5, MySQL 5.5, MySQL 5.6 (for the latter, ran 20 attempts of the RQG test – for MariaDB 10.x usually 1-2 is enough).

          elenst Elena Stepanova added a comment - Could not reproduce on MariaDB 5.5, MySQL 5.5, MySQL 5.6 (for the latter, ran 20 attempts of the RQG test – for MariaDB 10.x usually 1-2 is enough).

          People

            serg Sergei Golubchik
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.