Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8075

DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1(EOL)
    • 10.1.25
    • Replication
    • None

    Description

      Note: I am only getting the failure with optimistic mode, although there is no guarantee it never happens in other circumstances.

      Stack trace from 10.1 commit 46816996

      10.1/sql/sql_string.h:207: char* String::c_ptr(): Assertion `!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))' failed.
      150428 21:01:39 [ERROR] mysqld got signal 6 ;
       
      #6  0x00007f2216eab311 in *__GI___assert_fail (assertion=0x7f221a0ff4b0 "!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))", file=<optimized out>, line=207, function=0x7f221a103160 "char* String::c_ptr()") at assert.c:81
      #7  0x00007f2219768e2d in String::c_ptr (this=0x7f21ec210158) at 10.1/sql/sql_string.h:206
      #8  0x00007f2219777c77 in open_temporary_table (thd=0x7f21f0423070, tl=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:5972
      #9  0x00007f2219777db9 in open_temporary_tables (thd=0x7f21f0423070, tl_list=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:6017
      #10 0x00007f22197d8632 in mysql_execute_command (thd=0x7f21f0423070) at 10.1/sql/sql_parse.cc:2842
      #11 0x00007f22197e5c1e in mysql_parse (thd=0x7f21f0423070, rawbuf=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", length=36, parser_state=0x7f2216e7ea60) at 10.1/sql/sql_parse.cc:7165
      #12 0x00007f2219afb890 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800, query_arg=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", q_len_arg=36) at 10.1/sql/log_event.cc:4287
      #13 0x00007f2219afab15 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.cc:4013
      #14 0x00007f22197432c0 in Log_event::apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.h:1347
      #15 0x00007f2219738e1b in apply_event_and_update_pos (ev=0x7f21e9e10c70, thd=0x7f21f0423070, rgi=0x7f21e9dec800, rpt=0x7f21f3ac3620) at 10.1/sql/slave.cc:3274
      #16 0x00007f221996fd13 in rpt_handle_event (qev=0x7f21e9e09170, rpt=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:49
      #17 0x00007f2219971fbc in handle_rpl_parallel_thread (arg=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:942
      #18 0x00007f2218ec6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #19 0x00007f2216f5b95d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

      To reproduce, I

      • start master on a clean datadir, with --server-id=1 --log-bin=mysql-bin, with the attached mysql-bin.* files placed into the datadir;
      • start slave on a clean datadir with --server-id=2 --slave-parallel-mode=optimistic --slave-parallel-threads=8;
      • run change master to master_host='127.0.0.1', master_port=3306, master_user='root'; start slave on the slave;
      • wait.

      Takes a few minutes, crashes every time for me.

      Attachments

        1. mysql-bin.000001
          754 kB
        2. mysql-bin.index
          0.0 kB
        3. mysql-bin.state
          0.0 kB

        Activity

          knielsen Kristian Nielsen added a comment - - edited

          This is the binlog around where it fails (the assertion occurs during
          replication of GTID 0-1-3605):

          # at 770490
          #150427 20:41:36 server id 1  end_log_pos 770528        GTID 0-1-3604
          DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1`
           
          # at 770714
          #150427 20:41:36 server id 1  end_log_pos 770752        GTID 0-1-3605 trans
          BEGIN
          UPDATE table_1 SET `col_int_key` = 6
          CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb`
          ROLLBACK

          I think what happens is that in GTID 0-1-3604 we drop a temporary table
          `table_1` that shadows the global `table_1`.

          So when we speculatively execute 0-1-3605, we get the wrong table (the
          temporary table that is supposed to be dropped in 0-1-3604), not the global
          one.

          (And even worse, we then wait for 0-1-3604 to drop the temporary table,
          after which we try to use it - causing the assertion seen).

          So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl"
          in the binlog, as speculative apply is not safe after it - just like any
          other DDL statement.

          A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are
          two more cases that do not. One is a DROP TEMPORARY inside a
          transaction. The other is the implicit DROP TEMPORARY TABLE that is done
          when a client connection is closed.

          knielsen Kristian Nielsen added a comment - - edited This is the binlog around where it fails (the assertion occurs during replication of GTID 0-1-3605): # at 770490 #150427 20:41:36 server id 1 end_log_pos 770528 GTID 0-1-3604 DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1`   # at 770714 #150427 20:41:36 server id 1 end_log_pos 770752 GTID 0-1-3605 trans BEGIN UPDATE table_1 SET `col_int_key` = 6 CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb` ROLLBACK I think what happens is that in GTID 0-1-3604 we drop a temporary table `table_1` that shadows the global `table_1`. So when we speculatively execute 0-1-3605, we get the wrong table (the temporary table that is supposed to be dropped in 0-1-3604), not the global one. (And even worse, we then wait for 0-1-3604 to drop the temporary table, after which we try to use it - causing the assertion seen). So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl" in the binlog, as speculative apply is not safe after it - just like any other DDL statement. A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are two more cases that do not. One is a DROP TEMPORARY inside a transaction. The other is the implicit DROP TEMPORARY TABLE that is done when a client connection is closed.
          knielsen Kristian Nielsen added a comment - http://lists.askmonty.org/pipermail/commits/2015-May/007819.html

          I discovered a problem with the patch. Consider queries like these:

          begin; insert into t1 values (8); drop table t4;

          A transaction that is implicitly committed by a following DDL. With the
          patch, the transaction is errorneously marked as ddl, even though it is not.

          I will come up with a new patch that fixes this ...

          knielsen Kristian Nielsen added a comment - I discovered a problem with the patch. Consider queries like these: begin; insert into t1 values (8); drop table t4; A transaction that is implicitly committed by a following DDL. With the patch, the transaction is errorneously marked as ddl, even though it is not. I will come up with a new patch that fixes this ...
          knielsen Kristian Nielsen added a comment - https://lists.launchpad.net/maria-developers/msg08571.html

          I have now modified the patch for latest 10.1, fixed some issues with timezone and also added a fix for the ALTER TABLE.
          I checked the variable no_2pc in detail and it looks you got it right, with reset().
          I will verify with Sergei before pushing.

          monty Michael Widenius added a comment - I have now modified the patch for latest 10.1, fixed some issues with timezone and also added a fix for the ALTER TABLE. I checked the variable no_2pc in detail and it looks you got it right, with reset(). I will verify with Sergei before pushing.

          Ready to be reviewed. Most things are same as original patch and should be ok.

          Only the usage of to_2pc should be checked. (I have done an initial check and it looks ok).

          monty Michael Widenius added a comment - Ready to be reviewed. Most things are same as original patch and should be ok. Only the usage of to_2pc should be checked. (I have done an initial check and it looks ok).

          Fix reviewed and pushed

          monty Michael Widenius added a comment - Fix reviewed and pushed

          People

            monty Michael Widenius
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.