Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8075

DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1(EOL)
    • 10.1.25
    • Replication
    • None

    Description

      Note: I am only getting the failure with optimistic mode, although there is no guarantee it never happens in other circumstances.

      Stack trace from 10.1 commit 46816996

      10.1/sql/sql_string.h:207: char* String::c_ptr(): Assertion `!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))' failed.
      150428 21:01:39 [ERROR] mysqld got signal 6 ;
       
      #6  0x00007f2216eab311 in *__GI___assert_fail (assertion=0x7f221a0ff4b0 "!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))", file=<optimized out>, line=207, function=0x7f221a103160 "char* String::c_ptr()") at assert.c:81
      #7  0x00007f2219768e2d in String::c_ptr (this=0x7f21ec210158) at 10.1/sql/sql_string.h:206
      #8  0x00007f2219777c77 in open_temporary_table (thd=0x7f21f0423070, tl=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:5972
      #9  0x00007f2219777db9 in open_temporary_tables (thd=0x7f21f0423070, tl_list=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:6017
      #10 0x00007f22197d8632 in mysql_execute_command (thd=0x7f21f0423070) at 10.1/sql/sql_parse.cc:2842
      #11 0x00007f22197e5c1e in mysql_parse (thd=0x7f21f0423070, rawbuf=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", length=36, parser_state=0x7f2216e7ea60) at 10.1/sql/sql_parse.cc:7165
      #12 0x00007f2219afb890 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800, query_arg=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", q_len_arg=36) at 10.1/sql/log_event.cc:4287
      #13 0x00007f2219afab15 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.cc:4013
      #14 0x00007f22197432c0 in Log_event::apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.h:1347
      #15 0x00007f2219738e1b in apply_event_and_update_pos (ev=0x7f21e9e10c70, thd=0x7f21f0423070, rgi=0x7f21e9dec800, rpt=0x7f21f3ac3620) at 10.1/sql/slave.cc:3274
      #16 0x00007f221996fd13 in rpt_handle_event (qev=0x7f21e9e09170, rpt=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:49
      #17 0x00007f2219971fbc in handle_rpl_parallel_thread (arg=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:942
      #18 0x00007f2218ec6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #19 0x00007f2216f5b95d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

      To reproduce, I

      • start master on a clean datadir, with --server-id=1 --log-bin=mysql-bin, with the attached mysql-bin.* files placed into the datadir;
      • start slave on a clean datadir with --server-id=2 --slave-parallel-mode=optimistic --slave-parallel-threads=8;
      • run change master to master_host='127.0.0.1', master_port=3306, master_user='root'; start slave on the slave;
      • wait.

      Takes a few minutes, crashes every time for me.

      Attachments

        1. mysql-bin.000001
          754 kB
        2. mysql-bin.index
          0.0 kB
        3. mysql-bin.state
          0.0 kB

        Activity

          elenst Elena Stepanova created issue -
          elenst Elena Stepanova made changes -
          Field Original Value New Value
          Description _Note: I am only getting the failure with optimistic mode, although there is no guarantee it never happens in other circumstances._

          {noformat:title=Stack trace from 10.1 commit 46816996}
          10.1/sql/sql_string.h:207: char* String::c_ptr(): Assertion `!alloced || !Ptr || !Al
          loced_length || (Alloced_length >= (str_length + 1))' failed.
          150428 21:01:39 [ERROR] mysqld got signal 6 ;

          #6 0x00007f2216eab311 in *__GI___assert_fail (assertion=0x7f221a0ff4b0 "!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))", file=<optimized out>, line=207, function=0x7f221a103160 "char* String::c_ptr()") at assert.c:81
          #7 0x00007f2219768e2d in String::c_ptr (this=0x7f21ec210158) at 10.1/sql/sql_string.h:206
          #8 0x00007f2219777c77 in open_temporary_table (thd=0x7f21f0423070, tl=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:5972
          #9 0x00007f2219777db9 in open_temporary_tables (thd=0x7f21f0423070, tl_list=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:6017
          #10 0x00007f22197d8632 in mysql_execute_command (thd=0x7f21f0423070) at 10.1/sql/sql_parse.cc:2842
          #11 0x00007f22197e5c1e in mysql_parse (thd=0x7f21f0423070, rawbuf=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", length=36, parser_state=0x7f2216e7ea60) at 10.1/sql/sql_parse.cc:7165
          #12 0x00007f2219afb890 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800, query_arg=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", q_len_arg=36) at 10.1/sql/log_event.cc:4287
          #13 0x00007f2219afab15 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.cc:4013
          #14 0x00007f22197432c0 in Log_event::apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.h:1347
          #15 0x00007f2219738e1b in apply_event_and_update_pos (ev=0x7f21e9e10c70, thd=0x7f21f0423070, rgi=0x7f21e9dec800, rpt=0x7f21f3ac3620) at 10.1/sql/slave.cc:3274
          #16 0x00007f221996fd13 in rpt_handle_event (qev=0x7f21e9e09170, rpt=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:49
          #17 0x00007f2219971fbc in handle_rpl_parallel_thread (arg=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:942
          #18 0x00007f2218ec6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
          #19 0x00007f2216f5b95d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
          {noformat}

          To reproduce, I
          - start master on a clean datadir, with {{--server-id=1 --log-bin=mysql-bin}}, with the attached mysql-bin.* files placed into the datadir;
          - start slave on a clean datadir with {{--server-id=2 --slave-parallel-mode=optimistic --slave-parallel-threads=8}};
          - run {{change master to master_host='127.0.0.1', master_port=3306, master_user='root'; start slave}} on the slave;
          - wait.

          Takes a few minutes, crashes every time for me.
          _Note: I am only getting the failure with optimistic mode, although there is no guarantee it never happens in other circumstances._

          {noformat:title=Stack trace from 10.1 commit 46816996}
          10.1/sql/sql_string.h:207: char* String::c_ptr(): Assertion `!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))' failed.
          150428 21:01:39 [ERROR] mysqld got signal 6 ;

          #6 0x00007f2216eab311 in *__GI___assert_fail (assertion=0x7f221a0ff4b0 "!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))", file=<optimized out>, line=207, function=0x7f221a103160 "char* String::c_ptr()") at assert.c:81
          #7 0x00007f2219768e2d in String::c_ptr (this=0x7f21ec210158) at 10.1/sql/sql_string.h:206
          #8 0x00007f2219777c77 in open_temporary_table (thd=0x7f21f0423070, tl=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:5972
          #9 0x00007f2219777db9 in open_temporary_tables (thd=0x7f21f0423070, tl_list=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:6017
          #10 0x00007f22197d8632 in mysql_execute_command (thd=0x7f21f0423070) at 10.1/sql/sql_parse.cc:2842
          #11 0x00007f22197e5c1e in mysql_parse (thd=0x7f21f0423070, rawbuf=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", length=36, parser_state=0x7f2216e7ea60) at 10.1/sql/sql_parse.cc:7165
          #12 0x00007f2219afb890 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800, query_arg=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", q_len_arg=36) at 10.1/sql/log_event.cc:4287
          #13 0x00007f2219afab15 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.cc:4013
          #14 0x00007f22197432c0 in Log_event::apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.h:1347
          #15 0x00007f2219738e1b in apply_event_and_update_pos (ev=0x7f21e9e10c70, thd=0x7f21f0423070, rgi=0x7f21e9dec800, rpt=0x7f21f3ac3620) at 10.1/sql/slave.cc:3274
          #16 0x00007f221996fd13 in rpt_handle_event (qev=0x7f21e9e09170, rpt=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:49
          #17 0x00007f2219971fbc in handle_rpl_parallel_thread (arg=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:942
          #18 0x00007f2218ec6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
          #19 0x00007f2216f5b95d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
          {noformat}

          To reproduce, I
          - start master on a clean datadir, with {{--server-id=1 --log-bin=mysql-bin}}, with the attached mysql-bin.* files placed into the datadir;
          - start slave on a clean datadir with {{--server-id=2 --slave-parallel-mode=optimistic --slave-parallel-threads=8}};
          - run {{change master to master_host='127.0.0.1', master_port=3306, master_user='root'; start slave}} on the slave;
          - wait.

          Takes a few minutes, crashes every time for me.
          knielsen Kristian Nielsen made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          knielsen Kristian Nielsen added a comment - - edited

          This is the binlog around where it fails (the assertion occurs during
          replication of GTID 0-1-3605):

          # at 770490
          #150427 20:41:36 server id 1  end_log_pos 770528        GTID 0-1-3604
          DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1`
           
          # at 770714
          #150427 20:41:36 server id 1  end_log_pos 770752        GTID 0-1-3605 trans
          BEGIN
          UPDATE table_1 SET `col_int_key` = 6
          CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb`
          ROLLBACK

          I think what happens is that in GTID 0-1-3604 we drop a temporary table
          `table_1` that shadows the global `table_1`.

          So when we speculatively execute 0-1-3605, we get the wrong table (the
          temporary table that is supposed to be dropped in 0-1-3604), not the global
          one.

          (And even worse, we then wait for 0-1-3604 to drop the temporary table,
          after which we try to use it - causing the assertion seen).

          So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl"
          in the binlog, as speculative apply is not safe after it - just like any
          other DDL statement.

          A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are
          two more cases that do not. One is a DROP TEMPORARY inside a
          transaction. The other is the implicit DROP TEMPORARY TABLE that is done
          when a client connection is closed.

          knielsen Kristian Nielsen added a comment - - edited This is the binlog around where it fails (the assertion occurs during replication of GTID 0-1-3605): # at 770490 #150427 20:41:36 server id 1 end_log_pos 770528 GTID 0-1-3604 DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1`   # at 770714 #150427 20:41:36 server id 1 end_log_pos 770752 GTID 0-1-3605 trans BEGIN UPDATE table_1 SET `col_int_key` = 6 CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb` ROLLBACK I think what happens is that in GTID 0-1-3604 we drop a temporary table `table_1` that shadows the global `table_1`. So when we speculatively execute 0-1-3605, we get the wrong table (the temporary table that is supposed to be dropped in 0-1-3604), not the global one. (And even worse, we then wait for 0-1-3604 to drop the temporary table, after which we try to use it - causing the assertion seen). So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl" in the binlog, as speculative apply is not safe after it - just like any other DDL statement. A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are two more cases that do not. One is a DROP TEMPORARY inside a transaction. The other is the implicit DROP TEMPORARY TABLE that is done when a client connection is closed.
          knielsen Kristian Nielsen made changes -
          Summary Assertion `!alloced || !Ptr || !Al loced_length || (Alloced_length >= (str_length + 1))' failed in String::c_ptr() DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail
          knielsen Kristian Nielsen added a comment - http://lists.askmonty.org/pipermail/commits/2015-May/007819.html

          I discovered a problem with the patch. Consider queries like these:

          begin; insert into t1 values (8); drop table t4;

          A transaction that is implicitly committed by a following DDL. With the
          patch, the transaction is errorneously marked as ddl, even though it is not.

          I will come up with a new patch that fixes this ...

          knielsen Kristian Nielsen added a comment - I discovered a problem with the patch. Consider queries like these: begin; insert into t1 values (8); drop table t4; A transaction that is implicitly committed by a following DDL. With the patch, the transaction is errorneously marked as ddl, even though it is not. I will come up with a new patch that fixes this ...
          knielsen Kristian Nielsen made changes -
          Summary DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail NEED REVIEW: DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail
          knielsen Kristian Nielsen made changes -
          Status In Progress [ 3 ] Stalled [ 10000 ]
          knielsen Kristian Nielsen added a comment - https://lists.launchpad.net/maria-developers/msg08571.html
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 60721 ] MariaDB v3 [ 66265 ]

          I have now modified the patch for latest 10.1, fixed some issues with timezone and also added a fix for the ALTER TABLE.
          I checked the variable no_2pc in detail and it looks you got it right, with reset().
          I will verify with Sergei before pushing.

          monty Michael Widenius added a comment - I have now modified the patch for latest 10.1, fixed some issues with timezone and also added a fix for the ALTER TABLE. I checked the variable no_2pc in detail and it looks you got it right, with reset(). I will verify with Sergei before pushing.
          monty Michael Widenius made changes -
          Assignee Kristian Nielsen [ knielsen ] Michael Widenius [ monty ]

          Ready to be reviewed. Most things are same as original patch and should be ok.

          Only the usage of to_2pc should be checked. (I have done an initial check and it looks ok).

          monty Michael Widenius added a comment - Ready to be reviewed. Most things are same as original patch and should be ok. Only the usage of to_2pc should be checked. (I have done an initial check and it looks ok).
          monty Michael Widenius made changes -
          Status Stalled [ 10000 ] In Review [ 10002 ]

          Fix reviewed and pushed

          monty Michael Widenius added a comment - Fix reviewed and pushed
          monty Michael Widenius made changes -
          Fix Version/s 10.1.25 [ 22542 ]
          Fix Version/s 10.1 [ 16100 ]
          Resolution Fixed [ 1 ]
          Status In Review [ 10002 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Summary NEED REVIEW: DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 66265 ] MariaDB v4 [ 149109 ]

          People

            monty Michael Widenius
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.