[MDEV-8075] DROP TEMPORARY TABLE not marked as ddl, causing optimistic parallel replication to fail Created: 2015-04-28  Updated: 2017-07-04  Resolved: 2017-07-04

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1
Fix Version/s: 10.1.25

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mysql-bin.000001     File mysql-bin.index     File mysql-bin.state    

 Description   

Note: I am only getting the failure with optimistic mode, although there is no guarantee it never happens in other circumstances.

Stack trace from 10.1 commit 46816996

10.1/sql/sql_string.h:207: char* String::c_ptr(): Assertion `!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))' failed.
150428 21:01:39 [ERROR] mysqld got signal 6 ;
 
#6  0x00007f2216eab311 in *__GI___assert_fail (assertion=0x7f221a0ff4b0 "!alloced || !Ptr || !Alloced_length || (Alloced_length >= (str_length + 1))", file=<optimized out>, line=207, function=0x7f221a103160 "char* String::c_ptr()") at assert.c:81
#7  0x00007f2219768e2d in String::c_ptr (this=0x7f21ec210158) at 10.1/sql/sql_string.h:206
#8  0x00007f2219777c77 in open_temporary_table (thd=0x7f21f0423070, tl=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:5972
#9  0x00007f2219777db9 in open_temporary_tables (thd=0x7f21f0423070, tl_list=0x7f21f045f0f8) at 10.1/sql/sql_base.cc:6017
#10 0x00007f22197d8632 in mysql_execute_command (thd=0x7f21f0423070) at 10.1/sql/sql_parse.cc:2842
#11 0x00007f22197e5c1e in mysql_parse (thd=0x7f21f0423070, rawbuf=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", length=36, parser_state=0x7f2216e7ea60) at 10.1/sql/sql_parse.cc:7165
#12 0x00007f2219afb890 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800, query_arg=0x7f21e9da63f9 "UPDATE table_1 SET `col_int_key` = 6", q_len_arg=36) at 10.1/sql/log_event.cc:4287
#13 0x00007f2219afab15 in Query_log_event::do_apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.cc:4013
#14 0x00007f22197432c0 in Log_event::apply_event (this=0x7f21e9e10c70, rgi=0x7f21e9dec800) at 10.1/sql/log_event.h:1347
#15 0x00007f2219738e1b in apply_event_and_update_pos (ev=0x7f21e9e10c70, thd=0x7f21f0423070, rgi=0x7f21e9dec800, rpt=0x7f21f3ac3620) at 10.1/sql/slave.cc:3274
#16 0x00007f221996fd13 in rpt_handle_event (qev=0x7f21e9e09170, rpt=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:49
#17 0x00007f2219971fbc in handle_rpl_parallel_thread (arg=0x7f21f3ac3620) at 10.1/sql/rpl_parallel.cc:942
#18 0x00007f2218ec6b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#19 0x00007f2216f5b95d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

To reproduce, I

  • start master on a clean datadir, with --server-id=1 --log-bin=mysql-bin, with the attached mysql-bin.* files placed into the datadir;
  • start slave on a clean datadir with --server-id=2 --slave-parallel-mode=optimistic --slave-parallel-threads=8;
  • run change master to master_host='127.0.0.1', master_port=3306, master_user='root'; start slave on the slave;
  • wait.

Takes a few minutes, crashes every time for me.



 Comments   
Comment by Kristian Nielsen [ 2015-04-29 ]

This is the binlog around where it fails (the assertion occurs during
replication of GTID 0-1-3605):

# at 770490
#150427 20:41:36 server id 1  end_log_pos 770528        GTID 0-1-3604
DROP /*!40005 TEMPORARY */ TABLE IF EXISTS `table_2`,`view_table10_innodb`,`table11_innodb`,`view_table11_innodb`,`table_1`
 
# at 770714
#150427 20:41:36 server id 1  end_log_pos 770752        GTID 0-1-3605 trans
BEGIN
UPDATE table_1 SET `col_int_key` = 6
CREATE TEMPORARY TABLE table_2 ( `col_int_key` INTEGER, KEY (`col_int_key`) ) SELECT /* QUERY_ID: 10265600 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_3 ( 4 ) AS `col_int_key` FROM `table11_innodb`
ROLLBACK

I think what happens is that in GTID 0-1-3604 we drop a temporary table
`table_1` that shadows the global `table_1`.

So when we speculatively execute 0-1-3605, we get the wrong table (the
temporary table that is supposed to be dropped in 0-1-3604), not the global
one.

(And even worse, we then wait for 0-1-3604 to drop the temporary table,
after which we try to use it - causing the assertion seen).

So this seems to show that DROP TEMPORARY TABLE needs to be marked as "ddl"
in the binlog, as speculative apply is not safe after it - just like any
other DDL statement.

A standalone DROP TEMPORARY TABLE is marked as ddl already, but there are
two more cases that do not. One is a DROP TEMPORARY inside a
transaction. The other is the implicit DROP TEMPORARY TABLE that is done
when a client connection is closed.

Comment by Kristian Nielsen [ 2015-05-06 ]

http://lists.askmonty.org/pipermail/commits/2015-May/007819.html

Comment by Kristian Nielsen [ 2015-05-07 ]

I discovered a problem with the patch. Consider queries like these:

begin; insert into t1 values (8); drop table t4;

A transaction that is implicitly committed by a following DDL. With the
patch, the transaction is errorneously marked as ddl, even though it is not.

I will come up with a new patch that fixes this ...

Comment by Kristian Nielsen [ 2015-05-15 ]

https://lists.launchpad.net/maria-developers/msg08571.html

Comment by Michael Widenius [ 2017-05-28 ]

I have now modified the patch for latest 10.1, fixed some issues with timezone and also added a fix for the ALTER TABLE.
I checked the variable no_2pc in detail and it looks you got it right, with reset().
I will verify with Sergei before pushing.

Comment by Michael Widenius [ 2017-05-29 ]

Ready to be reviewed. Most things are same as original patch and should be ok.

Only the usage of to_2pc should be checked. (I have done an initial check and it looks ok).

Comment by Michael Widenius [ 2017-07-04 ]

Fix reviewed and pushed

Generated at Thu Feb 08 07:24:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.