[MDEV-8147] Assertion `m_lock_type == 2' failed in handler::ha_close() during parallel replication Created: 2015-05-12  Updated: 2015-05-26  Resolved: 2015-05-26

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.20, 10.1.5

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Kristian Nielsen
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mysql-bin.000001     File mysql-bin.index     File mysql-bin.state    

 Description   

Note: Unfortunately I can't check if the problem is reproducible on 10.0, because the binlog uses syntax introduced in 10.1 with table encryption.

Stack trace is from 10.1 commit 8c54182e + patch http://lists.askmonty.org/pipermail/commits/2015-May/007819.html ; the binlog was produced on 10.1 commit 2f25c653 + the same patch + patch for MDEV-8045.

sql/handler.cc:2579: int handler::ha_close(): Assertion `m_lock_type == 2' failed.
150512 17:14:41 [ERROR] mysqld got signal 6 ;
 
#6  0x00007f0d3fb15311 in *__GI___assert_fail (assertion=0x7f0d42e723ce "m_lock_type == 2", file=<optimized out>, line=2579, function=0x7f0d42e76170 "int handler::ha_close()") at assert.c:81
#7  0x00007f0d4267c967 in handler::ha_close (this=0x7f0d14cae888) at 10.1-patched/sql/handler.cc:2579
#8  0x00007f0d425344e4 in closefrm (table=0x7f0d0e82c070, free_share=false) at 10.1-patched/sql/table.cc:3012
#9  0x00007f0d423e08b0 in close_temporary (table=0x7f0d0e82c070, free_share=true, delete_table=false) at 10.1-patched/sql/sql_base.cc:1812
#10 0x00007f0d42566527 in Relay_log_info::close_temporary_tables (this=0x7f0d2d488c40) at 10.1-patched/sql/rpl_rli.cc:1069
#11 0x00007f0d4276d5fa in Start_log_event_v3::do_apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.cc:4728
#12 0x00007f0d4276e217 in Format_description_log_event::do_apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.cc:5103
#13 0x00007f0d423b3a3a in Log_event::apply_event (this=0x7f0d1409c1b0, rgi=0x7f0d14020800) at 10.1-patched/sql/log_event.h:1347
#14 0x00007f0d423a95af in apply_event_and_update_pos (ev=0x7f0d1409c1b0, thd=0x7f0d14049070, rgi=0x7f0d14020800, rpt=0x0) at 10.1-patched/sql/slave.cc:3274
#15 0x00007f0d425e0843 in rpt_handle_event (qev=0x7f0d146d6570, rpt=0x0) at 10.1-patched/sql/rpl_parallel.cc:49
#16 0x00007f0d425e635c in rpl_parallel::do_event (this=0x7f0d2d48bf40, serial_rgi=0x7f0d14020800, ev=0x7f0d1409c1b0, event_size=244) at 10.1-patched/sql/rpl_parallel.cc:2357
#17 0x00007f0d423a9bbf in exec_relay_log_event (thd=0x7f0d14049070, rli=0x7f0d2d488c40, serial_rgi=0x7f0d14020800) at 10.1-patched/sql/slave.cc:3525
#18 0x00007f0d423ad0d0 in handle_slave_sql (arg=0x7f0d2d487000) at 10.1-patched/sql/slave.cc:4677
#19 0x00007f0d41b30b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#20 0x00007f0d3fbc595d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

To reproduce:

  • start master with all default options + --log-bin=mysql-bin --server-id=1, with the attached mysql-bin.* files in the datadir;
  • start slave with all default options + --server-id=2 --plugin-load-add=file_key_management.so --file_key_management_filename=<your basedir>/mysql-test/std_data/keys.txt --slave_parallel_threads=8;
  • set up replication (either old-fashioned or GTID, doesn't matter);
  • wait.


 Comments   
Comment by Kristian Nielsen [ 2015-05-20 ]

Hm, unfortunately it did not reproduce for me in like 5 or 10 runs (the slave just replicated to completion without errors).
EDIT: Ok, it reproduces fine for me. I was doing START SLAVE UNTIL <end of binlog>, but the assertion happens only on the first event in the following binlog

Comment by Kristian Nielsen [ 2015-05-21 ]

Here is an MTR test case. It triggers the assertion reliably for me by
adding the below sleep in the code.

It looks like the wait_for_workers_idle() function returns too early. It
does wait_for_prior_commit(), at which point all transactions are committed
(at least written into binlog), but table locks may not yet be
released. This allows the close_temporary_tables() for a restart FD event to
run ahead and touch not yet closed temporary tables.

--source include/have_partition.inc
--source include/have_innodb.inc
--source include/master-slave.inc
 
--connection server_2
SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=8;
 
--connection server_1
CREATE TABLE E (
			pk INTEGER AUTO_INCREMENT,
			col_int_nokey INTEGER /*! NULL */,
			col_int_key INTEGER /*! NULL */,
 
			col_date_key DATE /*! NULL */,
			col_date_nokey DATE /*! NULL */,
 
			col_time_key TIME /*! NULL */,
			col_time_nokey TIME /*! NULL */,
 
			col_datetime_key DATETIME /*! NULL */,
			col_datetime_nokey DATETIME /*! NULL */,
 
			col_varchar_key VARCHAR(1) /*! NULL */,
			col_varchar_nokey VARCHAR(1) /*! NULL */,
 
			PRIMARY KEY (pk),
			KEY (col_int_key),
			KEY (col_date_key),
			KEY (col_time_key),
			KEY (col_datetime_key),
			KEY (col_varchar_key, col_int_key)
		)  ENGINE=InnoDB;
 
ALTER TABLE `E` PARTITION BY KEY() PARTITIONS 5;
ALTER TABLE `E` REMOVE PARTITIONING;
--write_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
wait
EOF
--shutdown_server 30
--source include/wait_until_disconnected.inc
--connection default
--source include/wait_until_disconnected.inc
--append_file $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
restart:
EOF
--enable_reconnect
--source include/wait_until_connected_again.inc
--connection server_1
--enable_reconnect
--source include/wait_until_connected_again.inc
CREATE TABLE t1 (a INT PRIMARY KEY);
--save_master_pos
 
--connection server_2
--source include/start_slave.inc
--sync_with_master
 
 
# Clean up.
--connection server_2
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=@old_parallel_threads;
--source include/start_slave.inc
 
--connection server_1
DROP TABLE `E`;
DROP TABLE t1;
--source include/rpl_end.inc

diff --git a/sql/sql_class.cc b/sql/sql_class.cc
index 7de565d..fbb2716 100644
--- a/sql/sql_class.cc
+++ b/sql/sql_class.cc
@@ -6994,6 +6994,7 @@ wait_for_commit::wakeup_subsequent_commits2(int wakeup_error)
     a mutex), so no extra explicit barrier is needed here.
   */
   wakeup_subsequent_commits_running= false;
+my_sleep(21000);
 }
 
 

Comment by Kristian Nielsen [ 2015-05-26 ]

http://lists.askmonty.org/pipermail/commits/2015-May/007945.html

Comment by Kristian Nielsen [ 2015-05-26 ]

Pushed to 10.0 and 10.1

Generated at Thu Feb 08 07:24:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.