[MDEV-6233] Parallel replication crash Created: 2014-05-12  Updated: 2014-06-30  Resolved: 2014-06-30

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: 10.0.13

Type: Bug
Priority: Major
Reporter: Hartmut Holzgraefe
Assignee: Kristian Nielsen
Resolution: Duplicate
Votes: 0
Labels: None
Environment: Linux (Ubuntu 14.04 x86_64)

 Description   

mysqld crashed after performing the following steps:

on the master:

# start master mysqld with log-bin but without setting server_id
GRANT REPLICATION SLAVE ON *.* TO "repl"@"%" IDENTIFIED BY "secret";

on the slave:

CHANGE MASTER TO MASTER_USER='repl', MASTER_PASSWORD='secret', MASTER_HOST='192.168.23.15';
SET GLOBAL slave_parallel_threads = 2;
START SLAVE;
SHOW SLAVE STATUS\G
#  -> complains about same server_id on master and slave;
SET GLOBAL SERVER_ID=23;
STOP SLAVE; START SLAVE;
SHOW SLAVE STATUS\G
#  -> now complains about server_id not set on the master side

on the master:

SET GLOBAL SERVER_ID=42;

on the slave:

STOP SLAVE; START SLAVE;
# --> crash!

error log shows:

140512 22:07:59 [ERROR] Slave SQL: Error 'Duplicate entry '%-test-' for key 'PRIMARY'' on query. Default database: 'mysql'. Query: 'INSERT INTO db SELECT * FROM tmp_db WHERE @had_db_table=0;', Internal MariaDB error code: 1062
140512 22:07:59 [Warning] Slave: Duplicate entry '%-test-' for key 'PRIMARY' Error_code: 1062
140512 22:07:59 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'hartmut-server-bin.000001' position 63913
mysqld: /home/hartmut/projects/mariadb/releases/mariadb-10.0.10/sql/sql_base.cc:900: void close_thread_tables(THD*): Assertion `thd->transaction.stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)' failed.
[...]
mysys/stacktrace.c:246(my_print_stacktrace)[0xdeb358]
sql/signal_handler.cc:155(handle_fatal_signal)[0x82633f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f8bc4ed3340]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39)[0x7f8bc3cedf79]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f8bc3cf1388]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fe36)[0x7f8bc3ce6e36]
/lib/x86_64-linux-gnu/libc.so.6(+0x2fee2)[0x7f8bc3ce6ee2]
sql/sql_base.cc:903(close_thread_tables(THD*))[0x5d95d7]
sql/rpl_gtid.cc:689(rpl_slave_state::record_gtid(THD*, rpl_gtid const*, unsigned long long, bool, bool))[0x7afecb]
sql/rpl_gtid.cc:80(rpl_slave_state::record_and_update_gtid(THD*, rpl_group_info*))[0x7ae91c]
sql/rpl_rli.cc:1268(Relay_log_info::stmt_done(unsigned long long, long, THD*, rpl_group_info*))[0x749f24]
sql/log_event.cc:975(Log_event::do_update_pos(rpl_group_info*))[0x909159]
sql/log_event.cc:4485(Query_log_event::do_update_pos(rpl_group_info*))[0x90f573]
sql/log_event.h:1355(Log_event::update_pos(rpl_group_info*))[0x5b0f7e]
sql/slave.cc:3277(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*, rpl_parallel_thread*))[0x5a7e70]
sql/rpl_parallel.cc:46(rpt_handle_event)[0x7b3b62]
sql/rpl_parallel.cc:500(handle_rpl_parallel_thread)[0x7b4abb]
perfschema/pfs.cc:1855(pfs_spawn_thread)[0x9e893e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f8bc4ecb182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f8bc3db230d]



 Comments   
Comment by Elena Stepanova [ 2014-05-26 ]

Hi Hartmut,

Would you be, by any chance, able to provide the binlog?

Comment by Elena Stepanova [ 2014-06-01 ]

I tried to reproduce it, but didn't get the crash.
Also, from the error log, it doesn't look like it happens upon slave startup, but rather upon the slave SQL error caused by the duplicate key. So, probably just running the described scenario on a clean master/slave pair won't do the trick.
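
A minimal sketch of one way to provoke the same class of slave SQL error (a duplicate key while applying a replicated event) on a working test pair. The table test.t1 is hypothetical and not taken from the report, the statements assume an existing `test` database and a running replication setup, and this has not been confirmed to trigger the assertion:

```sql
-- Hypothetical table test.t1; not part of the original report.
-- On the master: create a table with a primary key and let it replicate.
CREATE TABLE test.t1 (a INT PRIMARY KEY);

-- On the slave, once the table has arrived: enable parallel apply as in the
-- report (the slave threads must be stopped to change this setting), then
-- insert a conflicting row without writing it to the slave's own binlog.
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 2;
START SLAVE;
SET SESSION sql_log_bin = 0;
INSERT INTO test.t1 VALUES (1);
SET SESSION sql_log_bin = 1;

-- On the master: the same key collides when the slave applies this event.
INSERT INTO test.t1 VALUES (1);

-- On the slave: with the default strict apply mode, the SQL thread should
-- stop with Last_SQL_Errno 1062 (duplicate entry).
SHOW SLAVE STATUS\G
```

If the crash depends on this error path, the assertion would be expected shortly after the 1062 error appears in the slave's error log, as in the log excerpt above.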

Comment by Elena Stepanova [ 2014-06-30 ]

Assigning to knielsen just in case he can ingeniously figure out the problem from the stack trace alone.

Comment by Kristian Nielsen [ 2014-06-30 ]

I'm pretty sure this is a duplicate of MDEV-6386. It is caused by some incorrect/missing error handling on the duplicate key error.

MDEV-6386 was fixed in 10.0.13, which is not yet released, but the fix is in the latest 10.0 bzr tree.

I will close as a duplicate, but please reopen if it turns out to still happen with 10.0.13 / latest bzr.

Thanks for the bug report. Any help to get the last issues in parallel replication ironed out is greatly appreciated.
