Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
N/A
-
None
-
None
Description
With the ugly test below, started with slave-parallel-threads=16 and repeated multiple times, I am getting sporadic failures.
I am not 100% sure the problem does not exist in 10.0 tree, but I wasn't able to reproduce it there so far, and neither could I reproduce it without slave-parallel-threads.
140710 12:26:47 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16000', master_log_file='master-bin.000001', master_log_pos='4'.
|
140710 12:26:47 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 4, relay log './slave-relay-bin.0
|
00001' position: 4
|
140710 12:26:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16000',replication started in log 'master-bin.000001' at position 4
|
140710 12:26:47 [ERROR] Slave SQL: Query partially completed on the master (error on master: 1317) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: 'INSERT INTO t1 VALUES (SLEEP(3)),(SLEEP(1))', Gtid 0-1-3, Internal MariaDB error code: 1317
|
mysqld: 10.0-knielsen/sql/sql_error.h:708: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.
|
140710 12:26:47 [ERROR] mysqld got signal 6 ;
|
#5 0x00007fb1870e56f0 in *__GI_abort () at abort.c:92
|
#6 0x00007fb1870db621 in *__GI___assert_fail (assertion=0xf0abda "m_status == DA_ERROR", file=<optimized out>, line=708, function=0xf0c0e0 "uint
|
Diagnostics_area::sql_errno() const") at assert.c:81
|
#7 0x00000000005d3caa in Diagnostics_area::sql_errno (this=0x7fb17552b0d0) at 10.0-knielsen/sql/sql_error.h:708
|
#8 0x00000000007f0627 in convert_kill_to_deadlock_error (rgi=0x7fb1790ed800) at 10.0-knielsen/sql/rpl_parallel.cc:237
|
#9 0x00000000007f194a in handle_rpl_parallel_thread (arg=0x7fb1754b35f0) at 10.0-knielsen/sql/rpl_parallel.cc:727
|
#10 0x0000000000cc1d8a in pfs_spawn_thread (arg=0x7fb1754b5270) at 10.0-knielsen/storage/perfschema/pfs.cc:1860
|
#11 0x00007fb188c92b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
|
#12 0x00007fb18718aa7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
|
# I run it as
|
perl ./mtr bug.t1 \
|
--repeat=100 \
|
--force-restart \
|
--nocheck-testcases \
|
--nowarnings \
|
--force \
|
--mysqld=--slave-parallel-threads=16
|
--source include/master-slave.inc
|
--source include/have_binlog_format_statement.inc
|
--disable_abort_on_error
|
|
CREATE TABLE t1 (i INT); |
CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET binlog_format = ROW; |
|
--let $conid = `SELECT CONNECTION_ID()`
|
|
--send
|
INSERT INTO t1 VALUES (SLEEP(3)),(SLEEP(1)); |
|
--connection master1
|
eval KILL QUERY $conid;
|
|
--connection master
|
--reap
|
SHOW BINLOG EVENTS;
|
|
--sync_slave_with_master
|
revision-id: knielsen@knielsen-hq.org-20140708135903-rmf1qd77gqc01gkj
|
revno: 4282
|
branch-nick: 10.0-knielsen
|
Ok, I see what's going on here, I think.
The query is MyISAM, it partially executes and is then killed.
Such query is logged to the binlog along with an error code.
On the slave, it is seen that the query got a kill error. There is no way to
safely replicate such statement on the slave to get same results as on master.
So the slave does not execute the query, but logs an error message to the
error log instead for the DBA to try and clean up the mess.
In this case, there is a bug in the new code. The code does not set an actual
error (with my_error()), but it still tries to read the error code. This is
wrong (though harmless in a release build), so a debug build causes the
assertion.
I found the below test case that can more reliably reproduce. The trick seems
to be that the actual MyISAM ignores the kill (probably to avoid leaving
database in inconsistent state?), but the trigger catches the kill can causes
it to get the error.
--source include/master-slave.inc
--source include/have_binlog_format_statement.inc
--disable_abort_on_error
CREATE TABLE t1 (i INT);
CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET @a = SLEEP(1);
--let $conid = `SELECT CONNECTION_ID()`
--send
INSERT INTO t1 VALUES (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1));
--connection master1
SELECT SLEEP(2);
eval KILL QUERY $conid;
--connection master
--reap
SHOW BINLOG EVENTS;
--sync_slave_with_master
--connection master
DROP TABLE t1;
--source include/rpl_end.inc