[MDEV-6435] Assertion `m_status == DA_ERROR' failed in Diagnostics_area::sql_errno() with parallel replication - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: N/A
Fix Version/s: 10.0.13
Component/s: None
Labels:
None

Description

With the ugly test below, started with slave-parallel-threads=16 and repeated multiple times, I am getting sporadic failures.

I am not 100% sure the problem does not exist in 10.0 tree, but I wasn't able to reproduce it there so far, and neither could I reproduce it without slave-parallel-threads.

140710 12:26:47 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16000', master_log_file='master-bin.000001', master_log_pos='4'.

140710 12:26:47 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 4, relay log './slave-relay-bin.0

00001' position: 4

140710 12:26:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16000',replication started in log 'master-bin.000001' at position 4

140710 12:26:47 [ERROR] Slave SQL: Query partially completed on the master (error on master: 1317) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: 'INSERT INTO t1 VALUES (SLEEP(3)),(SLEEP(1))', Gtid 0-1-3, Internal MariaDB error code: 1317

mysqld: 10.0-knielsen/sql/sql_error.h:708: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.

140710 12:26:47 [ERROR] mysqld got signal 6 ;

#5  0x00007fb1870e56f0 in *__GI_abort () at abort.c:92

#6  0x00007fb1870db621 in *__GI___assert_fail (assertion=0xf0abda "m_status == DA_ERROR", file=<optimized out>, line=708, function=0xf0c0e0 "uint

Diagnostics_area::sql_errno() const") at assert.c:81

#7  0x00000000005d3caa in Diagnostics_area::sql_errno (this=0x7fb17552b0d0) at 10.0-knielsen/sql/sql_error.h:708

#8  0x00000000007f0627 in convert_kill_to_deadlock_error (rgi=0x7fb1790ed800) at 10.0-knielsen/sql/rpl_parallel.cc:237

#9  0x00000000007f194a in handle_rpl_parallel_thread (arg=0x7fb1754b35f0) at 10.0-knielsen/sql/rpl_parallel.cc:727

#10 0x0000000000cc1d8a in pfs_spawn_thread (arg=0x7fb1754b5270) at 10.0-knielsen/storage/perfschema/pfs.cc:1860

#11 0x00007fb188c92b50 in start_thread (arg=<optimized out>) at pthread_create.c:304

#12 0x00007fb18718aa7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

# I run it as

perl ./mtr bug.t1 \

--repeat=100 \

--force-restart  \

--nocheck-testcases \

--nowarnings \

--force \

--mysqld=--slave-parallel-threads=16

--source include/master-slave.inc

--source include/have_binlog_format_statement.inc

--disable_abort_on_error

CREATE TABLE t1 (i INT);

CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET binlog_format = ROW;

--let $conid = `SELECT CONNECTION_ID()`

--send

INSERT INTO t1 VALUES (SLEEP(3)),(SLEEP(1));

--connection master1

eval KILL QUERY $conid;

--connection master

--reap

SHOW BINLOG EVENTS;

--sync_slave_with_master

revision-id: knielsen@knielsen-hq.org-20140708135903-rmf1qd77gqc01gkj

revno: 4282

branch-nick: 10.0-knielsen

Attachments

Activity

Ascending order - Click to sort in descending order

Kristian Nielsen added a comment - 2014-07-10 14:03

Ok, I see what's going on here, I think.

The query is MyISAM, it partially executes and is then killed.
Such query is logged to the binlog along with an error code.
On the slave, it is seen that the query got a kill error. There is no way to
safely replicate such statement on the slave to get same results as on master.

So the slave does not execute the query, but logs an error message to the
error log instead for the DBA to try and clean up the mess.

In this case, there is a bug in the new code. The code does not set an actual
error (with my_error()), but it still tries to read the error code. This is
wrong (though harmless in a release build), so a debug build causes the
assertion.

I found the below test case that can more reliably reproduce. The trick seems
to be that the actual MyISAM ignores the kill (probably to avoid leaving
database in inconsistent state?), but the trigger catches the kill can causes
it to get the error.

--source include/master-slave.inc

--source include/have_binlog_format_statement.inc

--disable_abort_on_error

CREATE TABLE t1 (i INT);

CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET @a = SLEEP(1);

--let $conid = `SELECT CONNECTION_ID()`

--send

INSERT INTO t1 VALUES (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1));

--connection master1

SELECT SLEEP(2);

eval KILL QUERY $conid;

--connection master

--reap

SHOW BINLOG EVENTS;

--sync_slave_with_master

--connection master

DROP TABLE t1;

--source include/rpl_end.inc

Kristian Nielsen added a comment - 2014-07-10 14:03 Ok, I see what's going on here, I think. The query is MyISAM, it partially executes and is then killed. Such query is logged to the binlog along with an error code. On the slave, it is seen that the query got a kill error. There is no way to safely replicate such statement on the slave to get same results as on master. So the slave does not execute the query, but logs an error message to the error log instead for the DBA to try and clean up the mess. In this case, there is a bug in the new code. The code does not set an actual error (with my_error()), but it still tries to read the error code. This is wrong (though harmless in a release build), so a debug build causes the assertion. I found the below test case that can more reliably reproduce. The trick seems to be that the actual MyISAM ignores the kill (probably to avoid leaving database in inconsistent state?), but the trigger catches the kill can causes it to get the error. --source include/master-slave.inc --source include/have_binlog_format_statement.inc --disable_abort_on_error CREATE TABLE t1 (i INT); CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET @a = SLEEP(1); --let $conid = `SELECT CONNECTION_ID()` --send INSERT INTO t1 VALUES (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)); --connection master1 SELECT SLEEP(2); eval KILL QUERY $conid; --connection master --reap SHOW BINLOG EVENTS; --sync_slave_with_master --connection master DROP TABLE t1; --source include/rpl_end.inc

Kristian Nielsen added a comment - 2014-07-10 14:57

Pushed to lp:~maria-captains/maria/10.0-knielsen

(This bug is in code that was never pushed to a main tree)

Kristian Nielsen added a comment - 2014-07-10 14:57 Pushed to lp:~maria-captains/maria/10.0-knielsen (This bug is in code that was never pushed to a main tree)

People

Assignee:: Kristian Nielsen

Reporter:: Elena Stepanova

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 2014-07-10 11:40

Updated:: 2014-07-10 14:57

Resolved:: 2014-07-10 14:57

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server