Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6435

Assertion `m_status == DA_ERROR' failed in Diagnostics_area::sql_errno() with parallel replication

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • N/A
    • 10.0.13
    • None
    • None

    Description

      With the ugly test below, started with slave-parallel-threads=16 and repeated multiple times, I am getting sporadic failures.

      I am not 100% sure the problem does not exist in 10.0 tree, but I wasn't able to reproduce it there so far, and neither could I reproduce it without slave-parallel-threads.

      140710 12:26:47 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16000', master_log_file='master-bin.000001', master_log_pos='4'.
      140710 12:26:47 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 4, relay log './slave-relay-bin.0
      00001' position: 4
      140710 12:26:47 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16000',replication started in log 'master-bin.000001' at position 4
      140710 12:26:47 [ERROR] Slave SQL: Query partially completed on the master (error on master: 1317) and was aborted. There is a chance that your master is inconsistent at this point. If you are sure that your master is ok, run this query manually on the slave and then restart the slave with SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1; START SLAVE; . Query: 'INSERT INTO t1 VALUES (SLEEP(3)),(SLEEP(1))', Gtid 0-1-3, Internal MariaDB error code: 1317
      mysqld: 10.0-knielsen/sql/sql_error.h:708: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.
      140710 12:26:47 [ERROR] mysqld got signal 6 ;

      #5  0x00007fb1870e56f0 in *__GI_abort () at abort.c:92
      #6  0x00007fb1870db621 in *__GI___assert_fail (assertion=0xf0abda "m_status == DA_ERROR", file=<optimized out>, line=708, function=0xf0c0e0 "uint 
      Diagnostics_area::sql_errno() const") at assert.c:81
      #7  0x00000000005d3caa in Diagnostics_area::sql_errno (this=0x7fb17552b0d0) at 10.0-knielsen/sql/sql_error.h:708
      #8  0x00000000007f0627 in convert_kill_to_deadlock_error (rgi=0x7fb1790ed800) at 10.0-knielsen/sql/rpl_parallel.cc:237
      #9  0x00000000007f194a in handle_rpl_parallel_thread (arg=0x7fb1754b35f0) at 10.0-knielsen/sql/rpl_parallel.cc:727
      #10 0x0000000000cc1d8a in pfs_spawn_thread (arg=0x7fb1754b5270) at 10.0-knielsen/storage/perfschema/pfs.cc:1860
      #11 0x00007fb188c92b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
      #12 0x00007fb18718aa7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

      # I run it as
      perl ./mtr bug.t1 \
      --repeat=100 \
      --force-restart  \
      --nocheck-testcases \
      --nowarnings \
      --force \
      --mysqld=--slave-parallel-threads=16

      --source include/master-slave.inc
      --source include/have_binlog_format_statement.inc
      --disable_abort_on_error
       
      CREATE TABLE t1 (i INT);
      CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET binlog_format = ROW;
       
      --let $conid = `SELECT CONNECTION_ID()`
       
      --send
      INSERT INTO t1 VALUES (SLEEP(3)),(SLEEP(1));
       
      --connection master1
      eval KILL QUERY $conid;
       
      --connection master
      --reap
      SHOW BINLOG EVENTS;
       
      --sync_slave_with_master

      revision-id: knielsen@knielsen-hq.org-20140708135903-rmf1qd77gqc01gkj
      revno: 4282
      branch-nick: 10.0-knielsen

      Attachments

        Activity

          Ok, I see what's going on here, I think.

          The query is MyISAM, it partially executes and is then killed.
          Such query is logged to the binlog along with an error code.
          On the slave, it is seen that the query got a kill error. There is no way to
          safely replicate such statement on the slave to get same results as on master.

          So the slave does not execute the query, but logs an error message to the
          error log instead for the DBA to try and clean up the mess.

          In this case, there is a bug in the new code. The code does not set an actual
          error (with my_error()), but it still tries to read the error code. This is
          wrong (though harmless in a release build), so a debug build causes the
          assertion.

          I found the below test case that can more reliably reproduce. The trick seems
          to be that the actual MyISAM ignores the kill (probably to avoid leaving
          database in inconsistent state?), but the trigger catches the kill can causes
          it to get the error.

          --source include/master-slave.inc
          --source include/have_binlog_format_statement.inc
          --disable_abort_on_error
           
          CREATE TABLE t1 (i INT);
          CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET @a = SLEEP(1);
           
          --let $conid = `SELECT CONNECTION_ID()`
           
          --send
          INSERT INTO t1 VALUES (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1));
           
          --connection master1
          SELECT SLEEP(2);
          eval KILL QUERY $conid;
           
          --connection master
          --reap
          SHOW BINLOG EVENTS;
           
          --sync_slave_with_master
           
          --connection master
          DROP TABLE t1;
          --source include/rpl_end.inc

          knielsen Kristian Nielsen added a comment - Ok, I see what's going on here, I think. The query is MyISAM, it partially executes and is then killed. Such query is logged to the binlog along with an error code. On the slave, it is seen that the query got a kill error. There is no way to safely replicate such statement on the slave to get same results as on master. So the slave does not execute the query, but logs an error message to the error log instead for the DBA to try and clean up the mess. In this case, there is a bug in the new code. The code does not set an actual error (with my_error()), but it still tries to read the error code. This is wrong (though harmless in a release build), so a debug build causes the assertion. I found the below test case that can more reliably reproduce. The trick seems to be that the actual MyISAM ignores the kill (probably to avoid leaving database in inconsistent state?), but the trigger catches the kill can causes it to get the error. --source include/master-slave.inc --source include/have_binlog_format_statement.inc --disable_abort_on_error   CREATE TABLE t1 (i INT); CREATE TRIGGER tr AFTER INSERT ON t1 FOR EACH ROW SET @a = SLEEP(1);   --let $conid = `SELECT CONNECTION_ID()`   --send INSERT INTO t1 VALUES (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1)), (SLEEP(0.1));   --connection master1 SELECT SLEEP(2); eval KILL QUERY $conid;   --connection master --reap SHOW BINLOG EVENTS;   --sync_slave_with_master   --connection master DROP TABLE t1; --source include/rpl_end.inc

          Pushed to lp:~maria-captains/maria/10.0-knielsen

          (This bug is in code that was never pushed to a main tree)

          knielsen Kristian Nielsen added a comment - Pushed to lp:~maria-captains/maria/10.0-knielsen (This bug is in code that was never pushed to a main tree)

          People

            knielsen Kristian Nielsen
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.