MDEV-3924 (MariaDB Server)

Galera: Fatal error on trx replay or assertion `m_status == DA_ERROR' failure in Diagnostics_area::sql_errno()



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • 5.5.28a-galera
    • None
    • None


      General description
      RQG grammars, the data file, and server command lines follow below.

      I have a 3-node cluster (2 servers and the arbitrator).
      Test data consists of one table with two columns, a PK and a non-unique key on a char(1) column. The table contains 20 rows.
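
For readers who want to set up the table without RQG, here is a minimal sketch that generates equivalent SQL. The table and column names are taken from the query shown in the logs further down in this report; the exact DDL that RQG generates (types, NULLability, options) is an assumption, as are the helper names `CREATE_TABLE` and `insert_statements`.

```python
import random
import string

# Table name as it appears in the logged UPDATE statement.
TABLE = "table20_innodb_int_autoinc"

# Assumed DDL: column names come from the logged query, but the exact
# types and options produced by RQG are a guess.
CREATE_TABLE = (
    f"CREATE TABLE `{TABLE}` (\n"
    "  `pk` INT NOT NULL AUTO_INCREMENT,\n"
    "  `col_char_1_key` CHAR(1),\n"
    "  PRIMARY KEY (`pk`),\n"
    "  KEY (`col_char_1_key`)\n"
    ") ENGINE=InnoDB"
)


def insert_statements(rows=20, seed=0):
    """Build one INSERT per row, each with a random lowercase letter."""
    rng = random.Random(seed)
    return [
        f"INSERT INTO `{TABLE}` (`col_char_1_key`) "
        f"VALUES ('{rng.choice(string.ascii_lowercase)}')"
        for _ in range(rows)
    ]


if __name__ == "__main__":
    print(CREATE_TABLE + ";")
    for stmt in insert_statements():
        print(stmt + ";")
```

The script only prints statements; piping its output into a client connected to one node should be enough, since the cluster replicates the table to the other node.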

      Test flow on the 1st node:
      Two threads:
      one runs UPDATE <table> SET <non-pk> = ... ORDER BY ... LIMIT 8
      the other runs KILL QUERY <first thread>

      Test flow on the 2nd node:
      Single thread runs UPDATE <table> SET <non-pk> = ... LIMIT 3

      After a few minutes on a release build, replication on the first node fails with:

      [ERROR] Slave SQL: Error executing row event: 'Query execution was interrupted', Error_code: 1317
      [Warning] WSREP: RBR event 2 Update_rows apply warning: 1317, 9845
      [Warning] WSREP: failed to replay trx: source: 6b5ee6b1-418e-11e2-0800-2723c17dce38 version: 2 local: 1 state: REPLAYING flags: 1 conn_id: 7 trx_id: 30651 seqnos (l: 9880, g: 9845, s: 9843, d: 9844, ts: 1355009139336500474)
      [Warning] WSREP: Failed to apply app buffer: d<CC><C3>P^S^A, seqno: 9845, status: WSREP_FATAL
               at galera/src/replicator_smm.cpp:apply_wscoll():49
               at galera/src/replicator_smm.cpp:apply_trx_ws():120
      [ERROR] WSREP: trx_replay failed for: 5, query: UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5
      [ERROR] Aborting
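
When triaging such a log it can help to pull the identifiers out of the "failed to replay trx" line and compare them with the neighbouring lines. The sketch below is a hypothetical helper (`parse_replay_failure` is not part of any MariaDB or Galera tool) and assumes the line layout is exactly as quoted above. In this excerpt the `g` value, presumably the global sequence number, equals the 9845 reported in the "apply warning" and "Failed to apply app buffer" lines.

```python
import re

# Log line copied verbatim from the excerpt above.
LINE = (
    "[Warning] WSREP: failed to replay trx: source: "
    "6b5ee6b1-418e-11e2-0800-2723c17dce38 version: 2 local: 1 "
    "state: REPLAYING flags: 1 conn_id: 7 trx_id: 30651 "
    "seqnos (l: 9880, g: 9845, s: 9843, d: 9844, ts: 1355009139336500474)"
)

# Assumed field layout, based only on the single line quoted in this report.
PATTERN = re.compile(
    r"state: (?P<state>\w+) flags: \d+ conn_id: (?P<conn_id>\d+) "
    r"trx_id: (?P<trx_id>\d+) seqnos \((?P<seqnos>[^)]*)\)"
)


def parse_replay_failure(line):
    """Return state, conn_id, trx_id and seqnos, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    seqnos = {}
    for part in match.group("seqnos").split(","):
        key, value = part.split(":")
        seqnos[key.strip()] = int(value)
    return {
        "state": match.group("state"),
        "conn_id": int(match.group("conn_id")),
        "trx_id": int(match.group("trx_id")),
        "seqnos": seqnos,
    }


if __name__ == "__main__":
    info = parse_replay_failure(LINE)
    print(info["conn_id"], info["seqnos"]["g"])  # prints "7 9845"
```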

      After that the server hangs: it does not shut down, but it does not accept any connections either.

      On a debug build, it aborts with:

      mysqld: maria-5.5-galera/sql/sql_error.h:76: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.
      [ERROR] mysqld got signal 6 ;

      #6  0x00007fe6359edd4d in __GI___assert_fail (assertion=0xd495da "m_status == DA_ERROR", file=<optimized out>, line=76, function=<optimized out>) at assert.c:81
      #7  0x000000000057bf15 in Diagnostics_area::sql_errno (this=0x7fe600004240) at /home
      #8  0x00000000008cd766 in Rows_log_event::do_apply_event (this=0x40f6450, rli=0x40f3300) at maria-5.5-galera/sql/log_event.cc:8280
      #9  0x0000000000592e8c in Log_event::apply_event (this=0x40f6450, rli=0x40f3300) at maria-5.5-galera/sql/log_event.h:1230
      #10 0x00000000006284e9 in wsrep_apply_rbr (thd=0x7fe600000910, rbr_buf=0x40f61f0 "T\231\303P\023\001", buf_len=0) at maria-5.5-galera/sql/sql_parse.cc:8098
      #11 0x0000000000628ad5 in wsrep_apply_cb (ctx=0x7fe600000910, buf=0x40f61f0, buf_len=168, global_seqno=637) at maria-5.5-galera/sql/sql_parse.cc:8177
      #12 0x00007fe634d31abf in apply_wscoll (trx=..., apply_cb=0x628a27 <wsrep_apply_cb(void*, void const*, unsigned long, long)>, recv_ctx=0x7fe600000910) at galera/src/replicator_smm.cpp:37
      #13 apply_trx_ws (recv_ctx=0x7fe600000910, apply_cb=0x628a27 <wsrep_apply_cb(void*, void const*, unsigned long, long)>, commit_cb=0x628d32 <wsrep_commit_cb(void*, long, bool)>, trx=...) at galera/src/replicator_smm.cpp:81
      #14 0x00007fe634d3600f in galera::ReplicatorSMM::replay_trx (this=0x291ded0, trx=0x41cd280, trx_ctx=0x7fe600000910) at galera/src/replicator_smm.cpp:821
      #15 0x00007fe634d4db76 in galera_replay_trx (gh=<optimized out>, trx_handle=<optimized out>, recv_ctx=0x7fe600000910) at galera/src/wsrep_provider.cpp:658
      #16 0x000000000062330b in wsrep_mysql_parse (thd=0x7fe600000910, rawbuf=0x4083178 "UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5", length=101, parser_state=0x7fe60789b550) at maria-5.5-galera/sql/sql_parse.cc:6085
      #17 0x000000000061543b in dispatch_command (command=COM_QUERY, thd=0x7fe600000910, packet=0x7fe600006561 "UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5", packet_length=101) at maria-5.5-galera/sql/sql_parse.cc:1230
      #18 0x000000000061429e in do_command (thd=0x7fe600000910) at maria-5.5-galera/sql/sql_parse.cc:890
      #19 0x000000000071c1b8 in do_handle_one_connection (thd_arg=0x7fe600000910) at maria-5.5-galera/sql/sql_connect.cc:1278

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x4083178): UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5
      Connection ID (thread ID): 7
      Status: KILL_QUERY

      branch: maria-5.5-galera
      revision-id: seppo.jaakola@codership.com-20121130113629-lhwlr2ncrib15h18
      date: 2012-11-30 13:36:29 +0200
      revno: 3358

      Command lines:

      maria-5.5-galera/sql/mysqld --defaults-file=maria-5.5-galera/mydef1.cnf --datadir=maria-5.5-galera/data1 --wsrep_provider=galera-23.2.2-src/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --binlog-format=row --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=maria-5.5-galera/ --port=8306 --loose-lc-messages-dir=maria-5.5-galera/sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=maria-5.5-galera/data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --core --log-bin=master-bin
      maria-5.5-galera/sql/mysqld --defaults-file=maria-5.5-galera/mydef2.cnf --datadir=maria-5.5-galera/data2 --wsrep_provider=galera-23.2.2-src/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --binlog-format=row --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=maria-5.5-galera/ --port=8307 --loose-lc-messages-dir=maria-5.5-galera/sql/share --socket=/tmp/elenst-galera-2.sock --tmpdir=maria-5.5-galera/data2/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --core --log-bin=master-bin

      (The mydefX.cnf files are irrelevant; they only contain datadirs and ports.)

      The test is run via RQG, one instance per node. The data file is the same for both instances; the grammars differ slightly.

      RQG command lines:

      perl gentest.pl --gendata=1.zz --threads=2 --queries=100M --duration=21600 --dsn=dbi:mysql:host= --grammar=1a.yy
      perl gentest.pl --gendata=1.zz  --threads=1 --queries=100M --duration=21600 --dsn=dbi:mysql:host= --grammar=1b.yy

      data file (1.zz):

      $tables = {
              rows => [ 20 ],
              engines => [ 'InnoDB' ]
      };
      $fields = {
              types => [ 'char(1)' ],
              pk => [ 'int' ],
              indexes => [ 'key' ]
      };
      $data = {
              numbers => [ 'digit' ],
              strings => [ 'letter' ]
      };

      Grammar for the 1st node (1a.yy):

              SELECT CONNECTION_ID() INTO @killer;
              KILL QUERY @killer - 1 ;
              UPDATE _table SET _field_no_pk = _varchar(1) ORDER BY _field_list LIMIT 8 ;

      Grammar for the 2nd node (1b.yy):

              UPDATE _table SET _field_no_pk = _char(1) LIMIT 3 ;

      An example of a GRA file produced upon the failure is attached.




            seppo Seppo Jaakola
            elenst Elena Stepanova


