[MDEV-3924] Galera: Fatal error on trx replay or assertion `m_status == DA_ERROR' failure in Diagnostics_area::sql_errno()

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5.28a-galera
Fix Version/s: None
Component/s: None
Labels:
- galera

Description

General description
The RQG grammars, test data, and server command lines are given further below.

I have a 3-node cluster (2 servers and the arbitrator).
Test data consists of one table with two columns, a PK and a non-unique key on a char(1) column. The table contains 20 rows.

Test flow on the 1st node (2 threads):
- one thread runs UPDATE <table> SET <non-pk> = ... ORDER BY ... LIMIT 8
- the other thread runs KILL QUERY <first thread>

Test flow on the 2nd node:
- a single thread runs UPDATE <table> SET <non-pk> = ... LIMIT 3

After a few minutes on a release build, replication on the first node fails with:

[ERROR] Slave SQL: Error executing row event: 'Query execution was interrupted', Error_code: 1317

[Warning] WSREP: RBR event 2 Update_rows apply warning: 1317, 9845

[Warning] WSREP: failed to replay trx: source: 6b5ee6b1-418e-11e2-0800-2723c17dce38 version: 2 local: 1 state: REPLAYING flags: 1 conn_id: 7 trx_id: 30651 seqnos (l: 9880, g: 9845, s: 9843, d: 9844, ts: 1355009139336500474)

[Warning] WSREP: Failed to apply app buffer: d<CC><C3>P^S^A, seqno: 9845, status: WSREP_FATAL

         at galera/src/replicator_smm.cpp:apply_wscoll():49

         at galera/src/replicator_smm.cpp:apply_trx_ws():120

[ERROR] WSREP: trx_replay failed for: 5, query: UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5

[ERROR] Aborting
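
When triaging logs like the excerpt above, it can help to pull the seqnos out of the "failed to replay trx" warning and compare them with the seqno in the "Failed to apply app buffer" line. The following is a minimal Python sketch, not part of the original test setup; the function name parse_seqnos and the regular expression are assumptions made for illustration, and the input is the warning line quoted above.

```python
import re

# Captures the comma-separated "key: value" pairs inside "seqnos (...)".
SEQNOS_RE = re.compile(r"seqnos \(([^)]*)\)")


def parse_seqnos(line):
    """Return the seqnos of a WSREP trx dump line as a dict of ints.

    Hypothetical helper for log triage; returns an empty dict when the
    line contains no "seqnos (...)" group.
    """
    match = SEQNOS_RE.search(line)
    if match is None:
        return {}
    pairs = (item.split(":") for item in match.group(1).split(","))
    return {key.strip(): int(value) for key, value in pairs}


# The warning line from the release-build log in this report.
line = ("[Warning] WSREP: failed to replay trx: source: "
        "6b5ee6b1-418e-11e2-0800-2723c17dce38 version: 2 local: 1 "
        "state: REPLAYING flags: 1 conn_id: 7 trx_id: 30651 seqnos "
        "(l: 9880, g: 9845, s: 9843, d: 9844, ts: 1355009139336500474)")

seqnos = parse_seqnos(line)
print(seqnos["g"])  # prints 9845
```

Here the g value equals the seqno 9845 reported in the "Failed to apply app buffer" warning, so both lines appear to refer to the same write set.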

After that the server hangs: it does not shut down, but it does not accept any connections either.

On a debug build, it aborts with:

mysqld: maria-5.5-galera/sql/sql_error.h:76: uint Diagnostics_area::sql_errno() const: Assertion `m_status == DA_ERROR' failed.

[ERROR] mysqld got signal 6 ;

#6  0x00007fe6359edd4d in __GI___assert_fail (assertion=0xd495da "m_status == DA_ERROR", file=<optimized out>, line=76, function=<optimized out>) at assert.c:81

#7  0x000000000057bf15 in Diagnostics_area::sql_errno (this=0x7fe600004240) at /home/elenst/maria-5.5-galera/sql/sql_error.h:76

#8  0x00000000008cd766 in Rows_log_event::do_apply_event (this=0x40f6450, rli=0x40f3300) at maria-5.5-galera/sql/log_event.cc:8280

#9  0x0000000000592e8c in Log_event::apply_event (this=0x40f6450, rli=0x40f3300) at maria-5.5-galera/sql/log_event.h:1230

#10 0x00000000006284e9 in wsrep_apply_rbr (thd=0x7fe600000910, rbr_buf=0x40f61f0 "T\231\303P\023\001", buf_len=0) at maria-5.5-galera/sql/sql_parse.cc:8098

#11 0x0000000000628ad5 in wsrep_apply_cb (ctx=0x7fe600000910, buf=0x40f61f0, buf_len=168, global_seqno=637) at maria-5.5-galera/sql/sql_parse.cc:8177

#12 0x00007fe634d31abf in apply_wscoll (trx=..., apply_cb=0x628a27 <wsrep_apply_cb(void*, void const*, unsigned long, long)>, recv_ctx=0x7fe600000910) at galera/src/replicator_smm.cpp:37

#13 apply_trx_ws (recv_ctx=0x7fe600000910, apply_cb=0x628a27 <wsrep_apply_cb(void*, void const*, unsigned long, long)>, commit_cb=0x628d32 <wsrep_commit_cb(void*, long, bool)>, trx=...) at galera/src/replicator_smm.cpp:81

#14 0x00007fe634d3600f in galera::ReplicatorSMM::replay_trx (this=0x291ded0, trx=0x41cd280, trx_ctx=0x7fe600000910) at galera/src/replicator_smm.cpp:821

#15 0x00007fe634d4db76 in galera_replay_trx (gh=<optimized out>, trx_handle=<optimized out>, recv_ctx=0x7fe600000910) at galera/src/wsrep_provider.cpp:658

#16 0x000000000062330b in wsrep_mysql_parse (thd=0x7fe600000910, rawbuf=0x4083178 "UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5", length=101, parser_state=0x7fe60789b550) at maria-5.5-galera/sql/sql_parse.cc:6085

#17 0x000000000061543b in dispatch_command (command=COM_QUERY, thd=0x7fe600000910, packet=0x7fe600006561 "UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5", packet_length=101) at maria-5.5-galera/sql/sql_parse.cc:1230

#18 0x000000000061429e in do_command (thd=0x7fe600000910) at maria-5.5-galera/sql/sql_parse.cc:890

#19 0x000000000071c1b8 in do_handle_one_connection (thd_arg=0x7fe600000910) at maria-5.5-galera/sql/sql_connect.cc:1278

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x4083178): UPDATE `table20_innodb_int_autoinc` SET `col_char_1_key` = 'l' ORDER BY `col_char_1_key`,`pk` LIMIT 5

Connection ID (thread ID): 7

Status: KILL_QUERY

branch: maria-5.5-galera

revision-id: seppo.jaakola@codership.com-20121130113629-lhwlr2ncrib15h18

date: 2012-11-30 13:36:29 +0200

revno: 3358
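
Frames #6 to #8 of the backtrace above show Rows_log_event::do_apply_event calling Diagnostics_area::sql_errno() while the diagnostics area was not in the DA_ERROR state. A toy Python model of that guard follows. Only the names in the assertion message and the error code 1317 come from this report; the class layout, the DA_EMPTY and DA_OK states, and set_error are assumptions made for illustration, not the server's actual code.

```python
from enum import Enum


class DaStatus(Enum):
    # Only DA_ERROR appears in the report; the other states are assumed.
    DA_EMPTY = 0
    DA_OK = 1
    DA_ERROR = 2


class DiagnosticsArea:
    """Toy stand-in for the server's diagnostics area (not the real class)."""

    def __init__(self):
        self.status = DaStatus.DA_EMPTY
        self._sql_errno = 0

    def set_error(self, errno):
        # Hypothetical setter: record an error code and enter DA_ERROR.
        self.status = DaStatus.DA_ERROR
        self._sql_errno = errno

    def sql_errno(self):
        # Mirrors the guard named in the assertion message.
        if self.status is not DaStatus.DA_ERROR:
            raise AssertionError("m_status == DA_ERROR")
        return self._sql_errno


da = DiagnosticsArea()
try:
    da.sql_errno()  # read the error code before any error was recorded
    guard_tripped = False
except AssertionError:
    guard_tripped = True

da.set_error(1317)  # 1317 is the error code from the release-build log
print(guard_tripped, da.sql_errno())  # prints: True 1317
```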

Command lines:

maria-5.5-galera/sql/mysqld --defaults-file=maria-5.5-galera/mydef1.cnf --datadir=maria-5.5-galera/data1 --wsrep_provider=galera-23.2.2-src/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --binlog-format=row --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=maria-5.5-galera/ --port=8306 --loose-lc-messages-dir=maria-5.5-galera/sql/share --socket=/tmp/elenst-galera-1.sock --tmpdir=maria-5.5-galera/data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --core --log-bin=master-bin

maria-5.5-galera/sql/mysqld --defaults-file=maria-5.5-galera/mydef2.cnf --datadir=maria-5.5-galera/data2 --wsrep_provider=galera-23.2.2-src/libgalera_smm.so --wsrep_sst_method=rsync --core --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --binlog-format=row --innodb_flush_log_at_trx_commit=0 --log-error=log.err --basedir=maria-5.5-galera/ --port=8307 --loose-lc-messages-dir=maria-5.5-galera/sql/share --socket=/tmp/elenst-galera-2.sock --tmpdir=maria-5.5-galera/data2/tmp --general-log=1 --wsrep_cluster_address=gcomm://127.0.0.1:4567?gmcast.listen_addr=tcp://127.0.0.1:4566 --core --log-bin=master-bin

(The mydefX.cnf files are irrelevant; they only contain datadirs and ports.)

The test is run via RQG, one instance per node. The data file is the same for both instances; the grammars differ slightly.

RQG command lines:

perl gentest.pl --gendata=1.zz --threads=2 --queries=100M --duration=21600 --dsn=dbi:mysql:host=127.0.0.1:port=8306:user=root:database=test --grammar=1a.yy

perl gentest.pl --gendata=1.zz  --threads=1 --queries=100M --duration=21600 --dsn=dbi:mysql:host=127.0.0.1:port=8307:user=root:database=test --grammar=1b.yy

data file (1.zz):

$tables = {

        rows => [ 20 ],

        engines => [ 'InnoDB' ]

};

$fields = {

        types => [ 'char(1)' ],

        pk => [ 'int' ],

        indexes => [ 'key' ]

};

$data = {

        numbers => [ 'digit' ],

        strings => [ 'letter' ]

};

Grammar for the 1st node (1a.yy):

thread2_init:

        SELECT CONNECTION_ID() INTO @killer;

thread2:

        KILL QUERY @killer - 1 ;

query:

        UPDATE _table SET _field_no_pk = _varchar(1) ORDER BY _field_list LIMIT 8 ;

Grammar for the 2nd node (1b.yy):

query:

        UPDATE _table SET _field_no_pk = _char(1) LIMIT 3 ;
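
For readers without RQG at hand, the statements that the two grammars above produce can be approximated as follows. This is a rough Python sketch, not RQG itself. The table and column names are taken from the log excerpt in this report; the assumptions are that _varchar(1) and _char(1) expand to a single random lowercase letter, and that _field_list expands to the column order seen in the logged query. All function names are hypothetical.

```python
import random
import string

# Names taken from the logged query in this report.
TABLE = "table20_innodb_int_autoinc"
NON_PK_FIELD = "col_char_1_key"
FIELDS = ["col_char_1_key", "pk"]  # assumed expansion of _field_list


def quote(name):
    """Wrap an identifier in backticks, as in the logged query."""
    return "`" + name + "`"


def node1_update(rng):
    """Approximates the `query` rule of 1a.yy (ORDER BY ... LIMIT 8)."""
    value = rng.choice(string.ascii_lowercase)  # assumed _varchar(1)
    order_by = ",".join(quote(field) for field in FIELDS)
    return (f"UPDATE {quote(TABLE)} SET {quote(NON_PK_FIELD)} = '{value}' "
            f"ORDER BY {order_by} LIMIT 8")


def node2_update(rng):
    """Approximates the `query` rule of 1b.yy (LIMIT 3, no ORDER BY)."""
    value = rng.choice(string.ascii_lowercase)  # assumed _char(1)
    return (f"UPDATE {quote(TABLE)} SET {quote(NON_PK_FIELD)} = '{value}' "
            f"LIMIT 3")


def kill_query(own_connection_id):
    """Approximates `thread2`: kill the connection opened just before ours."""
    return f"KILL QUERY {own_connection_id - 1}"


rng = random.Random(0)
print(node1_update(rng))
print(node2_update(rng))
print(kill_query(8))  # targets connection 7, the ID shown in the crash dump
```

The kill_query helper follows the grammar's `@killer - 1` expression, which relies on the UPDATE thread having connected immediately before the killer thread.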

An example of a GRA file produced upon the failure is attached.

Attachments


GRA_7_48495.log
2012-12-09 02:54
0.2 kB
Elena Stepanova

Activity

People

Assignee: Seppo Jaakola

Reporter: Elena Stepanova

Votes: 0

Watchers: 2

Dates

Created: 2012-12-09 02:54

Updated: 2013-05-27 23:09

Resolved: 2013-05-27 23:09
