[MDEV-4328] maria-galera: Error in Log_event::read_log_event() Found invalid event in binary log Created: 2013-03-26  Updated: 2015-03-28  Resolved: 2013-03-26

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.29-galera
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Henrik Ingo Assignee: Seppo Jaakola
Resolution: Fixed Votes: 0
Labels: galera
Environment:
  1. uname -a
    Linux skycluster_2.localdomain 3.2.39-6.88.amzn1.x86_64 #1 SMP Sat Mar 2 05:13:37 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
    Server version: 5.5.29-MariaDB-log MariaDB Server, wsrep_23.7.3.rXXXX
    Variable_name: wsrep_provider_version
    Value: 23.2.4(r147)

Attachments: File GRA_files.tgz     File my.cnf     Text File show variables all of them.txt     Text File show variables.txt     File test_sysbench.sh    

 Description   

The following bug surfaced by running sysbench 0.5 via the attached script (test_sysbench). Note that the script will do a few other things too, such as DROP and CREATE DATABASE. This may not be significant.

Sysbench is run against node 1, with a passive slave node 2. During the sysbench prepare run, one of the CREATE TABLE statements will almost always fail with the following in error log on node 2:

130325 21:33:49 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 543, event_type: 0
130325 21:33:49 [ERROR] WSREP: applier could not read binlog event, seqno: 7835447, len: -241
130325 21:33:49 [Warning] WSREP: Ignoring error for TO isolated action: source: a63b2d56-958e-11e2-0800-5afc9b8110d7 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 6 trx_id: -1 seqnos (l: 8, g: 7835447, s: 7835445, d: 7835446, ts: 1364247229361613486)
130325 21:33:49 [ERROR] Slave SQL: Error 'Table 'sbtest.sbtest1' doesn't exist' on query. Default database: 'sbtest'. Query: 'CREATE INDEX k on sbtest1(k)', Error_code: 1146
130325 21:33:49 [Warning] WSREP: RBR event 2 Query apply warning: 1, 7835455
130325 21:33:49 [Warning] WSREP: Ignoring error for TO isolated action: source: a63b2d56-958e-11e2-0800-5afc9b8110d7 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 6 trx_id: -1 seqnos (l: 16, g: 7835455, s: 7835447, d: 7835454, ts: 1364247229401653436)
130325 21:33:50 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 7835464)
130325 21:33:50 [ERROR] Slave SQL: Error executing row event: 'Table 'sbtest.sbtest1' doesn't exist', Error_code: 1146
130325 21:33:50 [Warning] WSREP: RBR event 2 Write_rows apply warning: 1146, 7835464
130325 21:33:50 [ERROR] WSREP: Failed to apply trx: source: a63b2d56-958e-11e2-0800-5afc9b8110d7 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 6 trx_id: 23477316 seqnos (l: 25, g: 7835464, s: 7835463, d: 7835455, ts: 1364247229606629441)
130325 21:33:50 [ERROR] WSREP: Failed to apply app buffer: seqno: 7835464, status: WSREP_FATAL
         at galera/src/replicator_smm.cpp:apply_wscoll():53
         at galera/src/replicator_smm.cpp:apply_trx_ws():120
130325 21:33:50 [ERROR] WSREP: Node consistency compromized, aborting...

This is the binlog from node 1 after restart:

/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#130325 20:57:52 server id 1  end_log_pos 245   Start: binlog v 4, server v 5.5.29-MariaDB-log created 130325 20:57:52 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
ULpQUQ8BAAAA8QAAAPUAAAABAAQANS41LjI5LU1hcmlhREItbG9nAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAABQulBREzgNAAgAEgAEBAQEEgAA2QAEGggAAAAICAgCAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAvuiXAw==
'/*!*/;
# at 245
#130325 21:33:49 server id 1  end_log_pos 345   Query   thread_id=3     exec_time=0     error_code=0
SET TIMESTAMP=1364247229/*!*/;
SET @@session.pseudo_thread_id=3/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=0/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
DROP DATABASE IF EXISTS sbtest
/*!*/;
# at 345
#130325 21:33:49 server id 1  end_log_pos 437   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1364247229/*!*/;
CREATE DATABASE sbtest
/*!*/;
# at 437
#130325 21:33:49 server id 1  end_log_pos 743   Query   thread_id=5     exec_time=0     error_code=0
use `sbtest`/*!*/;
SET TIMESTAMP=1364247229/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
CREATE TABLE sbtest2 (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
PRIMARY KEY (id)
) /*! ENGINE = innodb MAX_ROWS = 1000000 */
/*!*/;
# at 743
#130325 21:33:49 server id 1  end_log_pos 1049  Query   thread_id=6     exec_time=0     error_code=0
SET TIMESTAMP=1364247229/*!*/;
CREATE TABLE sbtest1 (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
PRIMARY KEY (id)
) /*! ENGINE = innodb MAX_ROWS = 1000000 */
/*!*/;
# at 1049
#130325 21:33:49 server id 1  end_log_pos 1355  Query   thread_id=9     exec_time=0     error_code=0
SET TIMESTAMP=1364247229/*!*/;
CREATE TABLE sbtest4 (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
c CHAR(120) DEFAULT '' NOT NULL,
pad CHAR(60) DEFAULT '' NOT NULL,
PRIMARY KEY (id)
) /*! ENGINE = innodb MAX_ROWS = 1000000 */
/*!*/;

Note that sbtest1 is the second CREATE TABLE statement that was committed. All other tables from sbtest2 to sbtest3 have been successfully created on the slave. sbtest2 even received one insert before node shuts down due to insert to sbtest1 failing.

Note that setting wsrep_debug=1 makes this issue go away. (Ie it is a usable workaround!)



 Comments   
Comment by Henrik Ingo [ 2013-03-26 ]

The GRA files. The first one is the error, the two others are only a consequence of the table missing.

Comment by Henrik Ingo [ 2013-03-26 ]

Elena: Just FYI, this surfaced in test run by Ivan Zoratti. I then helped him to verify it.

Comment by Elena Stepanova [ 2013-03-26 ]

Hi Henrik,

Do you happen to have the cnf file or full show variables output (not just wsrep)?
I can't reproduce it so far in my environment, so something might be missing.
Also, do you know which version of galera was used? Maybe I have a different one, I tried 23.2.2 and 23.2.4.

Comment by Henrik Ingo [ 2013-03-26 ]

Attaching my.cnf and show variables.

Galera version is 23.2.4 and this is an Amazon Linux AMI (see Environment).

Comment by Seppo Jaakola [ 2013-03-26 ]

Confirmed, reproduced with MariaDB Galera Cluster 5.5.29. But, cannot reproduce with upstream Galera Cluster revisions.

Here is a simple command for reproducing:

sysbench --test=tests/db/parallel_prepare.lua --oltp_tables_count=100 --oltp-table-size=10 --num-threads=10 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root run

Comment by Seppo Jaakola [ 2013-03-26 ]

This bug is regression from: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3389

Comment by Seppo Jaakola [ 2013-03-26 ]

Fix has been pushed in revision: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3391

Comment by Seppo Jaakola [ 2013-03-26 ]

The bug was due to a race condition happening when several concurrent DDL statements were processed, sysbench's parallel_prepare causes such work load

Generated at Thu Feb 08 06:55:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.