Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4328

maria-galera: Error in Log_event::read_log_event() Found invalid event in binary log

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.5.29-galera
    • None
    • None

    Description

      The following bug surfaced by running sysbench 0.5 via the attached script (test_sysbench). Note that the script will do a few other things too, such as DROP and CREATE DATABASE. This may not be significant.

      Sysbench is run against node 1, with a passive slave node 2. During the sysbench prepare run, one of the CREATE TABLE statements will almost always fail with the following in error log on node 2:

      130325 21:33:49 [ERROR] Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 543, event_type: 0
      130325 21:33:49 [ERROR] WSREP: applier could not read binlog event, seqno: 7835447, len: -241
      130325 21:33:49 [Warning] WSREP: Ignoring error for TO isolated action: source: a63b2d56-958e-11e2-0800-5afc9b8110d7 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 6 trx_id: -1 seqnos (l: 8, g: 7835447, s: 7835445, d: 7835446, ts: 1364247229361613486)
      130325 21:33:49 [ERROR] Slave SQL: Error 'Table 'sbtest.sbtest1' doesn't exist' on query. Default database: 'sbtest'. Query: 'CREATE INDEX k on sbtest1(k)', Error_code: 1146
      130325 21:33:49 [Warning] WSREP: RBR event 2 Query apply warning: 1, 7835455
      130325 21:33:49 [Warning] WSREP: Ignoring error for TO isolated action: source: a63b2d56-958e-11e2-0800-5afc9b8110d7 version: 2 local: 0 state: APPLYING flags: 65 conn_id: 6 trx_id: -1 seqnos (l: 16, g: 7835455, s: 7835447, d: 7835454, ts: 1364247229401653436)
      130325 21:33:50 [Warning] WSREP: BF applier failed to open_and_lock_tables: 1146, fatal: 0 wsrep = (exec_mode: 1 conflict_state: 0 seqno: 7835464)
      130325 21:33:50 [ERROR] Slave SQL: Error executing row event: 'Table 'sbtest.sbtest1' doesn't exist', Error_code: 1146
      130325 21:33:50 [Warning] WSREP: RBR event 2 Write_rows apply warning: 1146, 7835464
      130325 21:33:50 [ERROR] WSREP: Failed to apply trx: source: a63b2d56-958e-11e2-0800-5afc9b8110d7 version: 2 local: 0 state: APPLYING flags: 1 conn_id: 6 trx_id: 23477316 seqnos (l: 25, g: 7835464, s: 7835463, d: 7835455, ts: 1364247229606629441)
      130325 21:33:50 [ERROR] WSREP: Failed to apply app buffer: seqno: 7835464, status: WSREP_FATAL
               at galera/src/replicator_smm.cpp:apply_wscoll():53
               at galera/src/replicator_smm.cpp:apply_trx_ws():120
      130325 21:33:50 [ERROR] WSREP: Node consistency compromized, aborting...

      This is the binlog from node 1 after restart:

      /*!40019 SET @@session.max_insert_delayed_threads=0*/;
      /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
      DELIMITER /*!*/;
      # at 4
      #130325 20:57:52 server id 1  end_log_pos 245   Start: binlog v 4, server v 5.5.29-MariaDB-log created 130325 20:57:52 at startup
      # Warning: this binlog is either in use or was not closed properly.
      ROLLBACK/*!*/;
      BINLOG '
      ULpQUQ8BAAAA8QAAAPUAAAABAAQANS41LjI5LU1hcmlhREItbG9nAAAAAAAAAAAAAAAAAAAAAAAA
      AAAAAAAAAAAAAAAAAABQulBREzgNAAgAEgAEBAQEEgAA2QAEGggAAAAICAgCAAAAAAAAAAAAAAAA
      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
      AAAAAAAAAAAAvuiXAw==
      '/*!*/;
      # at 245
      #130325 21:33:49 server id 1  end_log_pos 345   Query   thread_id=3     exec_time=0     error_code=0
      SET TIMESTAMP=1364247229/*!*/;
      SET @@session.pseudo_thread_id=3/*!*/;
      SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
      SET @@session.sql_mode=0/*!*/;
      SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
      /*!\C utf8 *//*!*/;
      SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;
      SET @@session.lc_time_names=0/*!*/;
      SET @@session.collation_database=DEFAULT/*!*/;
      DROP DATABASE IF EXISTS sbtest
      /*!*/;
      # at 345
      #130325 21:33:49 server id 1  end_log_pos 437   Query   thread_id=4     exec_time=0     error_code=0
      SET TIMESTAMP=1364247229/*!*/;
      CREATE DATABASE sbtest
      /*!*/;
      # at 437
      #130325 21:33:49 server id 1  end_log_pos 743   Query   thread_id=5     exec_time=0     error_code=0
      use `sbtest`/*!*/;
      SET TIMESTAMP=1364247229/*!*/;
      /*!\C latin1 *//*!*/;
      SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=8/*!*/;
      CREATE TABLE sbtest2 (
      id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
      k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
      c CHAR(120) DEFAULT '' NOT NULL,
      pad CHAR(60) DEFAULT '' NOT NULL,
      PRIMARY KEY (id)
      ) /*! ENGINE = innodb MAX_ROWS = 1000000 */
      /*!*/;
      # at 743
      #130325 21:33:49 server id 1  end_log_pos 1049  Query   thread_id=6     exec_time=0     error_code=0
      SET TIMESTAMP=1364247229/*!*/;
      CREATE TABLE sbtest1 (
      id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
      k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
      c CHAR(120) DEFAULT '' NOT NULL,
      pad CHAR(60) DEFAULT '' NOT NULL,
      PRIMARY KEY (id)
      ) /*! ENGINE = innodb MAX_ROWS = 1000000 */
      /*!*/;
      # at 1049
      #130325 21:33:49 server id 1  end_log_pos 1355  Query   thread_id=9     exec_time=0     error_code=0
      SET TIMESTAMP=1364247229/*!*/;
      CREATE TABLE sbtest4 (
      id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
      k INTEGER UNSIGNED DEFAULT '0' NOT NULL,
      c CHAR(120) DEFAULT '' NOT NULL,
      pad CHAR(60) DEFAULT '' NOT NULL,
      PRIMARY KEY (id)
      ) /*! ENGINE = innodb MAX_ROWS = 1000000 */
      /*!*/;

      Note that sbtest1 is the second CREATE TABLE statement that was committed. All other tables from sbtest2 to sbtest3 have been successfully created on the slave. sbtest2 even received one insert before node shuts down due to insert to sbtest1 failing.

      Note that setting wsrep_debug=1 makes this issue go away. (Ie it is a usable workaround!)

      Attachments

        1. GRA_files.tgz
          259 kB
        2. my.cnf
          2 kB
        3. show variables.txt
          6 kB
        4. show variables all of them.txt
          56 kB
        5. test_sysbench.sh
          3 kB

        Activity

          hingo Henrik Ingo added a comment -

          The GRA files. The first one is the error, the two others are only a consequence of the table missing.

          hingo Henrik Ingo added a comment - The GRA files. The first one is the error, the two others are only a consequence of the table missing.
          hingo Henrik Ingo added a comment -

          Elena: Just FYI, this surfaced in test run by Ivan Zoratti. I then helped him to verify it.

          hingo Henrik Ingo added a comment - Elena: Just FYI, this surfaced in test run by Ivan Zoratti. I then helped him to verify it.

          Hi Henrik,

          Do you happen to have the cnf file or full show variables output (not just wsrep)?
          I can't reproduce it so far in my environment, so something might be missing.
          Also, do you know which version of galera was used? Maybe I have a different one, I tried 23.2.2 and 23.2.4.

          elenst Elena Stepanova added a comment - Hi Henrik, Do you happen to have the cnf file or full show variables output (not just wsrep)? I can't reproduce it so far in my environment, so something might be missing. Also, do you know which version of galera was used? Maybe I have a different one, I tried 23.2.2 and 23.2.4.
          hingo Henrik Ingo added a comment -

          Attaching my.cnf and show variables.

          Galera version is 23.2.4 and this is an Amazon Linux AMI (see Environment).

          hingo Henrik Ingo added a comment - Attaching my.cnf and show variables. Galera version is 23.2.4 and this is an Amazon Linux AMI (see Environment).
          seppo Seppo Jaakola added a comment -

          Confirmed, reproduced with MariaDB Galera Cluster 5.5.29. But, cannot reproduce with upstream Galera Cluster revisions.

          Here is a simple command for reproducing:

          sysbench --test=tests/db/parallel_prepare.lua --oltp_tables_count=100 --oltp-table-size=10 --num-threads=10 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root run

          seppo Seppo Jaakola added a comment - Confirmed, reproduced with MariaDB Galera Cluster 5.5.29. But, cannot reproduce with upstream Galera Cluster revisions. Here is a simple command for reproducing: sysbench --test=tests/db/parallel_prepare.lua --oltp_tables_count=100 --oltp-table-size=10 --num-threads=10 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root run
          seppo Seppo Jaakola added a comment - This bug is regression from: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3389
          seppo Seppo Jaakola added a comment - Fix has been pushed in revision: http://bazaar.launchpad.net/~maria-captains/maria/maria-5.5-galera/revision/3391
          seppo Seppo Jaakola added a comment -

          The bug was due to a race condition happening when several concurrent DDL statements were processed, sysbench's parallel_prepare causes such work load

          seppo Seppo Jaakola added a comment - The bug was due to a race condition happening when several concurrent DDL statements were processed, sysbench's parallel_prepare causes such work load

          People

            seppo Seppo Jaakola
            hingo Henrik Ingo
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.