MariaDB Server / MDEV-633

LP:1024058 - mysqld XA crash in replication slave

Details

    Description

      We found a simple XA transaction that crashes MySQL 5.5 replication. The transaction merely inserts into one InnoDB table and one TokuDB table. The bug is caused by a flaw in the logging code that is exposed when a transaction uses two XA storage engines (TokuDB and InnoDB); it was fixed in the TokuDB 6.0.1 release.

      Here are some details. Suppose that a database contains the following tables.

      create table t1 (a int) engine=InnoDB;
      create table t2 (a int) engine=TokuDB;

      The following transaction

      begin;
      insert into t1 values (1);
      insert into t2 values (2);
      commit;

      causes the replication slave to crash.

      The crash occurs when mysqld tries to dereference a NULL pointer.

      #4  0x000000000088e203 in MYSQL_BIN_LOG::log_and_order (this=0x14b8640, thd=0x7f7758000af0, xid=161, all=true, need_prepare_ordered=false, need_commit_ordered=true) at /home/mariadb-5.5.25/sql/log.cc:7491
      7491	  cache_mngr->using_xa= TRUE;
      (gdb) p cache_mngr
      $1 = (binlog_cache_mngr *) 0x0
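The failure mode in the trace can be illustrated with a toy model. This is a sketch under stated assumptions, not the MariaDB source: only `cache_mngr` and `using_xa` are names taken from the trace, while `ToyThd` and `mark_using_xa` are hypothetical. It shows a guarded version of the assignment at log.cc:7491 that reports an error when the cache manager was never allocated, rather than dereferencing NULL.

```cpp
// Toy model of the crash site; NOT the real MariaDB code.
// Only cache_mngr and using_xa are names taken from the gdb trace.
struct binlog_cache_mngr {
  bool using_xa = false;
};

// Hypothetical stand-in for the per-connection thread descriptor.
struct ToyThd {
  // Stays null when the thread never set up binlog caching,
  // which is the state gdb printed: (binlog_cache_mngr *) 0x0.
  binlog_cache_mngr *cache_mngr = nullptr;
};

// Guarded version of the statement that crashed.
// Returns 0 on success and 1 if there is no cache manager to mark.
int mark_using_xa(ToyThd *thd) {
  binlog_cache_mngr *cache_mngr = thd->cache_mngr;
  if (cache_mngr == nullptr)
    return 1;  // the unguarded assignment would dereference NULL here
  cache_mngr->using_xa = true;
  return 0;
}
```

A guard like this only avoids the crash; it does not decide what the commit should do when the thread has no binlog cache.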

      The bug is fixed in the branch lp:~prohaska7/5.5-xa-rpl-crash-fix.

      Also, see the mariadb-developers email thread.

        Activity

          richprohaska Rich Prohaska created issue -

          ratzpo Rasmus Johansson (Inactive) added a comment - Launchpad bug id: 1024058
          ratzpo Rasmus Johansson (Inactive) made changes -
          Field Original Value New Value
          Labels Launchpad
          ratzpo Rasmus Johansson (Inactive) made changes -
          Fix Version/s Maria 5.5 [ 11303 ]
          Labels Launchpad Launchpad MariaDB_5.5
          ratzpo Rasmus Johansson (Inactive) made changes -
          Key IMT-6621 MDEV-633
          Project ImportTest [ 10200 ] MariaDB Development [ 10000 ]
          Workflow jira [ 20270 ] defaullt [ 21477 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 5.5.29 [ 11701 ]
          serg Sergei Golubchik made changes -
          Labels Launchpad MariaDB_5.5 Launchpad
          serg Sergei Golubchik made changes -
          Affects Version/s 5.5.28 [ 11200 ]
          serg Sergei Golubchik made changes -
          Description: wrapped the SQL statements and the gdb excerpt in {noformat} blocks and changed "mariadb developers email chain" to "mariadb-developers email thread"; the text is otherwise identical to the Description above.

          serg Sergei Golubchik added a comment - Although none of your patch is present in the current MariaDB 5.5, I failed to reproduce the crash with InnoDB and PBXT and your test case.

          If you could provide more info, so that I'd be able to reproduce it, feel free to reopen this bug.
          serg Sergei Golubchik made changes -
          Resolution Cannot Reproduce [ 5 ]
          Status Open [ 1 ] Closed [ 6 ]

          serg Sergei Golubchik added a comment - Got more info from the reporter.
          serg Sergei Golubchik made changes -
          Resolution Cannot Reproduce [ 5 ]
          Status Closed [ 6 ] Reopened [ 4 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 5.5.29 [ 12102 ]
          Fix Version/s 5.5.28a [ 11701 ]
          serg Sergei Golubchik made changes -
          Status Reopened [ 4 ] In Progress [ 3 ]

          serg Sergei Golubchik added a comment - The problem here is very simple to explain. For 2PC, the server can use either the mmap-based transaction coordinator or the binary log. 2PC always uses the binary log if binary logging is enabled. But even when it is enabled globally, it is usually disabled in the replication slave thread unless --log-slave-updates is specified.

          One would probably get the same crash without replication by disabling the binary log manually with SET SQL_LOG_BIN=0;

          Possible fixes:

          • auto-enable the binary log for 2PC transactions (bad: the binlog will contain unwanted DDL and DML events)
          • abort 2PC transactions if the binary log is locally disabled for this thread (worse: too easy to break replication)
          • write Xid events to the binlog even if the binary log is locally disabled (best?)

          The last approach seems preferable. But if in the future we start recovering transactions from the binary log (doing only one sync per 2PC transaction), we will have this problem again, because then we will need the actual changes to be logged, not just the Xid.

          knielsen - opinion?
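The third option in the comment above can be sketched as a toy model. Everything here is an assumption made for illustration: `Session`, `binlog_is_coordinator`, `thread_logs_changes`, `commit_2pc` and the `"Xid N"` string format are hypothetical and do not reflect the server's real data structures or binlog event encoding. The sketch only shows the decision logic: the binary log acts as coordinator whenever it is enabled globally, the thread's own changes are logged only if logging is enabled for that thread, and the Xid is recorded in either case.

```cpp
#include <string>
#include <vector>

// Toy model of the commit-time decision; NOT the real MariaDB code.
// All names and the event strings are hypothetical.
struct Session {
  bool binlog_open;        // binary logging enabled globally
  bool log_slave_updates;  // models --log-slave-updates
  bool is_slave_thread;    // this is the replication slave thread
  bool sql_log_bin;        // session value of SQL_LOG_BIN
};

// The binary log coordinates 2PC whenever it is enabled globally.
bool binlog_is_coordinator(const Session &s) { return s.binlog_open; }

// True if this particular thread writes its changes to the binary log.
bool thread_logs_changes(const Session &s) {
  if (!s.binlog_open) return false;
  if (s.is_slave_thread && !s.log_slave_updates) return false;
  return s.sql_log_bin;
}

// Returns the events that would be written for one 2PC commit.
std::vector<std::string> commit_2pc(const Session &s, unsigned long xid,
                                    const std::vector<std::string> &changes) {
  std::vector<std::string> events;
  if (!binlog_is_coordinator(s))
    return events;  // the mmap-based coordinator would handle this commit
  if (thread_logs_changes(s))
    events.insert(events.end(), changes.begin(), changes.end());
  // Third option: record the Xid even when logging is locally disabled.
  events.push_back("Xid " + std::to_string(xid));
  return events;
}
```

In this model a slave thread without --log-slave-updates, or a session with SQL_LOG_BIN=0, produces only the Xid entry. That also illustrates the concern raised in the comment: recovery from the binary log would need the changes themselves, which this path does not log.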
          serg Sergei Golubchik made changes -
          Labels Launchpad Launchpad replication
          serg Sergei Golubchik made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow defaullt [ 21477 ] MariaDB v2 [ 46470 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 46470 ] MariaDB v3 [ 67172 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 67172 ] MariaDB v4 [ 145080 ]

          People

            serg Sergei Golubchik
            richprohaska Rich Prohaska