MDEV-32020

XA transaction replicates incorrectly, must be applied at XA COMMIT, not XA PREPARE

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.5.2
    • Fix Version/s: 10.5
    • Component/s: Replication, XA
    • Labels: None

    Description

      XA changes made in 10.5 introduce a regression that breaks replication.

      The problem is that the slave now applies XA transactions while replicating
      an XA_prepare_log_event binlogged by "XA PREPARE" on the master. This is
      wrong: the transaction must not be applied on the slave until "XA COMMIT",
      as is done correctly in 10.4.

      Applying the XA PREPARE on the slave leaves dangling InnoDB row locks that
      can conflict with the replication of later transactions and cause
      replication to break. The test case below (also attached) demonstrates one
      simple instance of this.

      Another problem is that splitting a transaction in this way in the binlog
      means there is no longer a unique binlog position corresponding to the
      database state. This is demonstrated by the attached test case
      rpl_xa_provision.test.

      This test case takes a mysqldump while an XA PREPARED transaction is active
      on the master, and uses it to provision a new slave. The new slave's GTID
      position cannot be set correctly. Setting it after the XA PREPARED
      transaction means the XA COMMIT will fail. But setting it before the XA
      PREPARE would also not be correct, as it would duplicate transactions
      binlogged after the XA PREPARE. Thus, in 10.5, the provisioned slave breaks
      its replication.
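
The provisioning problem can be illustrated with a small toy model. This is only a sketch: the GTID sequence numbers, the transaction names and the unrelated transaction `t_other` are assumptions made up for the illustration, and the model ignores everything except the order of events in the binlog.

```python
# Toy model: each binlog entry is (gtid_seq, kind, name). The sequence
# numbers and transaction names are hypothetical.
BINLOG = [
    (1, "XA_PREPARE", "x1"),   # x1's changes are binlogged here in 10.5
    (2, "TRX", "t_other"),     # an unrelated transaction commits
    # -- mysqldump snapshot is taken here: it sees t_other but not x1 --
    (3, "XA_COMMIT", "x1"),
]
SNAPSHOT = {"t_other"}         # committed transactions visible to the dump


def provision(start_after):
    """Replicate from just after GTID `start_after`; return (state, errors)."""
    applied = set(SNAPSHOT)
    prepared = set()
    errors = []
    for seq, kind, name in BINLOG:
        if seq <= start_after:
            continue
        if kind == "TRX":
            if name in applied:
                errors.append(f"duplicate apply of {name}")
            applied.add(name)
        elif kind == "XA_PREPARE":
            prepared.add(name)
        elif kind == "XA_COMMIT":
            if name in prepared:
                prepared.discard(name)
                applied.add(name)
            else:
                errors.append(f"unknown XID {name} at XA COMMIT")
    return applied, errors


# Starting after the XA PREPARE loses the prepared transaction;
# starting before it replays a transaction the dump already contains.
print(provision(2)[1])
print(provision(0)[1])
```

In this model neither start position replays cleanly, which is the situation described above.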

      The fix is to revert the change so that XA transactions are applied on the
      slave only as part of the XA COMMIT event. When the XA PREPARE event is
      received by the slave, it must not be applied. Instead it can be saved
      somewhere (there are several possible designs). If the master crashes and
      the slave is promoted to be the new master, the saved XA PREPAREd events
      can then be used to recover the XA transaction into the prepared state for
      the application to XA COMMIT or XA ROLLBACK.
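
A minimal sketch of the intended slave behaviour, assuming the saved events are simply kept in a map keyed by XID. The class and method names are hypothetical and do not correspond to server code; where the events are actually saved is one of the open design choices.

```python
class Replica:
    """Hypothetical model of a slave that defers XA work until XA COMMIT."""

    def __init__(self):
        self.saved = {}    # xid -> list of row changes, saved but not applied
        self.rows = []     # changes actually applied to the database

    def on_xa_prepare(self, xid, changes):
        # Save only: no locks are taken and no data changes.
        self.saved[xid] = list(changes)

    def on_xa_commit(self, xid):
        # Apply the saved changes as one ordinary transaction.
        self.rows.extend(self.saved.pop(xid))

    def on_xa_rollback(self, xid):
        # Nothing was applied, so discarding the saved events is enough.
        self.saved.pop(xid, None)

    def pending_after_promotion(self):
        # XIDs the application must still XA COMMIT or XA ROLLBACK.
        return sorted(self.saved)


replica = Replica()
replica.on_xa_prepare("t1", ["update row (1,1)"])
replica.on_xa_prepare("t2", ["update row (1,2)"])
replica.on_xa_commit("t1")
print(replica.rows, replica.pending_after_promotion())
```

If the master crashed at this point, the promoted slave in this model would still hold `t2` as a saved, unapplied transaction for the application to resolve.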

      High-level design description for this fix:

      1. At XA PREPARE, the master will binlog a special Xa_prepared_trx_log_event
      event containing the contents of the binlog trx cache. This is binlogged
      without a GTID. The existing binlog_checkpoint mechanism is used to preserve
      the binlog file while this XA transaction is in the prepared state.

      2. The server keeps a record of pending XA PREPAREs and their positions in
      the binlog, for later reference. This is reconstructed by scanning the
      binlog at server startup/crash recovery. Optionally, this can be optimized
      to save the information (like the GTID state) at clean shutdown to avoid
      the scan in the non-crash-recovery case.

      3. At XA COMMIT, the corresponding binlog trx cache data is read from the
      Xa_prepared_trx_log_event entry in the binlog, and a full, normal commit
      transaction is binlogged containing the event data.

      4. Optionally, we can optimize XA COMMIT to only binlog a placeholder with
      a reference back to the Xa_prepared_trx_log_event entry, and use the binlog
      checkpoint mechanism to preserve the binlog file containing the entry for as
      long as needed.

      5. By default, the slave will ignore the Xa_prepared_trx_log_event (the dump
      thread can simply skip sending these events to the slave, or the slave can
      just ignore them), and the XA COMMIT will be replicated as a normal commit.

      6. The user can enable some --replicate-xa option, which will make the slave
      process the Xa_prepared_trx_log_event. This will require --log-slave-updates
      to be enabled on the slave. When the Xa_prepared_trx_log_event is processed
      on the slave, it is simply binlogged; no events are applied. When the commit
      events are received, they are applied as a normal transaction, and the
      pending Xa_prepared_trx_log_event in the slave's binlog is released.

      7. If the optimization in (4) is implemented, the placeholder commit event
      on the slave will read the event data to be applied from the relay log, or
      from the slave's binlog if the data has been processed and the corresponding
      relay log purged. For each placeholder commit event, the dump thread on the
      master will check whether the slave has already been sent the corresponding
      Xa_prepared_trx_log_event, and if not, will read out the full commit event
      and send it to the slave for normal commit processing.

      8. When the slave is promoted to a new master, any pending XA PREPAREd
      events that have been processed on the slave can be explicitly instantiated
      by the user into the XA prepared, committed, or rolled back state. The
      corresponding XIDs are listed in XA RECOVER on the slave, and can be
      specified in SQL statements XA PREPARE <xid>, XA COMMIT <xid>, XA ROLLBACK
      <xid>. These statements will then read out the pending events and apply them
      as normal. (The exact syntax for this is up for discussion, maybe a separate
      option keyword or different statement should be used for slave promotion).

      9. If the locking happens to be different on the slave than on the master,
      the pending XA PREPAREd transactions might get locking conflicts when they
      are promoted. However, since these can be run by the user in any order and
      in parallel, promotion can still proceed, and once the external transaction
      coordinator issues commit or rollback decisions for one transaction, the
      blocked promotion can then continue. This way, statement-based replication
      and non-primary-key row-based replication can still work for XA promotion
      on the slave.
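
The master-side part of this design (steps 1 to 3 and 5) can be sketched as a small model. Everything here is hypothetical: the class, the dict-based event layout and the method names are assumptions for illustration only and are not taken from the proof-of-concept patch. XA ROLLBACK and the placeholder optimization from step 4 are not modelled.

```python
class MasterBinlog:
    """Hypothetical model of steps 1-3 and 5; not the server's real code."""

    def __init__(self):
        self.events = []    # list of dicts, index = binlog position
        self.pending = {}   # xid -> position of its prepared-trx event
        self.next_gtid = 1

    def xa_prepare(self, xid, trx_cache):
        # Step 1: write the trx cache contents without a GTID.
        self.events.append({"type": "xa_prepared_trx", "xid": xid,
                            "data": list(trx_cache), "gtid": None})
        # Step 2: remember where the pending entry lives.
        self.pending[xid] = len(self.events) - 1

    def xa_commit(self, xid):
        # Step 3: read the data back and binlog a normal transaction.
        pos = self.pending.pop(xid)
        self.events.append({"type": "trx", "xid": xid,
                            "data": self.events[pos]["data"],
                            "gtid": self.next_gtid})
        self.next_gtid += 1

    def recover_pending(self):
        # Step 2: rebuild the pending map by scanning the binlog.
        pending = {}
        for pos, ev in enumerate(self.events):
            if ev["type"] == "xa_prepared_trx":
                pending[ev["xid"]] = pos
            elif ev["type"] == "trx":
                pending.pop(ev["xid"], None)
        return pending

    def stream_for_default_slave(self):
        # Step 5: a default slave never sees the prepared-trx events.
        return [ev for ev in self.events if ev["type"] == "trx"]


binlog = MasterBinlog()
binlog.xa_prepare("t1", ["update row (1,1)"])
binlog.xa_prepare("t2", ["update row (1,2)"])
binlog.xa_commit("t1")
print(binlog.recover_pending())
print(binlog.stream_for_default_slave())
```

In this model only the committed transaction carries a GTID, so every GTID position corresponds to exactly one database state, which is the property the current 10.5 behaviour loses.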

      A proof-of-concept implementation of this design has been pushed to branch
      knielsen_mdev32020. This patch is not yet ready for testing, but it
      demonstrates the feasibility of all parts of the design. Once the high-level
      design has been finalized with respect to the precise syntax to use etc.,
      and a decision has been made on which version to implement this in, the
      patch can easily be completed according to the embedded "ToDo" comments.

      --source include/have_innodb.inc
      --source include/have_binlog_format_row.inc
      --source include/master-slave.inc
       
      --connection master
       
      CREATE TABLE t1 (a int, b int, c int,
        INDEX i1(a),
        INDEX i2(b))
        ENGINE=InnoDB;
       
      INSERT INTO t1 VALUES
        (1,1,0), (1,2,0),
        (2,1,0), (2,2,0);
      --sync_slave_with_master
       
      --source include/stop_slave.inc
      SET @old_timeout= @@GLOBAL.innodb_lock_wait_timeout;
      SET @old_retries= @@GLOBAL.slave_transaction_retries;
      SET GLOBAL innodb_lock_wait_timeout= 2;
      SET GLOBAL slave_transaction_retries= 3;
      --source include/start_slave.inc
       
      --connection master
      XA START "t1";
      UPDATE t1 FORCE INDEX (i2) SET c=c+1 WHERE a=1 AND b=1;
      XA END "t1";
      XA PREPARE "t1";
       
      --connection master1
      XA START "t2";
      UPDATE t1 FORCE INDEX (i2) SET c=c+1 WHERE a=1 AND b=2;
      XA END "t2";
      XA PREPARE "t2";
       
      --connection master
      XA COMMIT "t1";
       
      --connection master1
      XA COMMIT "t2";
       
      --connection master
      SELECT * FROM t1 ORDER BY a,b,c;
       
      --sync_slave_with_master
      SELECT * FROM t1 ORDER BY a,b,c;
       
      # Cleanup
      --connection master
      DROP TABLE t1;
       
      --connection slave
      SET GLOBAL innodb_lock_wait_timeout= @old_timeout;
      SET GLOBAL slave_transaction_retries= @old_retries;
       
      --source include/rpl_end.inc
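
One way to read the test case above is as a serial replay of the master's binlog on the slave. The toy model below is a sketch under a loud assumption: that on the slave both row updates end up locking all rows with a=1 (for example because the row events are located through index i1 rather than i2). The lock sets are invented for illustration and are not derived from InnoDB.

```python
# Toy model of a single-threaded slave applier. All lock sets below are
# assumptions chosen for illustration; they are not taken from InnoDB.

# Binlog order produced by the test case on the master.
BINLOG = [("PREPARE", "t1"), ("PREPARE", "t2"),
          ("COMMIT", "t1"), ("COMMIT", "t2")]

# Assumed slave-side row locks, identified by (a, b): both transactions are
# assumed to lock every row with a=1 while applying their row event.
SLAVE_LOCKS = {
    "t1": {(1, 1), (1, 2)},
    "t2": {(1, 1), (1, 2)},
}


def replay(binlog, apply_at):
    """Replay events serially; return the first event that blocks, or None.

    apply_at is "PREPARE" (the 10.5 behaviour described in this report) or
    "COMMIT" (the 10.4 behaviour and the proposed fix).
    """
    held = {}  # xid -> set of rows locked by an applied, uncommitted trx
    for kind, xid in binlog:
        if kind == apply_at:
            wanted = SLAVE_LOCKS[xid]
            for other, rows in held.items():
                if other != xid and rows & wanted:
                    return (kind, xid)  # would wait on locks of `other`
            held[xid] = wanted
        if kind == "COMMIT":
            held.pop(xid, None)  # commit releases the locks
    return None


print(replay(BINLOG, "PREPARE"))
print(replay(BINLOG, "COMMIT"))
```

When applying at XA PREPARE, the model blocks on the second prepare, and the locks it waits for are only released by an XA COMMIT that comes later in the same serial stream, so the wait can only end in a lock wait timeout. When applying at XA COMMIT, each transaction takes and releases its locks in one step and nothing blocks.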
      

            People

              knielsen Kristian Nielsen