  1. MariaDB Server
  2. MDEV-29642

Server Crash During XA Prepare Can Break Replication

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
    • Fix Version/s: N/A
    • Component/s: Replication
    • Labels: None

    Description

      If a slave crashes (for an unrelated reason) while processing an XA PREPARE, such that the event is fully committed in the binlog and InnoDB but the crash happens before gtid_slave_pos is updated, then attempts to restart the slave SQL thread will fail with errors such as an out-of-order GTID (if GTID strict mode is enabled) or XID already exists (otherwise). The following comment in Xid_apply_log_event::do_apply_event() documents this behavior.

        /*
          ...
          
          XA_PREPARE_LOG_EVENT also updates the gtid table *but* the update gets
          committed as separate "autocommit" transaction.
        */
      

      I think logic should be added to detect the possibility of a crash happening before the separate transaction completes, and if so, automatically update gtid slave state on restart, because gtid_binlog_pos will already be updated.
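
To make the proposed restart check concrete, here is a minimal sketch. It is not server code: the `Gtid` and `LastBinloggedGroup` types and the `slave_pos_needs_repair` function are hypothetical, and it assumes the slave writes its own binlog so that the last binlogged event group per domain can be compared with the recorded slave position.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified GTID (domain-server-sequence).
struct Gtid {
  std::uint32_t domain_id;
  std::uint32_t server_id;
  std::uint64_t seq_no;
};

// Hypothetical summary of the last event group the slave wrote to its own
// binlog for one replication domain, as found on restart.
struct LastBinloggedGroup {
  Gtid gtid;
  bool is_xa_prepare;  // true if the group ended with XA PREPARE
};

// Assumption: if the last binlogged group is an XA PREPARE whose seq_no is
// ahead of the recorded slave position in the same domain, the crash hit
// after the XA PREPARE committed but before the separate gtid_slave_pos
// update did, so the slave position can be advanced to that GTID.
bool slave_pos_needs_repair(const LastBinloggedGroup &last,
                            const Gtid &slave_pos) {
  return last.is_xa_prepare &&
         last.gtid.domain_id == slave_pos.domain_id &&
         last.gtid.seq_no > slave_pos.seq_no;
}
```

With a slave position of 0-1-10 and a binlogged XA PREPARE at 0-1-11, the check returns true and the restart code would record 0-1-11 before starting the SQL thread.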

          Activity

            Elkin Andrei Elkin added a comment -

            MDEV-21469 relates to this one. The current one rightfully claims gtid_slave_pos update should be a part of the replicated prepared XA.


            knielsen Kristian Nielsen added a comment -

            I think bugs such as this are a clear indication that the design for replicating user XA PREPARE has not been thought through.
            It is central to the design of GTID that the mysql.gtid_slave_pos table is updated in the same transaction as the transaction it belongs to. The user XA PREPARE needs to respect this part of the design, not break it.

            Let's do it differently. We can binlog and send to the slave the XA PREPARE, but don't apply the events on the slave.
            Then in the normal case, when XA COMMIT happens on the master, the events are applied on the slave as a normal transaction.
            This bug and a lot of other bugs will then simply go away.

            And then, if the master crashes, implement suitable recovery code for the slave to recover the XA PREPAREd transactions when it is promoted to master. This code will then be separate and will not affect the logic of normal replication.

            I think this is a much cleaner design and should have some chance of working, at least.
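
The "receive at XA PREPARE, apply at XA COMMIT" idea above can be sketched roughly as follows. This is only an in-memory illustration: the `DeferredXaApplier` class and its method names are hypothetical, events are plain strings, and the recovery code for promoting the slave to master is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a slave that keeps the events of a replicated
// XA PREPARE without applying them.
class DeferredXaApplier {
 public:
  // XA PREPARE received: remember the events under their XID, apply nothing.
  void on_xa_prepare(const std::string &xid, std::vector<std::string> events) {
    pending_[xid] = std::move(events);
  }

  // XA COMMIT received: hand back the events so the caller can apply them
  // as one normal transaction. Returns an empty vector for an unknown XID.
  std::vector<std::string> on_xa_commit(const std::string &xid) {
    std::vector<std::string> to_apply;
    auto it = pending_.find(xid);
    if (it != pending_.end()) {
      to_apply = std::move(it->second);
      pending_.erase(it);
    }
    return to_apply;
  }

  // XA ROLLBACK received: nothing was applied, so just forget the events.
  void on_xa_rollback(const std::string &xid) { pending_.erase(xid); }

  std::size_t pending_count() const { return pending_.size(); }

 private:
  std::map<std::string, std::vector<std::string>> pending_;
};
```

Because nothing touches the storage engine until XA COMMIT arrives, the gtid_slave_pos update can go into the same transaction as the applied events, which is the property this design is meant to preserve.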
            Elkin Andrei Elkin added a comment (edited) -

            knielsen, well, bnestere, whose analysis was of course cool, was not aware of MDEV-21777 when reporting. In my comment I should have referred to it (not just to the related MDEV-21469) and closed this one as its duplicate.
            The plan has been to process GTID-insert as
            > a separate transaction to be two-phase-committed with the replicated one.
            That is, XA_prepare_log_event::do_apply_event would execute a 2PC-like sequence of gtid_insert.prepare(xid), XA.prepare(xid), insert.commit(xid). How to recover when InnoDB reports zero, one or two prepared xids is proposed in here (I now believe this can be done better, say by reserving 1-2 bits of the `formatID` domain for recovery purposes).
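
As a rough sketch of how recovery could decide after a crash inside that 2PC-like sequence: the enum and the `decide_recovery` function below are hypothetical, the actions are an assumption based on ordinary two-phase-commit reasoning rather than the actual proposal, and the sketch assumes the internal GTID-insert transaction can be told apart from the user XA (for example through the reserved `formatID` bits mentioned above).

```cpp
#include <cassert>

// Hypothetical recovery actions for one replicated XID.
enum class RecoveryAction { Nothing, RollbackGtidInsert, CommitGtidInsert };

// gtid_insert_prepared: the storage engine reports the internal GTID-insert
//                       transaction as prepared.
// user_xa_prepared:     the storage engine reports the user XA as prepared.
RecoveryAction decide_recovery(bool gtid_insert_prepared,
                               bool user_xa_prepared) {
  if (gtid_insert_prepared && user_xa_prepared) {
    // Crash after XA.prepare but before insert.commit: finish the sequence.
    return RecoveryAction::CommitGtidInsert;
  }
  if (gtid_insert_prepared) {
    // Crash after gtid_insert.prepare but before XA.prepare: undo the
    // insert, so that the event is simply applied again.
    return RecoveryAction::RollbackGtidInsert;
  }
  // Either nothing was prepared (the event is applied again), or only the
  // user XA remains prepared (the sequence completed before the crash).
  return RecoveryAction::Nothing;
}
```

The one-xid case is the ambiguous one, which is why the sketch needs a way to tell the two transactions apart.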

            This sane idea
            > We can binlog and send to the slave the XA PREPARE, but don't apply the events on the slave.
            seemed feasible but was not chosen because of the apparent extra latency (proportional to the XAP size) and, not least, because of recovery itself. The slave can certainly recover it, provided the XA PREPARE is held recoverably. I hope you'd agree that the trouble of implementing what seems to be a transactional write by the slave IO thread (which acks to the master in semisync, and the master eventually okays the client on the XAP's completion) is no smaller than that of MDEV-21777.

            Elkin Andrei Elkin added a comment -

            The earlier report covering this matter is in MDEV-21777.


            knielsen Kristian Nielsen added a comment -

            It should be trivial to ensure that XA PREPARE is replicated recoverably, by using the existing binlog crash recovery mechanism.

            Require the slave to enable --log-bin and --log-slave-updates. When XA PREPARE is replicated on the slave, it is binlogged together with the mysql.gtid_slave_pos update in the normal way, but the xid_count decrement (i.e. unlog()) is postponed until XA COMMIT is received. This way, the BINLOG CHECKPOINT event will be postponed, and the binlog will be scanned during crash recovery, at which time the XA PREPAREd transaction can be recovered.

            Maybe this can even be used to optionally omit the query/row events from the XA COMMIT to reduce binlog size, since these can be read from the binlog at XA COMMIT time.
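
To illustrate why postponing unlog() keeps the prepared transaction inside the range that crash recovery scans, here is a deliberately simplified model. The `BinlogCheckpointModel` class is hypothetical and only counts not-yet-unlogged xids per binlog file; the real server mechanism is more involved.

```cpp
#include <cassert>
#include <map>

// Hypothetical model: crash recovery must scan from the oldest binlog file
// that still has an xid which has not been unlogged.
class BinlogCheckpointModel {
 public:
  // Switch to a new binlog file.
  void rotate() { ++current_file_; }

  // A transaction (or a replicated XA PREPARE) is binlogged in the current
  // file. Returns the file number, to be passed to unlog() later.
  unsigned log_xid() {
    ++pending_[current_file_];
    return current_file_;
  }

  // The transaction is durable in the engine. For a replicated XA PREPARE
  // this call would be postponed until XA COMMIT is received.
  void unlog(unsigned file_no) {
    auto it = pending_.find(file_no);
    if (it == pending_.end()) return;
    if (--it->second == 0) pending_.erase(it);
  }

  // Oldest binlog file that recovery would still have to scan.
  unsigned recovery_start_file() const {
    return pending_.empty() ? current_file_ : pending_.begin()->first;
  }

 private:
  unsigned current_file_ = 1;
  std::map<unsigned, unsigned> pending_;  // file number -> pending xids
};
```

If an XA PREPARE is logged in file 1 and the binlog is then rotated, recovery keeps starting at file 1 until the matching unlog() happens at XA COMMIT, however many later transactions complete in file 2.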
            Elkin Andrei Elkin added a comment -

            knielsen, I agree this would be a viable solution, perhaps a preferable one to cover cases where adding hints to slave execution context (like suggested in MDEV-32020) may not help.


            People

              Elkin Andrei Elkin
              bnestere Brandon Nesterenko