[MDEV-29642] Server Crash During XA Prepare Can Break Replication Created: 2022-09-26  Updated: 2023-09-17  Resolved: 2023-08-25

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Brandon Nesterenko Assignee: Andrei Elkin
Resolution: Duplicate Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-742 LP:803649 - Xa recovery failed on cli... Closed
relates to MDEV-31038 Parallel Replication Breaks if XA PRE... Closed
relates to MDEV-21469 Implement crash-safe logging of the u... Stalled
relates to MDEV-30165 X-lock on supremum for prepared trans... Closed

 Description   

If a slave crashes (for an unrelated reason) while processing an XA PREPARE, such that the event is fully committed in the binlog and InnoDB but the crash occurs before gtid_slave_pos is updated, then attempts to restart the slave SQL thread will fail with errors such as an out-of-order GTID attempt (if GTID strict mode is enabled) or XID already exists (otherwise). The following comment in Xid_apply_log_event::do_apply_event() documents this behavior.

  /*
    ...
    
    XA_PREPARE_LOG_EVENT also updates the gtid table *but* the update gets
    committed as separate "autocommit" transaction.
  */

I think logic should be added to detect the possibility of a crash happening before the separate transaction completes, and if so, automatically update gtid slave state on restart, because gtid_binlog_pos will already be updated.
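
To make the crash window concrete, here is a minimal toy model in Python. It is not MariaDB code: the class `ToySlave`, its fields, and the `repair_on_restart()` check are all hypothetical, GTIDs are reduced to a single integer sequence number in one domain, and the restart check is only one possible reading of the proposal above.

```python
from dataclasses import dataclass, field


@dataclass
class ToySlave:
    """Toy model of slave state; names are illustrative, not MariaDB internals."""
    gtid_strict_mode: bool = False
    gtid_binlog_pos: int = 0   # last seq_no written to the slave's binlog
    gtid_slave_pos: int = 0    # last seq_no recorded as applied
    prepared_xids: set = field(default_factory=set)

    def apply_xa_prepare(self, seq_no, xid, crash_before_pos_update=False):
        if self.gtid_strict_mode and seq_no <= self.gtid_binlog_pos:
            raise RuntimeError(f"out-of-order GTID: {seq_no}")
        if xid in self.prepared_xids:
            raise RuntimeError(f"XID already exists: {xid}")
        # Step 1: the XA PREPARE becomes durable in the engine and the binlog.
        self.prepared_xids.add(xid)
        self.gtid_binlog_pos = seq_no
        # Step 2: a separate "autocommit" transaction records the position.
        if crash_before_pos_update:
            return  # simulated crash: step 2 never runs
        self.gtid_slave_pos = seq_no

    def repair_on_restart(self):
        """Hypothetical check: catch gtid_slave_pos up if the binlog is ahead."""
        if self.gtid_binlog_pos > self.gtid_slave_pos:
            self.gtid_slave_pos = self.gtid_binlog_pos
            return True
        return False


slave = ToySlave()
slave.apply_xa_prepare(1, "x1")
slave.apply_xa_prepare(2, "x2", crash_before_pos_update=True)
# Here gtid_slave_pos lags gtid_binlog_pos, so a retry of seq_no 2 would fail.
slave.repair_on_restart()
```

In a real server, gtid_binlog_pos being ahead of gtid_slave_pos is not in itself proof of this crash (the slave may have local writes, for example), so an actual fix would need a stricter test than this sketch uses.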



 Comments   
Comment by Andrei Elkin [ 2022-10-13 ]

MDEV-21469 relates to this one. The current one rightly claims that the gtid_slave_pos update should be part of the replicated prepared XA.

Comment by Kristian Nielsen [ 2023-08-24 ]

I think bugs such as this are a clear indication that the design for replicating user XA PREPARE has not been thought through.
It is central to the design of GTID that the mysql.gtid_slave_pos table is updated in the same transaction as the transaction it belongs to. The user XA PREPARE needs to respect this part of the design, not break it.

Let's do it differently. We can binlog and send to the slave the XA PREPARE, but don't apply the events on the slave.
Then in the normal case, when XA COMMIT happens on the master, the events are applied on the slave as a normal transaction.
This bug and a lot of other bugs will then simply go away.

And then if the master crashes, implement suitable recovery code for the slave to recover the XA PREPAREd transactions when it is promoted as the master. This code will then be separate and not affect the logic of normal replication.

I think this is a much cleaner design and should have some chance of working, at least.
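
The deferred-apply idea in this comment can be sketched as a toy model. Everything here is an assumption made for illustration: the class `DeferredXaSlave`, its method names, and the in-memory `pending` store (a real implementation would have to hold the events durably, and would also need to handle the GTID of the XA PREPARE itself).

```python
class DeferredXaSlave:
    """Toy model: XA PREPARE events are stored, not applied, until XA COMMIT."""

    def __init__(self):
        self.pending = {}        # xid -> events held back since XA PREPARE
        self.table = []          # applied rows (stand-in for engine state)
        self.gtid_slave_pos = 0  # changes together with the applied rows

    def on_xa_prepare(self, xid, events):
        # Only remember the events; nothing is applied to the engine.
        self.pending[xid] = list(events)

    def on_xa_commit(self, xid, seq_no):
        # Apply as one ordinary transaction: rows and position move together.
        events = self.pending.pop(xid)
        self.table.extend(events)
        self.gtid_slave_pos = seq_no

    def promote_to_master(self):
        """Hypothetical recovery hook: XIDs that would need to be re-prepared."""
        return sorted(self.pending)


slave = DeferredXaSlave()
slave.on_xa_prepare("x1", ["row-a", "row-b"])
slave.on_xa_commit("x1", seq_no=2)
```

Because nothing touches the engine before XA COMMIT, there is no point in this model at which the rows are durable but the position is not.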

Comment by Andrei Elkin [ 2023-08-25 ]

knielsen, well, bnestere, whose analysis was of course cool, was not aware of MDEV-21777 at the time of reporting. In my comment I should have referred to it (not just to the related MDEV-21469) and closed this one as its duplicate.
The plan has been to process GTID-insert as
> a separate transaction to be two-phase-committed with the replicated one.
That is, XA_prepare_log_event::do_apply_event would execute a 2pc-like sequence of gtid_insert.prepare(xid), XA.prepare(xid), insert.commit(xid). How to recover when InnoDB reports zero, one or two XIDs is proposed here (I now believe this can be done better, say by narrowing the `formatID` domain by 1-2 bits, which would be employed for recovery purposes).
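
As an illustration of the three-step sequence and its crash points, here is a toy enumeration in Python. The decision table in `recovery_action()` is a guess at what such recovery could look like, not the scheme proposed in the ticket; `Kind`, `prepared_after_crash()` and the action strings are hypothetical names.

```python
from enum import Enum


class Kind(Enum):
    GTID_INSERT = "gtid_insert"  # internal transaction on mysql.gtid_slave_pos
    USER_XA = "user_xa"          # the replicated user XA transaction


def prepared_after_crash(steps_done):
    """Return the prepared transactions a crash would leave behind.

    steps_done counts completed steps (0..3) of the sequence
    gtid_insert.prepare, XA.prepare, insert.commit.
    """
    prepared = set()
    if steps_done >= 1:
        prepared.add(Kind.GTID_INSERT)
    if steps_done >= 2:
        prepared.add(Kind.USER_XA)
    if steps_done >= 3:
        prepared.discard(Kind.GTID_INSERT)  # committed, no longer prepared
    return prepared


def recovery_action(prepared):
    """Hypothetical decision table; the ticket's actual proposal may differ."""
    if prepared == {Kind.GTID_INSERT, Kind.USER_XA}:
        return "commit gtid insert"    # user XA is durable: roll forward
    if prepared == {Kind.GTID_INSERT}:
        return "rollback gtid insert"  # user XA never prepared: re-apply event
    return "nothing"                   # position and user XA already agree


for steps in range(4):
    print(steps, recovery_action(prepared_after_crash(steps)))
```

Note that a crash after step 1 and a crash after step 3 both leave exactly one prepared transaction, so counting XIDs alone cannot tell them apart. The sketch distinguishes them with the `Kind` tag; in a real server, reserved `formatID` bits might serve a similar purpose, though that is an assumption here.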

This sane idea
> We can binlog and send to the slave the XA PREPARE, but don't apply the events on the slave.
seemed feasible but was not chosen because of the apparent extra latency (proportional to the XAP size) and, not least, for the very reason of recovery. The slave can certainly recover it, provided the XA PREPARE is held recoverably. I hope you'd agree that the trouble of implementing what seems to be a transactional write by the slave IO thread (which acks in semisync to the master, which eventually okays the client on the XAP's completion) is no smaller than that of MDEV-21777.

Comment by Andrei Elkin [ 2023-08-25 ]

The earlier report covering this matter is in MDEV-21777.

Comment by Kristian Nielsen [ 2023-09-11 ]

It should be trivial to ensure that XA prepare is replicated recoverably, by using the existing binlog crash recovery mechanism.

Require the slave to enable --log-bin and --log-slave-updates. When XA PREPARE is replicated on the slave, it is binlogged together with the mysql.gtid_slave_pos update in the normal way, but the xid_count (i.e. unlog()) is postponed until XA COMMIT is received. This way, the BINLOG CHECKPOINT event will be postponed, and the binlog will be scanned during crash recovery, at which time the XA PREPAREd transaction can be recovered.

Maybe this can even be used to optionally omit the query/row events from the XA COMMIT to reduce binlog size, since these can be read from the binlog at XA COMMIT time.
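
The checkpoint-postponing idea can be illustrated with a toy model of binlog files. This is only a sketch under simplifying assumptions: `ToyBinlog` and its methods are hypothetical, events carry nothing but a kind and an XID, and the mysql.gtid_slave_pos update that would be binlogged alongside the XA PREPARE is left out.

```python
class ToyBinlog:
    """Toy model of binlog files with per-file sets of unfinished XIDs."""

    def __init__(self):
        self.files = [[]]       # each file is a list of (kind, xid) events
        self.pending = [set()]  # per file: XIDs logged but not yet unlogged
        self.checkpoint = 0     # index of the oldest file recovery must scan

    def log_xa_prepare(self, xid):
        self.files[-1].append(("XA_PREPARE", xid))
        self.pending[-1].add(xid)  # unlog() is deliberately not called yet

    def rotate(self):
        self.files.append([])
        self.pending.append(set())
        self._advance_checkpoint()

    def log_xa_commit(self, xid):
        self.files[-1].append(("XA_COMMIT", xid))
        for pending in self.pending:
            pending.discard(xid)   # the postponed unlog() happens here
        self._advance_checkpoint()

    def _advance_checkpoint(self):
        # The checkpoint may only move past files with no pending XIDs.
        last = len(self.files) - 1
        while self.checkpoint < last and not self.pending[self.checkpoint]:
            self.checkpoint += 1

    def recover(self):
        """Scan from the checkpoint and return XIDs still in prepared state."""
        prepared = []
        for events in self.files[self.checkpoint:]:
            for kind, xid in events:
                if kind == "XA_PREPARE":
                    prepared.append(xid)
                elif xid in prepared:
                    prepared.remove(xid)
        return prepared


binlog = ToyBinlog()
binlog.log_xa_prepare("x1")
binlog.rotate()
still_prepared = binlog.recover()  # the scan still starts at the first file
```

The design point is that the file holding the XA PREPARE stays within the recovery scan range for as long as the transaction is unresolved, however many rotations happen in between.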

Comment by Andrei Elkin [ 2023-09-17 ]

knielsen, I agree this would be a viable solution, perhaps a preferable one to cover cases where adding hints to the slave execution context (as suggested in MDEV-32020) may not help.
