[MDEV-29642] Server Crash During XA Prepare Can Break Replication - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
Fix Version/s: N/A
Component/s: Replication
Labels: None

Description

If a slave crashes (for an unrelated reason) while processing an XA PREPARE, after the event has fully committed in the binlog and InnoDB but before gtid_slave_pos is updated, then attempts to restart the slave SQL thread fail with errors such as an out-of-order GTID attempt (if GTID strict mode is enabled) or "XID already exists" (otherwise). The following comment in Xid_apply_log_event::do_apply_event() documents this behavior.

/*
  ...
    XA_PREPARE_LOG_EVENT also updates the gtid table *but* the update gets
    committed as separate "autocommit" transaction.
*/

I think logic should be added to detect that a crash may have happened before the separate transaction completed and, if so, to automatically update the GTID slave state on restart, because gtid_binlog_pos will already have been updated.
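
The crash window described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not server code: `ToyReplica`, `apply_xa_prepare`, `reconcile_on_restart` and `replay` are hypothetical names, GTIDs are reduced to one integer sequence number in a single domain, and the replica is assumed to binlog only replicated events. On a real server gtid_binlog_pos can legitimately be ahead of gtid_slave_pos, so real detection logic would have to be more selective than the comparison used here.

```python
from dataclasses import dataclass, field


class SimulatedCrash(Exception):
    """Stands in for the server dying between the two commits."""


@dataclass
class ToyReplica:
    # Hypothetical state; not MariaDB's internal data structures.
    gtid_strict_mode: bool = True
    binlog_seq: int = 0   # stand-in for gtid_binlog_pos
    slave_seq: int = 0    # stand-in for gtid_slave_pos
    prepared_xids: set = field(default_factory=set)

    def apply_xa_prepare(self, seq, xid, crash_before_gtid_update=False):
        if self.gtid_strict_mode and seq <= self.binlog_seq:
            raise RuntimeError("out-of-order GTID")
        if xid in self.prepared_xids:
            raise RuntimeError("XID already exists")
        # Step 1: the XA PREPARE becomes durable in the engine and the binlog.
        self.prepared_xids.add(xid)
        self.binlog_seq = seq
        if crash_before_gtid_update:
            raise SimulatedCrash()
        # Step 2: the slave position is recorded by a separate transaction.
        self.slave_seq = seq

    def reconcile_on_restart(self):
        """Sketch of the proposed fix: catch the slave position up."""
        if self.binlog_seq > self.slave_seq:
            self.slave_seq = self.binlog_seq


def replay(replica, seq, xid):
    """Restarted SQL thread: re-apply any event past the recorded position."""
    if seq > replica.slave_seq:
        replica.apply_xa_prepare(seq, xid)
```

In this model, crashing inside `apply_xa_prepare` and then calling `replay` for the same event raises one of the two errors from the description, depending on `gtid_strict_mode`. Calling `reconcile_on_restart` first makes the replay a no-op.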

Issue Links

causes:
MDEV-34526 Mariadb crashed and replication got broken after MariaDB services came up (Closed)

relates to:
MDEV-742 LP:803649 - Xa recovery failed on client disconnection (Closed)
MDEV-31038 Parallel Replication Breaks if XA PREPARE Fails Updating Slave GTID State (Closed)
MDEV-21469 Implement crash-safe logging of the user XA (Stalled)
MDEV-30165 X-lock on supremum for prepared transaction for RR (Closed)

Activity

Andrei Elkin added a comment - 2022-10-13 10:58

MDEV-21469 relates to this one. The current one rightfully claims gtid_slave_pos update should be a part of the replicated prepared XA.

Kristian Nielsen added a comment - 2023-08-24 19:01

I think bugs such as this are a clear indication that the design has not been thought through for the replication of user XA PREPARE.
It is central to the design of GTID that the mysql.gtid_slave_pos table is updated in the same transaction as the transaction it belongs to. The user XA PREPARE needs to respect this part of the design, not break it.

Let's do it differently. We can binlog and send to the slave the XA PREPARE, but don't apply the events on the slave.
Then in the normal case, when XA COMMIT happens on the master, the events are applied on the slave as a normal transaction.
This bug and a lot of other bugs will then simply go away.

And then if the master crashes, implement suitable recovery code for the slave to recover the XA PREPAREd transactions when it is promoted as the master. This code will then be separate and not affect the logic of normal replication.

I think this is a much cleaner design and should have some chance of working, at least.
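
The design proposed in this comment can be sketched as a toy model. It only illustrates the idea under stated assumptions: `DeferredXAReplica` and its methods are hypothetical names, events are plain strings, and the GTID position is a single integer. How the position of the XA PREPARE event itself would be recorded is left out, since the comment does not specify it.

```python
class DeferredXAReplica:
    """Toy replica that logs XA PREPARE events but applies them at XA COMMIT."""

    def __init__(self):
        self.relay = {}      # xid -> buffered events, received but not applied
        self.applied = []    # events applied as normal transactions
        self.slave_seq = 0   # stand-in for gtid_slave_pos
        self.prepared = {}   # xid -> events, filled only on promotion

    def on_xa_prepare(self, xid, events):
        # Keep the events, but do not touch the storage engine yet.
        self.relay[xid] = list(events)

    def on_xa_commit(self, seq, xid):
        # Normal case: one ordinary transaction carries both the data
        # changes and the position update.
        self.applied.extend(self.relay.pop(xid))
        self.slave_seq = seq

    def on_xa_rollback(self, seq, xid):
        # Nothing was applied, so there is nothing to undo.
        self.relay.pop(xid, None)
        self.slave_seq = seq

    def promote_to_master(self):
        # Separate recovery path: re-create the prepared transactions
        # from the buffered events.
        self.prepared.update(self.relay)
        self.relay.clear()
```

The point of the sketch is that the position update happens in only one place, together with the data changes, so the crash window from the description has no counterpart in this model.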

Andrei Elkin added a comment - 2023-08-25 08:39 - edited

knielsen, well, bnestere, whose analysis was of course cool, was not aware of MDEV-21777 when reporting. In my comment I should have referred to it (not just to the related MDEV-21469) and closed this one as its duplicate.
The plan has been to process the GTID insert as
> a separate transaction to be two-phase-committed with the replicated one.
That is, XA_prepare_log_event::do_apply_event would execute a 2PC-like sequence of gtid_insert.prepare(xid), XA.prepare(xid), insert.commit(xid). How to recover when InnoDB holds zero, one or two xids is proposed in here (I now believe this can be done better, say by narrowing the `formatID` domain by 1-2 bits, which would be employed for recovery purposes).
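
One plausible reading of that three-step sequence and its recovery cases can be sketched as follows. The recovery rule here is an assumption inferred from the sequence itself, not the rule from the linked proposal, and the names `GTID_INSERT`, `USER_XA`, `engine_state_after` and `recover` are hypothetical. The two role tags stand in for whatever the reserved `formatID` bits would encode.

```python
GTID_INSERT = "gtid_insert"  # hypothetical tag for the GTID-insert transaction
USER_XA = "user_xa"          # hypothetical tag for the replicated user XA


def engine_state_after(steps_done):
    """Engine state if the server crashes after `steps_done` of the 3 steps.

    Returns (roles found in prepared state, whether the GTID insert committed).
    """
    prepared = set()
    gtid_committed = False
    if steps_done >= 1:      # gtid_insert.prepare(xid)
        prepared.add(GTID_INSERT)
    if steps_done >= 2:      # XA.prepare(xid)
        prepared.add(USER_XA)
    if steps_done >= 3:      # insert.commit(xid)
        prepared.discard(GTID_INSERT)
        gtid_committed = True
    return prepared, gtid_committed


def recover(prepared, gtid_committed):
    """Assumed rule: resolve a dangling GTID insert by the user XA's fate."""
    if GTID_INSERT in prepared:
        prepared = prepared - {GTID_INSERT}
        if USER_XA in prepared:
            gtid_committed = True   # roll forward: the XA PREPARE is durable
        # otherwise roll back: the event will simply be applied again
    return prepared, gtid_committed
```

Note that a crash after step 1 and a completed sequence both leave exactly one prepared xid, so counting alone cannot tell them apart. That is where a role marker, such as the `formatID` bits mentioned above, would be needed.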

This sane idea
> We can binlog and send to the slave the XA PREPARE, but don't apply the events on the slave.
seemed feasible but was not chosen because of the apparent extra latency (proportional to the XAP size) and, not least, for the very reason of recovery. The slave can certainly recover it, provided the XA PREPARE is held recoverably. I hope you'd agree that the trouble of implementing what seems to be a transactional write by the slave IO thread (which acks in semisync to the master, which eventually okays the client on the XAP's completion) is no smaller than that of MDEV-21777.

Andrei Elkin added a comment - 2023-08-25 08:42

The earlier report covering this matter is in MDEV-21777.

Kristian Nielsen added a comment - 2023-09-11 13:09

It should be trivial to ensure that XA prepare is replicated recoverably, by using the existing binlog crash recovery mechanism.

Require the slave to enable --log-bin and --log-slave-updates. When XA PREPARE is replicated on the slave, it is binlogged together with the mysql.gtid_slave_pos update in the normal way, but the xid_count decrement (i.e. unlog()) is postponed until XA COMMIT is received. This way, the BINLOG CHECKPOINT event will be postponed, and the binlog will be scanned during crash recovery, at which time the XA PREPAREd transaction can be recovered.

Maybe this can even be used to optionally omit the query/row events from the XA COMMIT to reduce binlog size, since these can be read from the binlog at XA COMMIT time.
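
The postponed-unlog idea in this comment can be pictured with a toy binlog. This is a sketch under simplifying assumptions, not the server's implementation: `ToyBinlog` and its methods are hypothetical, the per-file set of pending xids stands in for the xid counting that unlog() maintains, and the checkpoint is computed on demand rather than written as a BINLOG CHECKPOINT event.

```python
class ToyBinlog:
    """Toy binlog whose checkpoint is held back by pending XA PREPAREs."""

    def __init__(self):
        self.files = [[]]        # each file is a list of (kind, xid) events
        self.pending = [set()]   # per file: xids whose unlog() is postponed

    def rotate(self):
        self.files.append([])
        self.pending.append(set())

    def write(self, kind, xid=None):
        self.files[-1].append((kind, xid))

    def log_xa_prepare(self, xid):
        self.write("XA PREPARE", xid)
        self.pending[-1].add(xid)   # unlog() deliberately not called yet

    def log_xa_commit(self, xid):
        self.write("XA COMMIT", xid)
        for xids in self.pending:   # the postponed unlog() happens now
            xids.discard(xid)

    def checkpoint(self):
        """Index of the oldest file that crash recovery still has to scan."""
        for index, xids in enumerate(self.pending):
            if xids:
                return index
        return len(self.files) - 1

    def recover_prepared(self):
        """Scan from the checkpoint and return xids still in prepared state."""
        open_xids = []
        for events in self.files[self.checkpoint():]:
            for kind, xid in events:
                if kind == "XA PREPARE":
                    open_xids.append(xid)
                elif kind == "XA COMMIT" and xid in open_xids:
                    open_xids.remove(xid)
        return open_xids
```

As long as an XA PREPARE has no matching XA COMMIT, the checkpoint stays at the file containing the prepare, so a recovery scan always sees it, even after several rotations.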

Andrei Elkin added a comment - 2023-09-17 16:28

knielsen, I agree this would be a viable solution, perhaps a preferable one to cover cases where adding hints to slave execution context (like suggested in MDEV-32020) may not help.

People

Assignee: Andrei Elkin
Reporter: Brandon Nesterenko
Votes: 1
Watchers: 7

Dates

Created: 2022-09-26 21:37
Updated: 2024-07-13 16:13
Resolved: 2023-08-25 08:42

MariaDB Server