[MDEV-29410] abort-and-replay prepared XA transactions on the slave
Created: 2022-08-29 Updated: 2023-10-31
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication, XA |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Sergei Golubchik | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Description |
The parallel applier guarantees that transactions are committed in a fixed, predefined order (the same as on the master). If trx1 must be committed before trx2, but the parallel applier executes them concurrently and trx2 happens to block trx1, then the applier aborts trx2, allows trx1 to finish, and then re-executes trx2. This does not work if trx1 is an XA transaction. An XA transaction becomes persistent at XA PREPARE, so it is XA PREPARE, not XA COMMIT, that must happen before trx2. But XA PREPARE does not release all locks, so completing XA PREPARE does not guarantee that a conflicting trx2 will be able to continue. How can this be fixed?
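The ordering rule above, and why it breaks once an earlier transaction is XA PREPAREd, can be modelled in a few lines. This is a minimal Python sketch, not MariaDB code: `Trx`, `resolve_lock_wait`, and the integer binlog positions are hypothetical, and the model assumes (as the description states) that a prepared XA transaction keeps its locks until its XA COMMIT event group, which may come later in the binlog than the transaction waiting on it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trx:
    name: str
    order: int                              # position of this event group in the master's binlog
    xa_commit_order: Optional[int] = None   # position of the matching XA COMMIT (XA transactions only)
    prepared: bool = False                  # True once XA PREPARE has executed on the slave

    def lock_release_order(self) -> int:
        # An ordinary transaction releases its locks when it commits at its own position.
        # An XA PREPAREd transaction keeps them until its XA COMMIT event group.
        if self.prepared and self.xa_commit_order is not None:
            return self.xa_commit_order
        return self.order


def resolve_lock_wait(waiter: Trx, holder: Trx) -> str:
    """What the modelled applier does when `waiter` blocks on a lock held by `holder`."""
    if holder.lock_release_order() < waiter.order:
        # The holder releases its locks before the waiter has to commit: just wait.
        return "wait"
    if not holder.prepared:
        # Usual case: abort the later holder, let the waiter finish, re-execute the holder.
        return f"abort and retry {holder.name}"
    # The holder is already XA PREPAREd: waiting cannot succeed in commit order,
    # and in this model the holder cannot be aborted and replayed either.
    return "deadlock"


# trx1 (position 1) blocks on ordinary trx2 (position 2).
print(resolve_lock_wait(Trx("trx1", 1), Trx("trx2", 2)))  # -> abort and retry trx2
# trx2 blocks on XA trx1, which is prepared but whose XA COMMIT is at position 3.
xa_trx1 = Trx("trx1", 1, xa_commit_order=3, prepared=True)
print(resolve_lock_wait(Trx("trx2", 2), xa_trx1))  # -> deadlock
```

In the second case trx2 must commit before position 3, but trx1 releases its locks only at position 3, which is exactly the situation the description asks how to fix.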
| Comments |
| Comment by Kristian Nielsen [ 2022-09-11 ] |
First, I don't understand why XA PREPARE cannot be rolled back? The whole

Is it because the global transaction id is persisted also after rollback,

As far as I can see there should be no problem with rolling back and

Second, why binlog XA PREPARE at all? XA is a mechanism to ensure consistent

Replicating XA PREPARE leaves a prepared transaction active on the slave,

This does require to persist the binlog cache to preserve an XA PREPAREd

Third, the XA COMMIT (and XA ROLLBACK) event groups must be marked non-trans

The bug description mentions that "it can be purged from relay log after

Hope this helps,
| Comment by Sergei Golubchik [ 2022-09-12 ] |
Technically, a transaction can be rolled back after XA PREPARE, and it should be. This MDEV is about doing exactly that. But currently an "XA transaction" in the relay log is a sequence of events from XA START to XA PREPARE. This is what the master writes to the binlog: the binlog trx_cache in THD is flushed to the binlog on XA PREPARE. So, while a transaction in the SQL worker thread can be rolled back after XA PREPARE, from the relay log's point of view the transaction is done; the relay log forgets about it, and it cannot be re-applied. This is what this MDEV wants to fix: to preserve XA transactions past XA PREPARE, up to XA COMMIT or XA ROLLBACK. Somehow.

"why binlog XA PREPARE at all": this was
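The ticket leaves open how the events should be preserved. One possible approach, shown here only as a hypothetical Python sketch, is to pin a relay log file while any XA transaction read from it sits between XA PREPARE and its XA COMMIT or XA ROLLBACK, so the events from XA START to XA PREPARE remain available for a replay. `RelayLogRetention`, its methods, and the relay log file name are illustrative assumptions, and the sketch ignores XA transactions whose events span several relay log files.

```python
class RelayLogRetention:
    """Hypothetical: refuse to purge a relay log file while it holds a still-pending XA transaction."""

    def __init__(self):
        self._pins = {}  # relay log file name -> set of XIDs awaiting XA COMMIT/XA ROLLBACK

    def on_xa_prepare(self, xid, relay_file):
        # The XA START .. XA PREPARE events in relay_file may still be needed for a replay.
        self._pins.setdefault(relay_file, set()).add(xid)

    def on_xa_outcome(self, xid):
        # XA COMMIT or XA ROLLBACK was applied: the prepared events are no longer needed.
        for xids in self._pins.values():
            xids.discard(xid)

    def can_purge(self, relay_file):
        # Purgeable only if no pending XA transaction is pinned to this file.
        return not self._pins.get(relay_file)


retention = RelayLogRetention()
retention.on_xa_prepare("xid1", "relay-bin.000007")
print(retention.can_purge("relay-bin.000007"))  # -> False (xid1 may still have to be replayed)
retention.on_xa_outcome("xid1")
print(retention.can_purge("relay-bin.000007"))  # -> True
```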
| Comment by Kristian Nielsen [ 2022-09-12 ] |
I still think the inuse_relaylog should ensure that the relay log does not go away too early. When the slave worker executes XA PREPARE, it should participate in binlog group commit (it writes to the slave binlog, right?), which includes doing a wait_for_prior_commit(). Until wait_for_prior_commit() completes, the transaction can be safely rolled back: the XA PREPARE is not yet persisted, and the relay log is not yet deleted. After wait_for_prior_commit(), there are no earlier commits to conflict with, so optimistic parallel replication will not need to roll back and retry the XA PREPARE.

I think if this doesn't work, there is a (simple) bug that should be fixed. Or is there something I'm missing? Is there a test case that shows the problem?

To the second point: the XA PREPARE is written to the binlog for the sake of 2PC persistency, OK. But this doesn't explain why it is replicated to the slaves. There seems to be no benefit to having the transaction in XA PREPAREd state on the slave (and there are a number of disadvantages).

Save the binlog cache in memory after XA PREPARE on the master. Then, at XA COMMIT, write it to the binlog for the slave to replicate (as a normal START TRANSACTION/COMMIT). In case of a crash, load it into the binlog cache again during crash recovery. The XA PREPARE binlog event group can stay in the binlog; just don't send it to the slave, or send it but ignore it on the slave.

It seems needlessly complicated to have replicated transactions in XA PREPAREd state on the slave. For example, what happens if the slave is switched to a different master while an XA PREPAREd transaction is in the middle of being replicated?

Hope this helps,
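The master-side alternative proposed above can be sketched as follows. This is a hypothetical Python model, not MariaDB's binlog code: `PreparedCacheStash`, its methods, and the string events are illustrative. It keeps each XA PREPAREd transaction's cached events, replicates them as an ordinary START TRANSACTION ... COMMIT group at XA COMMIT, drops them at XA ROLLBACK, and rebuilds the stash from the XA PREPARE event groups during crash recovery.

```python
class PreparedCacheStash:
    """Hypothetical master-side model: hold XA PREPAREd binlog caches until XA COMMIT/ROLLBACK."""

    def __init__(self):
        self._pending = {}    # XID -> events cached for the prepared transaction
        self.replicated = []  # event groups that slaves would receive

    def xa_prepare(self, xid, cached_events):
        # The XA PREPARE event group may still be written locally for 2PC persistency;
        # in this model it is simply not sent to the slave.
        self._pending[xid] = list(cached_events)

    def xa_commit(self, xid):
        # Replicate the transaction as an ordinary one.
        events = self._pending.pop(xid)
        self.replicated.append(["START TRANSACTION", *events, "COMMIT"])

    def xa_rollback(self, xid):
        # Nothing was replicated yet, so the slave has nothing to undo.
        self._pending.pop(xid, None)

    def recover(self, prepared_groups):
        # Crash recovery: reload caches from the XA PREPARE event groups found in the binlog.
        for xid, events in prepared_groups.items():
            self._pending[xid] = list(events)


stash = PreparedCacheStash()
stash.xa_prepare("xid1", ["UPDATE t1 SET a = 1 WHERE id = 5"])
stash.xa_prepare("xid2", ["DELETE FROM t2 WHERE id = 7"])
stash.xa_rollback("xid2")
stash.xa_commit("xid1")
print(stash.replicated)
# -> [['START TRANSACTION', 'UPDATE t1 SET a = 1 WHERE id = 5', 'COMMIT']]
```

With this design the slave never sees a transaction in XA PREPAREd state, which is what makes the master-switch question above disappear.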
| Comment by Kristian Nielsen [ 2022-09-12 ] |
Reading the original description again: "trx2 is an XA transaction that managed to do XA PREPARE before trx1 is blocked"

This shouldn't be possible. XA PREPARE is similar to a commit/XID event: it completes the event group. So it must not complete until all prior transactions have committed (i.e. it must do wait_for_prior_commit() before completing). If it does not currently do that, then maybe that is the real bug here?

If XA PREPARE writes to the binlog (as I would think), there is an optimized code path that does the wait_for_prior_commit() implicitly as part of binlog group commit. A less optimal way is to just run wait_for_prior_commit() at the start of XA PREPARE.

I don't see why the normal wait_for_prior_commit mechanism, which ensures correct parallel replication order and rollback/retry from relay log files, should not also work for XA PREPARE.
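The ordering argument can be illustrated with a small model of wait_for_prior_commit(). The Python sketch below is hypothetical (`CommitOrder`, `apply_group`, and the labels are not MariaDB's API) and uses a `threading.Condition` in place of the server's real mechanism: workers may execute their event groups in parallel, but each makes its COMMIT or XA PREPARE persistent only after every earlier event group has done so, and until then it can still be rolled back and retried.

```python
import threading


class CommitOrder:
    """Hypothetical model: event groups become persistent strictly in binlog order."""

    def __init__(self):
        self._next = 1                  # sequence number allowed to complete next
        self._cond = threading.Condition()
        self.completed = []             # labels, in the order they became persistent

    def wait_for_prior_commit(self, seq):
        with self._cond:
            self._cond.wait_for(lambda: self._next == seq)

    def complete(self, seq, label):
        with self._cond:
            self.completed.append(label)
            self._next = seq + 1
            self._cond.notify_all()


def apply_group(order, seq, label):
    # (The event group's row changes would be executed here, in parallel with other workers.)
    # Up to this point the worker could still be rolled back and retried from the relay log.
    order.wait_for_prior_commit(seq)
    # All earlier event groups are persistent, so no earlier transaction can conflict any more.
    order.complete(seq, label)


order = CommitOrder()
groups = [(1, "COMMIT trx1"), (2, "XA PREPARE trx2"), (3, "COMMIT trx3")]
# Start the workers in reverse order: the completion order does not depend on it.
workers = [threading.Thread(target=apply_group, args=(order, seq, label))
           for seq, label in reversed(groups)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(order.completed)  # -> ['COMMIT trx1', 'XA PREPARE trx2', 'COMMIT trx3']
```

Treating XA PREPARE exactly like a commit here reflects the comment's point that it completes the event group.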
| Comment by Andrei Elkin [ 2022-09-15 ] |
knielsen, let me reply to some of your questions (I am not yet a regular at kbd).

> This shouldn't be possible. The XA PREPARE is ...

Indeed:

The idea of getting rid of GAP locks, which are useless and harmful for replication, must be the right way to go, but this ticket is rather cautious about implementing that objective.

Also to the reason of

(I'll respond to other questions a bit later.)