The XA changes made in 10.5 introduce a regression that breaks replication.
The problem is that the slave now applies XA transactions while replicating
an XA_prepare_log_event binlogged by "XA PREPARE" on the master. This is
wrong: the transaction must not be applied on the slave until "XA COMMIT",
as is done correctly in 10.4.
Applying the XA PREPARE on the slave leaves dangling InnoDB row locks that
can conflict with the replication of later transactions and cause
replication to break. The test case below (also attached) demonstrates one
simple instance of this.
Another problem is that splitting a transaction in this way in the binlog
means there is no longer a unique binlog position corresponding to the
database state. This is demonstrated by the attached testcase.
This test case takes a mysqldump while an XA PREPARED transaction is active
on the master, and uses it to provision a new slave. The new slave's GTID
position cannot be set correctly. Setting it after the XA PREPARED
transaction means the XA COMMIT will fail. But setting it before the XA
PREPARE would also not be correct, as it would duplicate transactions
binlogged after the XA PREPARE. Thus, in 10.5, the provisioned slave breaks.
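
The position problem can be illustrated with a small toy model. This is only a sketch: the event layout, the integer GTIDs, the dict used as a "database" and the function name provision_and_replicate are all illustrative assumptions, not MariaDB internals. It also assumes, for contrast, that a non-split binlog carries the whole XA transaction as one event at commit time.

```python
# Toy model only: event names, integer GTIDs and the dict-based "database"
# are illustrative assumptions, not MariaDB internals.

# Split binlog (10.5 style): the XA transaction spans two events.
BINLOG_SPLIT = [
    {"gtid": 1, "kind": "xa_prepare", "xid": "x1", "rows": {"a": 1}},
    {"gtid": 2, "kind": "trx", "rows": {"b": 2}},
    {"gtid": 3, "kind": "xa_commit", "xid": "x1"},
]

# Assumed non-split binlog: the XA transaction is one event at commit time.
BINLOG_WHOLE = [
    {"gtid": 1, "kind": "trx", "rows": {"b": 2}},
    {"gtid": 2, "kind": "trx", "rows": {"a": 1}},
]

# Dump taken while x1 is still only prepared: it contains the committed
# row "b" but nothing about x1.
DUMP = {"b": 2}


def provision_and_replicate(binlog, start_after_gtid):
    """Load the dump, then replay every event with a higher GTID."""
    db = dict(DUMP)
    prepared = {}  # xid -> rows of transactions in the prepared state
    for ev in binlog:
        if ev["gtid"] <= start_after_gtid:
            continue
        if ev["kind"] == "trx":
            for key, value in ev["rows"].items():
                if key in db:
                    return f"duplicate key {key!r} at GTID {ev['gtid']}"
                db[key] = value
        elif ev["kind"] == "xa_prepare":
            prepared[ev["xid"]] = ev["rows"]
        elif ev["kind"] == "xa_commit":
            if ev["xid"] not in prepared:
                return f"unknown XID {ev['xid']!r} at GTID {ev['gtid']}"
            db.update(prepared.pop(ev["xid"]))
    return "ok"


# Position after the XA PREPARE: the XA COMMIT finds no prepared transaction.
print(provision_and_replicate(BINLOG_SPLIT, 2))
# Position before the XA PREPARE: the transaction at GTID 2 is applied twice.
print(provision_and_replicate(BINLOG_SPLIT, 0))
# Non-split binlog: the position right after the dumped transaction works.
print(provision_and_replicate(BINLOG_WHOLE, 1))
```

In this model no starting position works for the split binlog, while the non-split binlog has exactly one position that matches the dumped state.
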
The fix is to revert the change so that XA transactions are applied on the
slave only as part of the XA COMMIT event. When the XA PREPARE event is
received by the slave, it must not be applied. Instead it can be saved
somewhere (there are several possible designs). If the master crashes and
the slave is promoted to be the new master, those saved XA PREPARE events
can then be used to recover the XA transaction into the prepared state, so
that the application can XA COMMIT or XA ROLLBACK it.
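
One of the possible designs could look roughly like the following sketch. The class SlaveApplier and its method names are hypothetical and exist only for illustration; a real implementation would persist the saved events durably and would rebuild a real prepared transaction on promotion rather than just listing XIDs.

```python
class SlaveApplier:
    """Toy slave that defers XA transactions until XA COMMIT (hypothetical)."""

    def __init__(self):
        self.db = {}
        self.saved = {}  # xid -> row changes saved from XA PREPARE events

    def on_xa_prepare(self, xid, rows):
        # Save only: no rows are touched, so no row locks are held.
        self.saved[xid] = dict(rows)

    def on_xa_commit(self, xid):
        # Apply the whole transaction as one unit at commit time.
        self.db.update(self.saved.pop(xid))

    def on_xa_rollback(self, xid):
        # Nothing was applied, so dropping the saved event is enough.
        self.saved.pop(xid, None)

    def on_trx(self, rows):
        # Ordinary transaction: applied immediately.
        self.db.update(rows)

    def promote_to_master(self):
        """Return the XIDs an application can still XA COMMIT or XA ROLLBACK."""
        return sorted(self.saved)


slave = SlaveApplier()
slave.on_xa_prepare("x1", {"a": 1})
slave.on_trx({"b": 2})            # not blocked by the prepared transaction
print(slave.db)                   # only the ordinary transaction is visible
print(slave.promote_to_master())  # x1 is still recoverable after promotion
slave.on_xa_commit("x1")
print(slave.db)
```

Because the slave's database state changes only at XA COMMIT in this sketch, every binlog position again corresponds to exactly one database state.
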