Details
Bug
Status: Open
Critical
Resolution: Unresolved
10.5.2
None
Description
The XA changes made in 10.5 introduce a regression that breaks replication.
The problem is that the slave now applies XA transactions while replicating
an XA_prepare_log_event binlogged by "XA PREPARE" on the master. This is
wrong: the transaction must not be applied on the slave until "XA COMMIT",
as is done correctly in 10.4.
Applying the XA PREPARE on the slave leaves dangling InnoDB row locks that
can conflict with the replication of later transactions and cause
replication to break. The test case below (also attached) demonstrates one
simple instance of this.
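One way to picture the conflict is the toy lock model below. It is not InnoDB and it is not taken from the report. It assumes that a scan locks every row it visits, and that the slave finds rows for row-based events through index i1(a) rather than the i2(b) forced on the master. Both assumptions are guesses made for illustration; the names rows_locked and first_conflict are hypothetical.

```python
# Toy lock model, not InnoDB. Rows are the (a, b) pairs inserted into t1.
ROWS = [(1, 1), (1, 2), (2, 1), (2, 2)]

def rows_locked(index_col, value):
    """Rows a scan on one indexed column would lock (assumed: all visited rows)."""
    col = 0 if index_col == "a" else 1
    return {r for r in ROWS if r[col] == value}

def first_conflict(lock_sets):
    """Return the first pair of XIDs whose lock sets overlap, or None."""
    xids = list(lock_sets)
    for i, x in enumerate(xids):
        for y in xids[i + 1:]:
            if lock_sets[x] & lock_sets[y]:
                return (x, y)
    return None

# Master: both updates are forced onto index i2(b), so the lock sets are disjoint.
master_locks = {"t1": rows_locked("b", 1), "t2": rows_locked("b", 2)}
# Slave (assumption): both row events are looked up through index i1(a), a=1.
slave_locks = {"t1": rows_locked("a", 1), "t2": rows_locked("a", 1)}

print(first_conflict(master_locks))  # None
print(first_conflict(slave_locks))   # ('t1', 't2')
```

Under these assumptions the two prepared transactions coexist on the master, while on the slave the locks held by the prepared "t1" block "t2" until "t1" is committed.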
Another problem is that splitting a transaction in this way in the binlog
means there is no longer a unique binlog position corresponding to the
database state. This is demonstrated by the attached test case
rpl_xa_provision.test.
This test case takes a mysqldump while an XA PREPARED transaction is active
on the master, and uses it to provision a new slave. The new slave's GTID
position cannot be set correctly. Setting it after the XA PREPARED
transaction means the XA COMMIT will fail. But setting it before the XA
PREPARE would also not be correct, as it would duplicate transactions
binlogged after the XA PREPARE. Thus, in 10.5, the provisioned slave breaks
its replication.
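The position problem can be sketched with a toy binlog. This is an illustration only: the event tuples, the replay function and the error strings are hypothetical and do not reflect MariaDB's real event format or error messages.

```python
# Toy binlog: (gtid_seq, kind, name). Not MariaDB's real event format.
BINLOG = [
    (1, "XA_PREPARE", "x1"),
    (2, "TRX", "A"),          # ordinary transaction committed after the prepare
    (3, "XA_COMMIT", "x1"),
]

def replay(snapshot, start_after):
    """Apply events with gtid_seq > start_after on top of a dump snapshot."""
    applied = list(snapshot)   # committed transactions visible in the dump
    prepared = set()           # the dump does not carry prepared XA state
    for seq, kind, name in BINLOG:
        if seq <= start_after:
            continue
        if kind == "XA_PREPARE":
            prepared.add(name)
        elif kind == "XA_COMMIT":
            if name not in prepared:
                return f"error: unknown XID {name}"
            prepared.discard(name)
            applied.append(name)
        else:
            if name in applied:
                return f"error: duplicate transaction {name}"
            applied.append(name)
    return applied

# Dump taken after TRX A committed, while x1 was still only prepared.
snapshot = ["A"]
print(replay(snapshot, start_after=2))  # error: unknown XID x1
print(replay(snapshot, start_after=0))  # error: duplicate transaction A
```

In this model no value of start_after gives a correct result, which is the sense in which the split transaction leaves no unique binlog position for the dumped state.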
The fix is to revert the change so that XA transactions are applied on the
slave only as part of the XA COMMIT event. When the XA PREPARE event is
received by the slave, it must not be applied. Instead it can be saved
somewhere (there are several possible designs). If the master crashes and
the slave is promoted to be the new master, those saved XA PREPAREd events
can then be used to recover the XA transaction into the prepared state for
the application to XA COMMIT or XA ROLLBACK.
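A minimal sketch of the "save, do not apply" idea follows. The Slave class and its method names are hypothetical, and an in-memory dict stands in for wherever the events would really be saved; the report leaves that choice open.

```python
class Slave:
    """Toy slave that defers XA PREPARE instead of applying it (hypothetical)."""

    def __init__(self):
        self.data = {}     # committed state: key -> value
        self.saved = {}    # xid -> list of (key, value) changes, not yet applied

    def on_xa_prepare(self, xid, changes):
        # Save only: no rows are touched, so no row locks are taken here.
        self.saved[xid] = list(changes)

    def on_xa_commit(self, xid):
        # Apply the saved changes as one ordinary transaction.
        for key, value in self.saved.pop(xid):
            self.data[key] = value

    def on_xa_rollback(self, xid):
        # Discard the saved changes; nothing was applied, so nothing to undo.
        self.saved.pop(xid, None)

    def xa_recover(self):
        """XIDs a promoted slave could offer back to the application."""
        return sorted(self.saved)

s = Slave()
s.on_xa_prepare("x1", [("k", 1)])
s.on_xa_prepare("x2", [("j", 2)])
s.on_xa_commit("x1")
print(s.data, s.xa_recover())  # {'k': 1} ['x2']
```

After the commit of x1, only x2 remains saved; on promotion it would be the one transaction left for the application to commit or roll back.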
High-level design description for this fix:
1. At XA PREPARE, the master will binlog a special Xa_prepared_trx_log_event
event containing the contents of the binlog trx cache. This is binlogged
without a GTID. The existing binlog_checkpoint mechanism is used to preserve
the binlog file while this XA transaction is in the prepared state.
2. The server keeps a record of pending XA PREPAREs and their positions in
the binlog, for later reference. This record is reconstructed by scanning
the binlog at server startup/crash recovery. Optionally, this can be
optimized by saving the information (like the GTID state) at clean shutdown,
to avoid the scan in the non-crash-recovery case.
3. At XA COMMIT, the corresponding binlog trx cache data is read from the
Xa_prepared_trx_log_event entry in the binlog, and a full, normal commit
transaction is binlogged containing the event data.
4. Optionally, we can optimize XA COMMIT to only binlog a placeholder with a
reference back to the Xa_prepared_trx_log_event entry, and use the binlog
checkpoint mechanism to preserve the binlog file containing the entry for as
long as needed.
5. By default, the slave will ignore the Xa_prepared_trx_log_event (the dump
thread can simply skip sending these events to the slave, or the slave can
just ignore them), and the XA COMMIT will be replicated as a normal commit.
6. The user can enable some --replicate-xa option, which will make the slave
process the Xa_prepared_trx_log_event. This will require --log-slave-updates
to be enabled on the slave. When the Xa_prepared_trx_log_event is processed
on the slave, it is simply binlogged; no events are applied. When the commit
events are received, they are applied as a normal transaction, and the
pending Xa_prepared_trx_log_event in the slave's binlog is released.
7. If the optimization in (4) is implemented, the placeholder commit event
on the slave will read the event data to be applied from the relay log, or
from the slave's binlog if the data has been processed and the corresponding
relay log purged. For each placeholder commit event, the dump thread on the
master will check whether the slave has already been sent the corresponding
Xa_prepared_trx_log_event, and if not, will read out the full commit event
and send it to the slave for normal commit processing.
8. When the slave is promoted to a new master, any pending XA PREPAREd
events that have been processed on the slave can be explicitly instantiated
by the user into the XA prepared, committed, or rolled back state. The
corresponding XIDs are listed in XA RECOVER on the slave, and can be
specified in SQL statements XA PREPARE <xid>, XA COMMIT <xid>, XA ROLLBACK
<xid>. These statements will then read out the pending events and apply them
as normal. (The exact syntax for this is up for discussion, maybe a separate
option keyword or different statement should be used for slave promotion).
9. If the locking happens to be different on the slave than on the master,
promoting pending XA prepared transactions might cause locking conflicts.
However, since the promotion statements can be run by the user in any order
and in parallel, promotion can still proceed: once the external transaction
coordinator issues a commit or rollback decision for one transaction, the
blocked promotion can continue. This way, statement-based replication and
non-primary-key row-based replication can still work for XA promotion on the
slave.
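Steps 1-3 and 5 of the design above can be sketched as a toy model. Everything here is hypothetical (class name, event dicts, method names) and is unrelated to the code on the proof-of-concept branch. XA ROLLBACK, the binlog checkpoint mechanism and the placeholder optimization of step 4 are not modeled, and carrying the XID on the commit event is only a modeling convenience.

```python
class MasterBinlog:
    """Toy model of steps 1-3 and 5; names and structure are hypothetical."""

    def __init__(self):
        self.events = []      # list of dicts; list index = binlog position
        self.pending = {}     # xid -> position of its prepared entry (step 2)
        self.next_gtid = 1

    def xa_prepare(self, xid, trx_cache):
        # Step 1: write the trx cache contents, without a GTID.
        self.pending[xid] = len(self.events)
        self.events.append({"type": "xa_prepared", "xid": xid,
                            "gtid": None, "data": list(trx_cache)})

    def xa_commit(self, xid):
        # Step 3: read the data back and binlog a full, normal transaction.
        pos = self.pending.pop(xid)
        data = self.events[pos]["data"]
        self.events.append({"type": "trx", "xid": xid,
                            "gtid": self.next_gtid, "data": list(data)})
        self.next_gtid += 1

    def rebuild_pending(self):
        # Step 2: reconstruct the pending record by scanning the binlog.
        pending = {}
        for pos, ev in enumerate(self.events):
            if ev["type"] == "xa_prepared":
                pending[ev["xid"]] = pos
            elif ev["type"] == "trx":
                pending.pop(ev["xid"], None)
        return pending

    def dump_for_default_slave(self):
        # Step 5: skip prepared entries; the slave sees only normal commits.
        return [ev for ev in self.events if ev["type"] != "xa_prepared"]

b = MasterBinlog()
b.xa_prepare("x1", ["UPDATE row 1"])
b.xa_prepare("x2", ["UPDATE row 2"])
b.xa_commit("x1")
print(b.rebuild_pending() == b.pending)                 # True
print([e["gtid"] for e in b.dump_for_default_slave()])  # [1]
```

In this model every GTID in the stream sent to a default slave belongs to a complete transaction, so each GTID position corresponds to exactly one database state.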
A proof-of-concept implementation of this design has been pushed to branch
knielsen_mdev32020. This patch is not yet ready for testing, but it
demonstrates the feasibility of all parts of the design. Once the high-level
design has been finalized with respect to the precise syntax to use etc., and
a decision has been made on which version to implement this in, the patch can
be easily completed according to the embedded "ToDo" comments.
--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
--source include/master-slave.inc

--connection master

CREATE TABLE t1 (a int, b int, c int,
  INDEX i1(a),
  INDEX i2(b))
  ENGINE=InnoDB;

INSERT INTO t1 VALUES
  (1,1,0), (1,2,0),
  (2,1,0), (2,2,0);
--sync_slave_with_master

--source include/stop_slave.inc
SET @old_timeout= @@GLOBAL.innodb_lock_wait_timeout;
SET @old_retries= @@GLOBAL.slave_transaction_retries;
SET GLOBAL innodb_lock_wait_timeout= 2;
SET GLOBAL slave_transaction_retries= 3;
--source include/start_slave.inc

--connection master
XA START "t1";
UPDATE t1 FORCE INDEX (i2) SET c=c+1 WHERE a=1 AND b=1;
XA END "t1";
XA PREPARE "t1";

--connection master1
XA START "t2";
UPDATE t1 FORCE INDEX (i2) SET c=c+1 WHERE a=1 AND b=2;
XA END "t2";
XA PREPARE "t2";

--connection master
XA COMMIT "t1";

--connection master1
XA COMMIT "t2";

--connection master
SELECT * FROM t1 ORDER BY a,b,c;

--sync_slave_with_master
SELECT * FROM t1 ORDER BY a,b,c;

# Cleanup
--connection master
DROP TABLE t1;

--connection slave
SET GLOBAL innodb_lock_wait_timeout= @old_timeout;
SET GLOBAL slave_transaction_retries= @old_retries;

--source include/rpl_end.inc
Issue Links
relates to
- MDEV-742 LP:803649 - Xa recovery failed on client disconnection (Closed)
- MDEV-32830 refactor XA binlogging for better integration with BGC/replication/recovery (In Progress)
- MDEV-34466 XA prepare don't release unmodified records in non-blocking mode (Closed)
- MDEV-33921 Replication fails when XA transactions are used where the slave has replicate_do_db set and the client has touched a different database when running DML such as inserts. (Closed)
- MDEV-34481 optimize away waiting for owned by prepared xa non-unique index record (In Testing)
- MDEV-34777 Wrong page mode during getting page from buffer pool for lock_rec_unlock_unmodified() (Closed)