[MDEV-31921] Replication Breaks after Recovering a Prepared-but-not-binlogged XA ONE PHASE Transaction
Created: 2023-08-14  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 10.5, 10.6, 10.11

Type: Bug
Priority: Major
Reporter: Brandon Nesterenko
Assignee: Brandon Nesterenko
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-21469 Implement crash-safe logging of the u... Stalled

Description

If the server crashes while executing XA COMMIT ... ONE PHASE after the transaction has been prepared in the storage engine, but before it has been written to the binary log, the transaction that ends up in the binary log after recovery is incomplete. That is, when the recovered transaction is later committed with XA COMMIT, only that XA COMMIT is logged, and the body of the XA transaction is lost. The following MTR test demonstrates the incomplete binary logging of a one-phase XA transaction (SHOW BINLOG EVENTS shows only XA COMMIT for XID '1') and shows that replication then breaks on the slave with error 1397 (XAER_NOTA: Unknown XID).

--source include/master-slave.inc
--source include/have_innodb.inc
#--source include/have_log_bin.inc
--source include/have_binlog_format_statement.inc
 
--echo #
--echo # Initialize test data
--connection master
create table t1 (a int) engine=innodb;
--source include/save_master_gtid.inc
 
--connection slave
--source include/sync_with_master_gtid.inc
 
--connection master
XA START '1';
insert into t1 values (1);
XA END '1';
 
--enable_reconnect
 
set session debug_dbug="+d,crash_commit_after_prepare";
# Write file to make mysql-test-run.pl expect crash
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
 
--error 2006,2013
XA COMMIT '1' ONE PHASE;
 
# Poll the server waiting for it to be back online again.
--source include/wait_until_connected_again.inc
 
XA RECOVER;
 
XA COMMIT '1';
show binlog events in 'master-bin.000002';
--source include/save_master_gtid.inc
 
--connection slave
--let $slave_sql_errno= 1397
--source include/wait_for_slave_sql_error.inc
 
--query_vertical SHOW SLAVE STATUS
 
die Quit early for error
 
 
--echo #
--echo # Cleanup
--connection master
DROP TABLE t1;
--source include/save_master_gtid.inc
 
--connection slave
--source include/sync_with_master_gtid.inc
 
--source include/rpl_end.inc
 
--echo # End of test

Output of show binlog events:

show binlog events in 'master-bin.000002';
Log_name        Pos     Event_type      Server_id       End_log_pos     Info
master-bin.000002       4       Format_desc     1       256     Server ver: 11.2.0-MariaDB-debug-log, Binlog ver: 4
master-bin.000002       256     Gtid_list       1       299     [0-1-1]
master-bin.000002       299     Binlog_checkpoint       1       343     master-bin.000002
master-bin.000002       343     Gtid    1       386     GTID 0-1-2
master-bin.000002       386     Query   1       474     XA COMMIT X'31',X'',1

And SHOW SLAVE STATUS error output:

Last_Errno      1397
Last_Error      Error 'XAER_NOTA: Unknown XID' on query. Default database: 'test'. Query: 'XA COMMIT X'31',X'',1'

