Details
Type: Task
Status: In Review
Priority: Critical
Resolution: Unresolved
Q2/2025 Development, Q3/2025 Server Development
Description
MDEV-32830 I. refactor XA binlogging for better integration with
BGC/replication/recovery
This task is part I of a series of four that addresses MDEV-31949 in two main
directions: XA parallel slave performance (already remedied by MDEV-33668) and
XA transaction crash-recovery (MDEV-33168).
This part improves upon the XA binlogging of the MDEV-742 design to facilitate
crash-recovery (the actual binlog-based recovery is coming in part IV,
MDEV-33668).
Legend:
XA-COMPLETE is written where there is no difference between the COMMIT and
ROLLBACK actions as far as binlogging is concerned.
High-level design
The immediate objective is, when binlog is ON, to handle the execution of an XA
transaction, including its binlogging, as uniformly as possible with the normal
BEGIN-COMMIT transaction, so that MDEV-33668 can extend the existing
binlog-based recovery over XA.
The requirement implies that XA-PREPARE is first prepared in the engines, and
only after that are the accumulated replication events written to the binary
log, naturally without any completion event, as the outcome is not yet known.
When XA-"COMPLETE" (that is, XA-COMMIT or XA-ROLLBACK) follows later, the binary
logging of the respective Query event takes place first and is concluded with
the engine action.
One can view this scheme as normal transaction logging split in the middle into
two parts.
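The ordering of the two phases can be illustrated with a toy model. This is a minimal sketch and not server code: `Trace`, `xa_prepare` and `xa_complete` are hypothetical names that only record the order in which the steps described above would happen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: it records the order of steps and touches no real
// engine or binlog. None of these names exist in the server source.
using Trace = std::vector<std::string>;

// XA-PREPARE: the engines are prepared first, then the accumulated
// events are written to the binlog without any completion event.
inline void xa_prepare(Trace &trace)
{
  trace.push_back("engine: prepare");
  trace.push_back("binlog: write XA-PREPARE group");
}

// XA-"COMPLETE" (commit or rollback): the Query event is logged first
// and the engine action concludes the transaction.
inline void xa_complete(Trace &trace, bool commit)
{
  trace.push_back(commit ? "binlog: write XA COMMIT"
                         : "binlog: write XA ROLLBACK");
  trace.push_back(commit ? "engine: commit" : "engine: rollback");
}
```

Calling `xa_prepare` and then `xa_complete` on one trace yields the engine step, two binlog steps and a final engine step, which is the "split in the middle" shape of a normal transaction.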
With binlog enabled, the logging of both phases goes through binlog-group-commit,
where the XA-PREPARE "sub-transaction" merely groups for binary logging and so
skips the engine action, while XA-"COMPLETE" does both, that is, the logging and
an ordered "complete". This binlog-grouping behavior is consistent between
completions from the native and external connections. The completion from an
external connection, which is the slave's option too, makes sure that
the binlog group commit calls hton::"complete"_by_xid only when the
transaction participant (engine) is defined with commit_ordered.
(This part of the overall MDEV-31949 work assumes that such an engine's
hton::"complete"_by_xid is logically equivalent to its ordered commit, which is
the case for InnoDB.)
Being a participant of binlog-group-commit means that either XA phase
is recoverable (not implemented here) from the active binlogs determined
by binlog-checkpoint.
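The group-commit behavior of the two phases can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual group-commit code: `GroupEntry`, `GroupResult` and `run_group_commit` are hypothetical names, and leader election, locking and fsync are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of one binlog group commit round. All names are hypothetical.
struct GroupEntry
{
  std::string xid;
  bool is_prepare;  // true for XA-PREPARE, false for XA-"COMPLETE"
};

struct GroupResult
{
  std::vector<std::string> binlogged;         // every entry is logged
  std::vector<std::string> engine_completed;  // only completions reach the engine
};

// The leader writes the whole group to the binlog in queue order, then
// runs the ordered engine step only for the completing entries.
inline GroupResult run_group_commit(const std::vector<GroupEntry> &queue)
{
  GroupResult result;
  for (const GroupEntry &e : queue)
    result.binlogged.push_back(e.xid);
  for (const GroupEntry &e : queue)
    if (!e.is_prepare)
      result.engine_completed.push_back(e.xid);
  return result;
}
```

In this model a preparing entry still takes its place in the group, so its position in the binlog is ordered with the other members, but it contributes nothing to the engine step.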
Additionally, the corner case of an engine read-only XA transaction is addressed.
Previously it was treated uniformly by logging an empty XA-PREPARE group of
binlog events concluded by an XA-"COMPLETE" query event.
Now, when a preparing XA transaction is found to have only read-only engine
branches or none, it is marked for rollback as the XA_RDONLY optimization:
- nothing gets logged at prepare time; an XA_RDONLY note is generated, and
- it is rolled back at disconnect.
For XA-COMPLETE to tell whether the prepare phase was logged or not, the XID
state object is extended with a boolean flag, which is part of the internal
interface for the recovery implementation of MDEV-33668 (that is going to raise
this flag when a prepared XA is proven recoverable).
While this task is not concerned with transaction coordinators other than the
binlog, such as TC_LOG_MMAP, their current level of support (if any) is not made
worse.
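The read-only decision at prepare time can be sketched as below. This is a minimal illustration, assuming a transaction is described by a list of engine branches; `Branch`, `PrepareOutcome`, `ToyXaState` and `toy_prepare` are hypothetical names and not the server's types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only; the names below are illustrative assumptions.
struct Branch
{
  bool read_write;  // true if the engine branch modified data
};

enum class PrepareOutcome { LOGGED, RDONLY };

struct ToyXaState
{
  bool prepare_logged = false;          // prepare group went to the binlog
  bool rollback_at_disconnect = false;  // XA_RDONLY: optimized away later
};

// An XA with only read-only branches, or with no branches at all, logs
// nothing at prepare time and is marked for rollback at disconnect.
inline PrepareOutcome toy_prepare(const std::vector<Branch> &branches,
                                  ToyXaState &state)
{
  const bool any_rw =
      std::any_of(branches.begin(), branches.end(),
                  [](const Branch &b) { return b.read_write; });
  if (!any_rw)
  {
    state.rollback_at_disconnect = true;
    return PrepareOutcome::RDONLY;
  }
  state.prepare_logged = true;
  return PrepareOutcome::LOGGED;
}
```

An empty branch list takes the read-only path because `std::any_of` over an empty range is false, which matches the "only read-only engine branches or none" wording.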
Notable low-level design points
1. ha_prepare() has to prepare the engine branches before binlogging, so
its hitherto indiscriminate hton loop now extracts the binlog branch
(this repeats a pattern tried for the 1-phase binlog_commit() in MDEV-21117).
2. Conversely, binlog_commit, binlog_rollback, binlog_commit_by_xid and
binlog_rollback_by_xid have to execute the binlog part (the internal 2pc
coordinator role) as the first step.
3. To skip the engine actions inside binlog-group-commit, the preparing
XA enters it (as leader or follower) with its cache_mngr::using_xa
set to FALSE.
This has no implication for binlog-checkpoint based recovery, because
by point 1 XA-PREPARE is already durable, so the binlog-checkpoint event is
generated to account for the safely (no longer doubtfully) prepared state.
4. Following the normal transaction pattern, XA-COMPLETE executes the engine part
with cache_mngr::using_xa = true, and it no longer specifies
binlog_group_entry::need_unlog, which is also necessary for successful
crash-recovery.
5. Following the normal transaction pattern, XA-COMPLETE does not execute
hton::"complete"_ordered if the engine/hton does not have it.
This applies to the external completion-by-xid too.
In other words, the ordered commit, that is the fast part of commit, is invoked
when an (engine) participant has both the commit_ordered() and
"complete"_by_xid methods.
6. Logging of XA-PREPARE and XA-COMMIT engages (and continues to engage)
binlog_commit(), as the footprint of doing so is very small.
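Points 3 to 5 amount to a small predicate deciding whether the ordered engine step runs for a group member. The following is one possible reading, written as a toy model: `ToyHton`, `ToyGroupMember` and `runs_ordered_complete` are hypothetical names, and the boolean fields stand in for the presence of the corresponding handlerton methods.

```cpp
#include <cassert>

// Toy stand-in for an engine's capabilities; not the real handlerton.
struct ToyHton
{
  bool has_commit_ordered;   // engine provides the ordered (fast) commit part
  bool has_complete_by_xid;  // engine provides commit/rollback by xid
};

// Toy stand-in for a member of the binlog group commit queue.
struct ToyGroupMember
{
  bool using_xa;  // FALSE for a preparing XA, true for XA-"COMPLETE"
};

// The ordered step runs only for a completing member whose engine has
// both methods; a preparing member always skips it.
inline bool runs_ordered_complete(const ToyGroupMember &member,
                                  const ToyHton &hton)
{
  return member.using_xa &&
         hton.has_commit_ordered &&
         hton.has_complete_by_xid;
}
```

Keeping the decision in one predicate makes it easy to see that the native and the external completion take the same path once they are inside the group.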
To address the read-only XA:
7. ha_prepare() has to recognize the engine read-only status of the transaction;
it notifies the user of the read-only status and marks the read-only XA so that
it is optimized away at disconnect.
8. For the external connection's XA-COMPLETE to find out whether the XA status
is read-only, XID_cache_element::xap_binlogged_awaiting_xac is introduced
and manipulated as follows:
- raised by XA-PREPARE at flushing to binlog (so the transaction is read-write),
- checked by binlog_commit() and binlog_rollback() to decide about the
XA-COMPLETE logging.
The added flag is a piece of the interface with the MDEV-33668 recovery.
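The life cycle of the flag in point 8 can be sketched as follows. Only the member name xap_binlogged_awaiting_xac comes from the description; `ToyXidCacheElement` and the two helper functions are hypothetical and merely model the "raised" and "checked" steps.

```cpp
#include <cassert>

// Toy stand-in for the XID cache element; only the flag is modelled.
struct ToyXidCacheElement
{
  bool xap_binlogged_awaiting_xac = false;
};

// Raised when XA-PREPARE flushes its group to the binlog, which only
// happens for a read-write transaction.
inline void on_prepare_flushed_to_binlog(ToyXidCacheElement &xid)
{
  xid.xap_binlogged_awaiting_xac = true;
}

// Checked at completion: the XA-"COMPLETE" event is logged only if the
// prepare phase was logged before it.
inline bool must_log_xa_complete(const ToyXidCacheElement &xid)
{
  return xid.xap_binlogged_awaiting_xac;
}
```

A read-only XA never reaches the flush step, so in this model an external connection completing it finds the flag down and logs nothing.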
Issue Links
- blocks
  - MDEV-21777 Implement crash-safe execution the user XA on binlog-less slave (Open)
  - MDEV-31949 slow parallel replication of user xa (Stalled)
  - MDEV-32896 Unstable XA + binglog tests, with possible MDEV-32830 caused issues (Closed)
  - MDEV-33168 XA crash-recovery base on engines prepare first rule (Stalled)
- causes
  - MDEV-32852 Assertion `is_prepared_xa(thd) || ((thd->lex->sql_command == SQLCOM_XA_ROLLBACK && thd->transaction->xid_state.get_state_code() == XA_IDLE) || thd->lex->xa_opt == XA_ONE_PHASE)' failed in TC_LOG::run_commit_ordered (Closed)
  - MDEV-32857 Assertion `thd->lex->sql_command == SQLCOM_XA_COMMIT || thd->lex->sql_command == SQLCOM_XA_ROLLBACK' failed in TC_LOG::run_commit_ordered (Closed)
  - MDEV-36799 Assertion `thd->lex->sql_command == SQLCOM_XA_COMMIT || (thd->lex->sql_command == SQLCOM_XA_ROLLBACK || (thd->lex->sql_command == SQLCOM_PRELOAD_KEYS && thd->get_stmt_da()->get_sql_errno() ))' failed in run_xa_complete_ordered (Closed)
  - MDEV-36800 Assertion `thd_arg->in_multi_stmt_transaction_mode()' failed in Gtid_log_event::Gtid_log_event (Closed)
  - MDEV-36802 flaw in external xa-commit with multiple xa-capable engines lead to assert (Closed)
  - MDEV-36804 SIGSEGV in TC_LOG::run_prepare_ordered on ROLLBACK (Closed)
  - MDEV-36825 Assertion `thd->lex->sql_command != SQLCOM_XA_ROLLBACK || thd->transaction->xid_state.is_explicit_XA()' failed in ha_rollback_trans on XA ROLLBACK (Closed)
  - MDEV-36826 read-only on slave XA-prepare thought it failed to assert (Closed)
  - MDEV-36859 Error 1399 or Error 0 on replica upon XA PREPARE (Open)
- is blocked by
  - MDEV-32852 Assertion `is_prepared_xa(thd) || ((thd->lex->sql_command == SQLCOM_XA_ROLLBACK && thd->transaction->xid_state.get_state_code() == XA_IDLE) || thd->lex->xa_opt == XA_ONE_PHASE)' failed in TC_LOG::run_commit_ordered (Closed)
  - MDEV-32857 Assertion `thd->lex->sql_command == SQLCOM_XA_COMMIT || thd->lex->sql_command == SQLCOM_XA_ROLLBACK' failed in TC_LOG::run_commit_ordered (Closed)
- relates to
  - MDEV-31949 slow parallel replication of user xa (Stalled)
  - MDEV-32020 XA transaction replicates incorrectly, must be applied at XA COMMIT, not XA PREPARE (Open)