[MDEV-21469] Implement crash-safe logging of the user XA Created: 2020-01-13  Updated: 2024-01-11

Status: Stalled
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5
Fix Version/s: 10.6

Type: Bug Priority: Critical
Reporter: Andrei Elkin Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 1
Labels: XA

Issue Links:
Issue split
split to MDEV-21777 Implement crash-safe execution the us... Open
Relates
relates to MDEV-18959 Engine transaction recovery through p... Stalled
relates to MDEV-26652 xa transactions binlogged in wrong order Open
relates to MDEV-29642 Server Crash During XA Prepare Can Br... Closed
relates to MDEV-31921 Replication Breaks after Recovering a... Open
relates to MDEV-31949 slow parallel replication of user xa In Review
relates to MDEV-742 LP:803649 - Xa recovery failed on cli... Closed
relates to MDEV-31998 XA PREPARE should do binlog_prepare last Open

 Description   

This task follows from MDEV-742 and addresses the upstream Bug#76233.

In fact, there is no issue with the relative order of logging XA PREPARE into the binary log and preparing the transaction in the engine. Upstream's choice to log first is not really incorrect; it just needs
some refinement of the server recovery.

So if the server crashes with an XA-prepared "transaction" binlogged but not yet
prepared in the engine, the next recovery must be augmented to identify the XA-prepared transaction's events in the binlog and resubmit the transaction for preparing in the engine.

XA COMMIT and XA ROLLBACK are logged in the same order as XA PREPARE (binlog first), and therefore
a similar issue arises when the server crashes after binary logging but before the transaction is actually completed in the engine. The same technique must be used to identify such transactions and resubmit them for commit or rollback.



 Comments   
Comment by Sujatha Sivakumar (Inactive) [ 2020-02-25 ]

Hello Andrei,

Please review MDEV-21469 changes.

Branch name: bb-10.5-MDEV_21469

Thank you.

Comment by Sujatha Sivakumar (Inactive) [ 2020-03-03 ]

Design of XA recovery:
======================
Let us assume the binary log has the following sequence of events.

PREPARE XA1
COMMIT XA1
PREPARE XA2 --> Crash during XA2 prepare.

The engine has the following view of the above XA transactions.

XA1 - Complete in Engine
XA2 - Not prepared in Engine (the crash happened before the engine prepare)

During server restart, binlog recovery is initiated.
At present, only the 'XID_t' list is sent to the engine during recovery.
Along with this list, another list for 'XA' transactions
will also be sent.

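enum xa_binlog_state {XA_PREPARE=0, XA_COMPLETE};

struct xa_recovery_member
{
   XID xid;                     /* XA transaction identifier */
   enum xa_binlog_state state;  /* last state seen in the binary log */
   bool in_engine_prepare;      /* filled in after checking with the engine */
};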

'XA' recovery list is prepared as shown below.

Parse 1 of Binary log:
======================

After PREPARE XA1:

name   state   in_engine_prepare
-----  ------  -----------------
XA1    P

After COMMIT XA1:

name   state   in_engine_prepare
-----  ------  -----------------
XA1    C

After PREPARE XA2:

name   state   in_engine_prepare
-----  ------  -----------------
XA2    P
XA1    C
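
The first scan can be sketched roughly as follows. This is an illustration only, not the code from the branch: `xa_event`, `xa_event_type` and `build_xa_list` are hypothetical names, and the XID is modelled as a plain string rather than the server's XID type.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum xa_binlog_state { XA_PREPARE = 0, XA_COMPLETE };
enum xa_event_type { EV_XA_PREPARE, EV_XA_COMMIT, EV_XA_ROLLBACK };  // hypothetical

struct xa_event { xa_event_type type; std::string xid; };            // hypothetical

struct xa_recovery_member {
  std::string xid;          // stand-in for XID
  xa_binlog_state state;    // last state seen in the binary log
  bool in_engine_prepare;   // filled in later, after asking the engine
};

// First scan: record the last binlogged state of every XA transaction.
// A newly seen XID is put at the front, so the newest entry is on top.
std::vector<xa_recovery_member> build_xa_list(const std::vector<xa_event> &binlog)
{
  std::vector<xa_recovery_member> list;
  for (const xa_event &ev : binlog) {
    assert(!ev.xid.empty());
    const xa_binlog_state st =
        (ev.type == EV_XA_PREPARE) ? XA_PREPARE : XA_COMPLETE;
    auto it = std::find_if(list.begin(), list.end(),
                           [&](const xa_recovery_member &m) { return m.xid == ev.xid; });
    if (it == list.end())
      list.insert(list.begin(), xa_recovery_member{ev.xid, st, false});
    else
      it->state = st;
  }
  return list;
}
```

For the event sequence PREPARE XA1, COMMIT XA1, PREPARE XA2 the resulting list holds XA2 in state P on top of XA1 in state C.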

The above list is passed on to engine during 'ha_recover(xid_list, XA_list)'.

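The 'in_engine_prepare' field of each list entry is updated by checking with the engine.

Since the server crashed before 'XA2' was prepared in the engine, the engine has no transaction in the prepared state, so both entries are marked with the 'N' flag.

name   state   in_engine_prepare
-----  ------  -----------------
XA2    P       N
XA1    C       N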
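
The engine-side check can be pictured with the sketch below. It assumes the engine can report the set of XIDs it currently holds in the prepared state; `mark_engine_prepared` and the string-based XID are hypothetical and only stand in for whatever 'ha_recover(xid_list, XA_list)' does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum xa_binlog_state { XA_PREPARE = 0, XA_COMPLETE };

struct xa_recovery_member {
  std::string xid;          // stand-in for XID
  xa_binlog_state state;
  bool in_engine_prepare;
};

// Sets in_engine_prepare for every list entry and returns how many entries
// the engine still holds prepared. Assumes XIDs in xa_list are unique.
std::size_t mark_engine_prepared(std::vector<xa_recovery_member> &xa_list,
                                 const std::set<std::string> &engine_prepared)
{
  std::size_t hits = 0;
  for (xa_recovery_member &m : xa_list) {
    m.in_engine_prepare = engine_prepared.count(m.xid) != 0;
    if (m.in_engine_prepare)
      ++hits;
  }
  assert(hits <= engine_prepared.size());
  return hits;
}
```

An entry whose XID the engine does not report keeps the 'N' flag, whether the transaction was already completed there or never reached the engine at all; the 'state' column is what tells the two cases apart.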

Now the above list is returned back to binlog recovery code.

Parse 2 of binary log to replay the events:
===========================================
Start reading events from the binary log.

PREPARE XA1 -

Compare XA1 state with 'state' in list.

if (PREPARE == C) No.
in_engine_prepare=N.

This transaction is complete. Do not apply.

COMMIT XA1 -

if (COMMIT == C) Yes.
in_engine_prepare=N.
This transaction is complete. Do not apply.

PREPARE XA2 -

Compare XA2 state with 'state' in list.

if (PREPARE == P) Yes.
in_engine_prepare=N.

So 'PREPARE XA2' is present only in the binlog, not in the engine.
Replay the event from the binary log.
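
One way to read the per-event decision of the second scan is the small predicate below. It is a sketch of the rule as described in this comment, not the patch itself; `xa_event_type`, `xa_list_entry` and `must_replay` are hypothetical names.

```cpp
#include <cassert>

enum xa_binlog_state { XA_PREPARE = 0, XA_COMPLETE };
enum xa_event_type { EV_XA_PREPARE, EV_XA_COMMIT, EV_XA_ROLLBACK };  // hypothetical

// Reduced form of xa_recovery_member: the XID lookup is assumed to be done.
struct xa_list_entry {
  xa_binlog_state state;    // last state seen in the binary log
  bool in_engine_prepare;   // answer from the engine
};

// Returns true if the binlog event has to be re-applied to the engine.
bool must_replay(xa_event_type ev, const xa_list_entry &entry)
{
  if (ev == EV_XA_PREPARE) {
    // A prepare is replayed only when it is the last binlogged state
    // and the engine does not have the transaction prepared.
    return entry.state == XA_PREPARE && !entry.in_engine_prepare;
  }
  assert(ev == EV_XA_COMMIT || ev == EV_XA_ROLLBACK);
  // A commit or rollback is replayed only when the engine still holds
  // the transaction in the prepared state.
  return entry.state == XA_COMPLETE && entry.in_engine_prepare;
}
```

With XA1 recorded as (C, N) neither of its events is replayed, while 'PREPARE XA2' with (P, N) is.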

Recovery of XA_COMMIT/XA_ROLLBACK:
==================================

PREPARE XA1
PREPARE XA2
COMMIT XA1
COMMIT XA2 --> Crash during XA2 commit.

Let us assume server has crashed at this stage.

Engine has the following view of above XA transactions.

XA1 - Complete in Engine
XA2 - Prepared in Engine

Parse 1 of Binary log:
======================

After PREPARE XA1:

name   state   in_engine_prepare
-----  ------  -----------------
XA1    P

After PREPARE XA2:

name   state   in_engine_prepare
-----  ------  -----------------
XA2    P
XA1    P

After COMMIT XA1:

name   state   in_engine_prepare
-----  ------  -----------------
XA2    P
XA1    C

After COMMIT XA2:

name   state   in_engine_prepare
-----  ------  -----------------
XA2    C
XA1    C

The above list is passed on to engine during 'ha_recover(xid_list, XA_list)'.

The updated list after checking with the engine is given below.

name   state   in_engine_prepare
-----  ------  -----------------
XA2    C       Y
XA1    C       N

Now the above list is returned back to binlog recovery code.

Parse 2 of binary log to replay the events:
===========================================
Start reading events from the binary log.

PREPARE XA1 -

Compare XA1 state with 'state' in list.

if (PREPARE == C) No.
in_engine_prepare=N.

This transaction is complete. Do not apply.

COMMIT XA1 -

if (COMMIT == C) Yes.
in_engine_prepare=N.
This transaction is complete. Do not apply.

PREPARE XA2 -

Compare XA2 state with 'state' in list.

if (PREPARE == C) No.
in_engine_prepare=Y.
The transaction is already prepared in the engine. Do not apply.

COMMIT XA2 -

if (COMMIT == C) Yes.
in_engine_prepare=Y.

So 'COMMIT XA2' is present only in the binlog, not in the engine.
Replay the 'XA COMMIT XA2' event.

Similar logic applies for ROLLBACK as well.
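
To illustrate the ROLLBACK case, here is a compact end-to-end sketch of both scans. It is a simplified model under several assumptions: `xa_event`, `xa_event_type` and `recover_xa` are hypothetical names, each XID is used only once in the scanned binlog, and the returned strings merely label what would be resubmitted (replaying a prepare would in reality mean re-applying the transaction's whole event group).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum xa_event_type { EV_XA_PREPARE, EV_XA_COMMIT, EV_XA_ROLLBACK };  // hypothetical

struct xa_event { xa_event_type type; std::string xid; };            // hypothetical

// Returns labels of the events to resubmit to the engine, in binlog order.
std::vector<std::string> recover_xa(const std::vector<xa_event> &binlog,
                                    const std::set<std::string> &engine_prepared)
{
  // First scan: remember the last binlogged event type of every XID.
  std::map<std::string, xa_event_type> last;
  for (const xa_event &ev : binlog)
    last[ev.xid] = ev.type;

  // Second scan: replay only what the engine is missing.
  std::vector<std::string> replay;
  for (const xa_event &ev : binlog) {
    auto it = last.find(ev.xid);
    assert(it != last.end());
    if (it->second != ev.type)
      continue;  // superseded by a later event of the same transaction
    const bool in_engine = engine_prepared.count(ev.xid) != 0;
    if (ev.type == EV_XA_PREPARE && !in_engine)
      replay.push_back("XA PREPARE '" + ev.xid + "'");
    else if (ev.type == EV_XA_COMMIT && in_engine)
      replay.push_back("XA COMMIT '" + ev.xid + "'");
    else if (ev.type == EV_XA_ROLLBACK && in_engine)
      replay.push_back("XA ROLLBACK '" + ev.xid + "'");
  }
  return replay;
}
```

If the last binlogged event were 'ROLLBACK XA2' instead of 'COMMIT XA2' and the engine still held XA2 prepared, the only action produced would be the rollback of XA2.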

Comment by Andrei Elkin [ 2020-03-03 ]

Sujatha, thanks for an awesome piece of work! We may polish the high-level description later though (for which I don't have time right now).

Comment by Andrei Elkin [ 2020-06-03 ]

Howdy, Serg.

Could you please check the two commits at the top of the bb-10.5-MDEV_21469 branch
that implement the task agenda? They were partly reviewed by Svoj, who left a few notes
on GitHub, which are mostly addressed.

Regarding "MDEV-21469 Pre-commit: make sure binlog hton is listed in the head of the user XA participants list": as I already pointed out (in the last weekly call), changing trans_register_ha() is the least intrusive method, and it also adds the fewest extra operations for non-user-XA executions (the cost is that of one if).
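
As a toy model of the "one if" cost mentioned above: the sketch below is not the server's trans_register_ha() and does not reflect its real data structures. `participant` and `register_participant` are hypothetical, and the only point is that a single branch suffices to place the binlog at the head of the list for user XA.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical participant record; not a MariaDB structure.
struct participant {
  std::string name;
  bool is_binlog;
};

// Adds a participant to the transaction's list. For user XA the binlog
// goes to the head of the list; everything else is appended as usual.
void register_participant(std::list<participant> &parts,
                          const participant &p, bool user_xa)
{
  assert(!p.name.empty());
  if (user_xa && p.is_binlog)  // the single extra branch
    parts.push_front(p);
  else
    parts.push_back(p);
}
```

For executions that are not user XA the branch is never taken, so the registration order stays as it was.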

Comment by Sergei Golubchik [ 2020-12-21 ]

https://github.com/MariaDB/server/commit/dc92235f21e
https://github.com/MariaDB/server/commit/f1db957c5da

Comment by Marko Mäkelä [ 2022-07-26 ]

MySQL 8.0.30 includes WL#11300 "Crash-safe XA + binary log". However, MySQL Bug #76233 "XA prepare is logged ahead of engine prepare" has not been closed yet.

Comment by Brandon Nesterenko [ 2023-08-10 ]

Changed the relationship from blocks to relates to. In reality, MDEV-21117 (already closed) provided the base for this patch, and likely, the dependency with MDEV-18959 is reversed, i.e. MDEV-21469 would depend on the framework laid out by this patch.

Comment by Brandon Nesterenko [ 2023-12-13 ]

Changed the fix version to 10.6 because the patch for MDEV-21117 was committed into 10.6 and refactored much of the binlog recovery logic in a way that is well suited to this work.

Comment by Michael Widenius [ 2023-12-18 ]

Note that we have to ensure that this MDEV and MDEV-31949 are both solved and tested together

Comment by Kristian Nielsen [ 2023-12-21 ]

Repeating my comment from MDEV-31949:

Let me just state once and for all:

The idea to binlog first and prepare in engine later for XA prepare will not be approved. Don't spend more effort in this direction.

  • The engines, InnoDB in particular, have very mature and carefully designed and optimized write-ahead logging. This should be the primary source of recovery, not the binlog, which is neither well designed nor suitable as a high-performance and scalable write-ahead log. The binlog is also optional, so the engine needs to implement recovery from its own log anyway. This should not be duplicated in the binlog.
  • For normal transactions, we call innobase_xa_prepare() before binlogging. We should not use the opposite order for user XA PREPARE.
  • Recovering the transaction from the binlog is not possible in all cases. Statement-based binlogging is a supported user configuration and does not allow this; even row-based binlogging doesn't support it in all cases.

For these reasons, the engine prepare must come before the binlog write for XA PREPARE. Again, patches that do the opposite will be rejected in review.
