[MDEV-29819] Shutdown unexpectedly executes XA ROLLBACK Created: 2022-10-18  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Replication, Server
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.5, 10.6, 10.11

Type: Bug Priority: Major
Reporter: Andrei Elkin Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: XA

Issue Links:
Relates
relates to MDEV-742 LP:803649 - Xa recovery failed on cli... Closed
relates to MDEV-25616 Binlog event for XA COMMIT is generat... Closed

 Description   

When an user prepared XA does not update data it expectedly does not survive
the server restart.
However in case of crash it gets recovered. Here is a test:

--source include/have_binlog_format_mixed.inc
--source include/have_innodb.inc
 
# MDEV-TODO unneccessary recover of read-only transaction
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
 
call mtr.add_suppression("Found 1 prepared XA transactions");
 
XA START '1';
  --error ER_DUP_ENTRY
  INSERT INTO t1 VALUES (2),(1);
  --echo Despite '1' has done no work ...
  SELECT * FROM t1 WHERE a = 2;
XA END '1';
XA PREPARE '1';
XA RECOVER;
 
--echo server crash
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
 
--connection default
--enable_reconnect
--echo ... it is recovered:
XA RECOVER;
 
# Cleanup
XA ROLLBACK '1';
DROP TABLE t1;

The recoverable behavior is more than unnecessary as it's harmful for replication because of
inconsistent prepared status. There are reasons to claim that the "read-only" prepared user XA should not be recovered.



 Comments   
Comment by Andrei Elkin [ 2022-10-18 ]

MDEV-25616 may actually 'depend on' rather than simply 'relates to'. I am considering a work-around within its fixes to circumvent the effect of the current one.

Comment by Marko Mäkelä [ 2022-10-19 ]

Where is this exceptional behavior of XA PREPARE documented? I see that it is a non-standard extension. The MySQL 5.7 documentation on transaction states includes the following:

Use XA START to start an XA transaction and put it in the ACTIVE state.

For an ACTIVE XA transaction, issue the SQL statements that make up the transaction, and then issue an XA END statement. XA END puts the transaction in the IDLE state.

For an IDLE XA transaction, you can issue either an XA PREPARE statement or an XA COMMIT ... ONE PHASE statement:

XA PREPARE puts the transaction in the PREPARED state. An XA RECOVER statement at this point includes the transaction's xid value in its output, because XA RECOVER lists all XA transactions that are in the PREPARED state.

I see no mention of any requirement or assumption that there be modifications to the database state between XA START and XA END. Neither do I see that in the Postgres documentation of PREPARE TRANSACTION.

The Postgres equivalent ought to be something like the following:

create table t(a int primary key);
start transaction;
insert into t values(1);
commit work;
start transaction;
-- ERROR:  duplicate key value violates unique constraint "t_pkey"
-- DETAIL:  Key (a)=(1) already exists.
insert into t values(1),(2);
prepare transaction '1';
-- should fail with some error "transaction '2' not found"
commit prepared '2';
-- should fail because the table is locked by the prepared transaction '1'
drop table t;
commit prepared '1';
drop table t;

I tried variations of this on https://dbfiddle.uk but any attempt to use distributed transactions would fail due to the configuration parameter max_prepared_transactions=0. I did not try to other RDBMS that are available via that service.

Comment by Andrei Elkin [ 2022-10-19 ]

marko, instead of referring to the docs let me compare the description test with

xa start '1';
savepoint sp;
insert into ti values (2); 
rollback to sp;
...

It does not make recoverable prepared XA while the two cases functionally fully equivalent:
in both cases the only query rolls its effect back.

So it's another (to the graceful shutdown's non-recovery) incompatibility.

Comment by Andrei Elkin [ 2022-10-19 ]

marko, also as you're having PG, could you please run my test on it fully, that is including the crash-and-restart part? Would PG recover or would not?

Comment by Andrei Elkin [ 2022-10-19 ]

I don't any docs references which either promotes or dismiss about the current behaviour.
My arguments on that the current is a bug case are based on incompatibility observations.

Comment by Marko Mäkelä [ 2022-10-19 ]

Elkin, please explain using the terminology that I quoted: Which transaction state changes do you expect to occur between the XA statements in your original SQL example. If there is any documentation that you think is more applicable to this scenario, please quote it as well.

Comment by Andrei Elkin [ 2022-10-19 ]

marko, the docs paragraph that you cited does not aim to specify the recovery behavior of a prepared read-only user XA. Such behaviour is not clarified anywhere in the Oracle docs either. They do specify the recovery behaviour in the read-write case.

Here what I would put into the docs about the read-only prepared user XA.
It can't be logged into the binary log for a simple reason that its xid may not be found
by any or some engines. E.g in the most simple case of no engine involved there is no way to find its xid, so it could not be completed. That's unacceptable for many reasons including replication ones.

It makes sense to repeat a variant of the read-only that gave you elsewhere (with few more actually):

xa start '1'; 
update  ti set a=3 where a=666;  # no such record
xa end '1'; xa prepare '1';

If it's logged into binary log and afterward this master server crashes the post-recovery
xa recover will see nothing (sic!).
But its xid would be dangling elsewhere on slaves obviously to prevent XA replication when
the xid were to be reused.
So the xid should not be sent over to slaves! And that's the current policy.

Please compare the latter use case and a rollback to sp (to undo a successful statement) one against the description one, find for yourself that they all are about the read-only transaction class, and apply my clarification of why the dangling xid is bad for replication in particular.

Let me rush now into how to fix, unless there are clear ideas already what to examine in the trx state at recovery, the rollback to sp may suggest how to clean the errored-out statement's trx properly that
the following crash-recovery does not detect it.

Comment by Marko Mäkelä [ 2022-10-20 ]

Elkin, I believe that many of the documented restrictions of MySQL 5.7 with regard to XA transactions related to crash recovery ("unexpected halt") could be fixed by following up on MDEV-742.

I would prefer Albert Einstein’s principle: Make everything as simple as it can be, but not simpler. I did not find any documentation that would specify any exceptions for "read-only" or "empty" XA transactions, and I do not think that there should be any special treatment of them.

As far as I can understand, the actual problem that your examples expose is outside InnoDB. I just tried the following:

--source include/have_innodb.inc
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
XA START '1';
--error ER_DUP_ENTRY
INSERT INTO t1 VALUES (2),(1);
XA END '1';
XA PREPARE '1';
XA RECOVER;
--source include/restart_mysqld.inc
DROP TABLE t1;
XA RECOVER;
XA ROLLBACK '1';

Edit: There is no lock conflict, because the XA PREPARE transaction should have been persisted with 0 undo log records and 0 locks, as my later kill-and-restart test shows. What is important here is that the XA PREPARE transaction would be wrongly rolled back as part of server shutdown, as shown below.

There was no XA COMMIT or XA ROLLBACK statement ever since the XA PREPARE statement, so the transaction must remain in that state. Yet, the XA ROLLBACK would fail to find the transaction.

To my great surprise, the XA PREPARE transaction was unexpectedly rolled back on shutdown:

./mtr --rr name_of_test
rr replay var/log/mysqld.1.rr/mariadbd-0

break innobase_xa_prepare
continue
next
watch -l trx.state
continue
continue
backtrace

10.6 db330c87d6d253a1e2e93a747327040d9205c5fe

Thread 2 hit Breakpoint 1, innobase_xa_prepare (hton=0x55d52d7c5ae8, thd=0x7efcac0020d8, prepare_trx=true) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:17018
17018		trx_t*		trx = check_trx_exists(thd);
(rr) next
17020		DBUG_ASSERT(hton == innodb_hton_ptr);
(rr) watch -l trx.state
Hardware watchpoint 2: -location trx.state
(rr) continue
Continuing.
 
Thread 2 hit Hardware watchpoint 2: -location trx.state
 
Old value = {m = std::atomic<trx_state_t> = { TRX_STATE_ACTIVE }}
New value = {m = std::atomic<trx_state_t> = { TRX_STATE_PREPARED }}
std::atomic<trx_state_t>::store (this=0x7efcd08ffbe0, __i=TRX_STATE_PREPARED, __m=std::memory_order_relaxed) at /usr/include/c++/12/atomic:272
272	      }
(rr) continue
Continuing.
 
Thread 2 hit Hardware watchpoint 2: -location trx.state
 
Old value = {m = std::atomic<trx_state_t> = { TRX_STATE_PREPARED }}
New value = {m = std::atomic<trx_state_t> = { TRX_STATE_COMMITTED_IN_MEMORY }}
std::atomic<trx_state_t>::store (this=0x7efcd08ffbe0, __i=TRX_STATE_COMMITTED_IN_MEMORY, __m=std::memory_order_relaxed) at /usr/include/c++/12/atomic:272
272	      }
(rr) backtrace
#0  std::atomic<trx_state_t>::store (this=0x7efcd08ffbe0, __i=TRX_STATE_COMMITTED_IN_MEMORY, __m=std::memory_order_relaxed) at /usr/include/c++/12/atomic:272
#1  0x000055d52b76955c in Atomic_relaxed<trx_state_t>::store (this=0x7efcd08ffbe0, i=TRX_STATE_COMMITTED_IN_MEMORY, o=std::memory_order_relaxed) at /mariadb/10.5/include/my_atomic_wrapper.h:47
#2  0x000055d52b767e01 in Atomic_relaxed<trx_state_t>::operator= (this=0x7efcd08ffbe0, i=TRX_STATE_COMMITTED_IN_MEMORY) at /mariadb/10.5/include/my_atomic_wrapper.h:49
#3  0x000055d52b7902a6 in trx_t::commit_state (this=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:481
#4  trx_t::commit_in_memory (mtr=0x7efcd0066ff0, this=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1289
#5  trx_t::commit_low (this=0x7efcd08ffb80, mtr=0x7efcd0066ff0) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1460
#6  0x000055d52b790a2f in trx_t::commit_persist (this=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1474
#7  0x000055d52b790b2f in trx_t::commit (this=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1483
#8  0x000055d52b7817ea in trx_t::rollback_finish (this=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0roll.cc:66
#9  0x000055d52b781f12 in trx_t::rollback_low (this=0x7efcd08ffb80, savept=0x0) at /mariadb/10.5/storage/innobase/trx/trx0roll.cc:133
#10 0x000055d52b77ecb3 in trx_rollback_for_mysql_low (trx=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0roll.cc:196
#11 0x000055d52b77f319 in trx_rollback_for_mysql (trx=0x7efcd08ffb80) at /mariadb/10.5/storage/innobase/trx/trx0roll.cc:269
#12 0x000055d52b50be0e in innobase_rollback (hton=0x55d52d7c5ae8, thd=0x7efcac0020d8, rollback_trx=true) at /mariadb/10.5/storage/innobase/handler/ha_innodb.cc:4706
#13 0x000055d52b129db6 in ha_rollback_trans (thd=0x7efcac0020d8, all=true) at /mariadb/10.5/sql/handler.cc:2181
#14 0x000055d52b055905 in trans_xa_detach (thd=0x7efcac0020d8) at /mariadb/10.5/sql/xa.cc:834
#15 0x000055d52acfab23 in THD::cleanup (this=0x7efcac0020d8) at /mariadb/10.5/sql/sql_class.cc:1551
#16 0x000055d52ac26cfb in unlink_thd (thd=0x7efcac0020d8) at /mariadb/10.5/sql/mysqld.cc:2728
#17 0x000055d52af3d2a4 in do_handle_one_connection (connect=0x55d52db0dfd8, put_in_cache=true) at /mariadb/10.5/sql/sql_connect.cc:1427
#18 0x000055d52af3cfa6 in handle_one_connection (arg=0x55d52daee7e8) at /mariadb/10.5/sql/sql_connect.cc:1318
#19 0x00007efcd108784a in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#20 0x00007efcd110a530 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

I assume that with a kill-and-restart instead of my attempted shutdown-and-restart, the unexpected and wrong rollback would have been avoided.

The unexpected rollbacks were supposed to be removed in MDEV-742. That apparently happened only for client disconnect, not for server shutdown.

Comment by Marko Mäkelä [ 2022-10-20 ]

For an empty XA transaction, innobase_xa_prepare() will not be called. That is fine, but the following does not look correct to me:

--source include/have_binlog_format_mixed.inc
XA START '1';
XA END '1';
XA PREPARE '1';
XA RECOVER;
--source include/restart_mysqld.inc
XA RECOVER;
--error ER_XAER_NOTA
XA COMMIT '2';
XA COMMIT '1';

I got the following result:

10.6 db330c87d6d253a1e2e93a747327040d9205c5fe

mysqltest: At line 10: query 'XA COMMIT '1'' failed: ER_XAER_NOTA (1397): XAER_NOTA: Unknown XID

The XA RECOVER listed the transaction before the server was shut down and restarted but not after it.

Comment by Marko Mäkelä [ 2022-10-20 ]

I created a variation of my test that kills the server to work around the bug that causes the unexpected rollback:

--source include/have_innodb.inc
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
connect con1,localhost,root;
XA START '1';
INSERT INTO t1 VALUES(3);
--error ER_DUP_ENTRY
INSERT INTO t1 VALUES(2),(1);
XA END '1';
XA PREPARE '1';
XA RECOVER;
connection default;
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
disconnect con1;
SET innodb_lock_wait_timeout=1;
--error ER_LOCK_WAIT_TIMEOUT
INSERT INTO t1 VALUES(3);
--error ER_LOCK_WAIT_TIMEOUT
DROP TABLE t1;
XA RECOVER;
XA ROLLBACK '1';
INSERT INTO t1 VALUES(3);
DROP TABLE t1;

2022-10-20 17:04:06 0 [Warning] Found 1 prepared XA transactions

If I remove the insert of the row (3), then the XA transaction will be persisted with no undo log records and no InnoDB locks recovered:

--source include/have_innodb.inc
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
connect con1,localhost,root;
XA START '1';
--error ER_DUP_ENTRY
INSERT INTO t1 VALUES(2),(1);
XA END '1';
XA PREPARE '1';
XA RECOVER;
connection default;
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
disconnect con1;
DROP TABLE t1;
XA RECOVER;
XA ROLLBACK '1';

2022-10-20 17:06:39 0 [Note] InnoDB: Starting recovery for XA transactions...
2022-10-20 17:06:39 0 [Note] InnoDB: Transaction 24 in prepared state after recovery
2022-10-20 17:06:39 0 [Note] InnoDB: Transaction contains changes to 0 rows
2022-10-20 17:06:39 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2022-10-20 17:06:39 0 [Note] Found 1 prepared transaction(s) in InnoDB
2022-10-20 17:06:39 0 [Warning] Found 1 prepared XA transactions

These tests demonstrate that the XA PREPARE is durably written inside InnoDB (it will trigger a synchronous redo log write) and that InnoDB will preserve transactions in XA PREPARE state until it is requested to commit or roll back them.

Comment by Andrei Elkin [ 2022-10-20 ]

marko, there's no question to the INSERT(3) case, it just demonstrates the read-write case is crash-recoverable.
The 2nd one is borrowed from the description with all my conclusion on incompatibilities. A glaring one to my taste is
The following case of the explicit rollback

     xa start '1';
      savepoint sp;
      ---error 0
      insert into t1 values (2);
     rollback to sp;
      xa end '1';
      xa prepare '1'; 

which does not recover after the crash. does not recover. Ideally they should behave equally I think.
But the major issue in this ticket has been identified by us in a slack conversation.
It is that the server should optimize the read-only branches of XA to not invoke their prepare(). After all that's what we effectively always do for pure empty XA (to defacto implement the XA specs' read-only optimization).

To the pure empty XA case with no Engine involved its prepared status is, as you speak 'optimized' and that's actually just to follow XA specs, specifically p.44:

xa_prepare() may return XA_RDONLY

The optimization decision was necessary for MDEV-742.

Comment by Marko Mäkelä [ 2022-10-27 ]

Elkin, for the following test, innobase_xa_prepare() is not being invoked at all:

--source include/have_innodb.inc
 
create table t1 (a int primary key) engine=innodb;
xa start '1';
savepoint sp;
insert into t1 values (2);
rollback to sp;
xa end '1';
xa prepare '1';
xa recover;
 
--source include/restart_mysqld.inc
xa recover;
#--error ER_XAER_NOTA
xa commit '1';
 
drop table t1;

Instead, a full InnoDB rollback is wrongly being invoked:

10.6 79dc3989fdc3bdcfc4eb5e17ac87fa501039ebba

#0  trx_t::commit (this=0x7f569f933b80) at /mariadb/10.6/storage/innobase/trx/trx0trx.cc:1482
#1  0x0000555ed506004e in trx_t::rollback_finish (this=this@entry=0x7f569f933b80) at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:66
#2  0x0000555ed505fa5c in trx_t::rollback_low (this=this@entry=0x7f569f933b80, savept=savept@entry=0x0) at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:133
#3  0x0000555ed505d453 in trx_rollback_for_mysql (trx=trx@entry=0x7f569f933b80) at /mariadb/10.6/storage/innobase/include/dyn0buf.h:192
#4  0x0000555ed4d3cb99 in innobase_rollback (hton=<optimized out>, thd=0x7f56740020d8, rollback_trx=true) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4697
#5  0x0000555ed476b0e6 in ha_rollback_to_savepoint (thd=thd@entry=0x7f56740020d8, sv=sv@entry=0x7f5674033f40) at /mariadb/10.6/sql/handler.cc:2877
#6  0x0000555ed4b87236 in trans_rollback_to_savepoint (thd=thd@entry=0x7f56740020d8, name={str = 0x7f567402df78 "sp", length = <synthetic pointer>}) at /mariadb/10.6/sql/transaction.cc:670
#7  0x0000555ed4a26062 in mysql_execute_command (thd=thd@entry=0x7f56740020d8, is_called_from_prepared_stmt=false) at /mariadb/10.6/sql/sql_parse.cc:5694
#8  0x0000555ed4a1fee0 in mysql_parse (thd=thd@entry=0x7f56740020d8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7f568d7ac3e0) at /mariadb/10.6/sql/sql_parse.cc:8016
#9  0x0000555ed4a1e4c7 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f56740020d8, packet=packet@entry=0x7f5674024539 "rollback to sp", packet_length=packet_length@entry=14, 
    blocking=true) at /mariadb/10.6/sql/sql_parse.cc:1896

The SQL layer is wrongly invoking a handlerton::rollback instead of handlerton::savepoint_rollback or handlerton::savepoint_rollback_can_release_mdl.

Please try to post complete test cases and try present proof (such as call stack traces) for claims like "automatic statement rollback in Innodb". InnoDB typically does what the SQL layer tells it to do.

Comment by Andrei Elkin [ 2022-10-28 ]

marko, thanks for following up! To

> Please try to post complete test cases and try present proof (such as call stack traces) for claims like "automatic statement rollback in Innodb"

I had removed (with explanation) any mentioning of 'automatic statement rollback in Innodb' (which also included 'partial' adjective and which was the key ).

After that this ticket contains two items to our attention:

  • The recovery/replication related one - the read-only optimization should be completed cleanly;
  • the errored statement rollback and explicit rollback to sp should be made compatible (might be also on the server layer to do so
    as your latest comment says).
Generated at Thu Feb 08 10:11:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.