[MDEV-8794] Recovery failed! You must enable exactly 3 storage engines that support two-phase commit protocol Created: 2015-09-11  Updated: 2015-11-06  Resolved: 2015-11-06

Status: Closed
Project: MariaDB Server
Component/s: Plugins, XA
Affects Version/s: 10.1
Fix Version/s: N/A

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-9039 Can't upgrade MariaDB to to 10.1.8 ve... Closed
Sprint: 10.1.9-2

 Description   

Test flow

- start server with SEQUENCE and InnoDB
- kill server (do not shut down, make it die)
- start server without SEQUENCE
  => server does not start

Note: It does not matter which XA-capable engine is involved, or whether an engine went missing or newly appeared; only the number matters. E.g. it can be reproduced by adding TokuDB instead of removing SEQUENCE; on the other hand, if you remove SEQUENCE and add TokuDB at the same time, the server starts fine, because the number of engines is the same.
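The behavior described in the note can be illustrated with a toy model. This is only a sketch under the assumption that the check compares nothing but the count of XA-capable engines; the function name and engine sets are hypothetical, and the exact number the server records (see the error message further down) is not modelled.

```python
# Toy model only: this is NOT the server's code. It assumes the recovery
# check compares just the number of XA-capable engines, as the note describes.
def recovery_count_check(engines_at_crash, engines_at_restart):
    """Return True when the number of XA-capable engines is unchanged."""
    return len(set(engines_at_crash)) == len(set(engines_at_restart))


before = {"InnoDB", "SEQUENCE"}

# One engine removed: the count differs, so recovery is refused.
print(recovery_count_check(before, {"InnoDB"}))
# One engine added: the count differs as well.
print(recovery_count_check(before, {"InnoDB", "SEQUENCE", "TokuDB"}))
# One removed and one added at once: same count, so the check passes.
print(recovery_count_check(before, {"InnoDB", "TokuDB"}))
```

The last call is the case where the server starts even though the set of engines has changed.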

Test case

--source include/have_innodb.inc
 
--echo # Shutdown the server...
 
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--shutdown_server 10
--source include/wait_until_disconnected.inc
 
--echo # Restart the server with SEQUENCE enabled....
 
--exec echo "restart:--sequence --plugin-load-add=ha_sequence.so" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--enable_reconnect
--source include/wait_until_connected_again.inc
 
select engine, support from information_schema.engines;
 
--echo # Kill the server....
 
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--shutdown_server 0
--source include/wait_until_disconnected.inc
 
--echo # Restart the server without SEQUENCE....
 
--exec echo "restart:--loose-skip-sequence" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--enable_reconnect
--source include/wait_until_connected_again.inc
 
select engine, support from information_schema.engines;
 
--echo # All done.

2015-09-11 20:15:30 139911419701088 [ERROR] Recovery failed! You must enable exactly 3 storage engines that support two-phase commit protocol
2015-09-11 20:15:30 139911419701088 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete tc log and start mysqld with --tc-heuristic-recover={commit|rollback}
2015-09-11 20:15:30 139911419701088 [ERROR] Can't init tc log
2015-09-11 20:15:30 139911419701088 [ERROR] Aborting

Also, the number in the error message looks wrong: when there were two engines and one is gone, it asks for 3, and so on.



 Comments   
Comment by Elena Stepanova [ 2015-10-30 ]

Raising the priority because of MDEV-9039.
We have enabled the sequence engine by default, so all upgrades which happen after the server crashed or was killed are in trouble.

Comment by Sergei Golubchik [ 2015-11-06 ]

This behavior is intentional. The check for the number of engines is supposed to catch user mistakes where one of the XA-capable engines is disabled after the crash but before the recovery. Note that

  • this engine might have prepared but uncommitted transactions
  • the only entity that knows whether to commit or roll back these transactions is the tc-log
  • tc-log is destroyed after the recovery

It follows that all XA-capable engines from before the crash must be present during recovery, otherwise you risk "hanging" transactions and inconsistent data.
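The reasoning above can be sketched as a simplified model. It assumes the tc-log is just a set of transaction ids that were decided as committed, and that every prepared transaction not listed there is rolled back; the function, the engine names "A" and "B", and the data layout are hypothetical and do not mirror the server's internals.

```python
# Simplified model, not the server's implementation.
# Assumption: the tc-log is a set of xids decided as "commit"; every other
# prepared xid found in a loaded engine is rolled back during recovery.
def recover(tc_log_committed, prepared_by_engine, loaded_engines):
    """Resolve prepared transactions for loaded engines.

    Returns (decisions, unresolved): decisions maps (engine, xid) to
    "commit" or "rollback"; unresolved lists prepared xids that belong to
    engines which are not loaded and therefore cannot be resolved.
    """
    decisions = {}
    unresolved = {}
    for engine, xids in prepared_by_engine.items():
        if engine not in loaded_engines:
            # Nobody can ask this engine to commit or roll back.
            unresolved[engine] = sorted(xids)
            continue
        for xid in sorted(xids):
            outcome = "commit" if xid in tc_log_committed else "rollback"
            decisions[(engine, xid)] = outcome
    return decisions, unresolved


# Hypothetical state after a crash: xid 1 was decided, xid 2 was not.
decisions, unresolved = recover(
    tc_log_committed={1},
    prepared_by_engine={"A": {1, 2}, "B": {1}},
    loaded_engines={"A"},  # engine "B" was disabled before recovery
)
print(decisions)
print(unresolved)
```

In this model, engine "B" still holds prepared transaction 1 after recovery. Once the tc-log is destroyed, nothing records that it should have been committed, which is the "hanging" transaction case.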

This is not a bullet-proof check. Anyone who wants to fool it and get inconsistent data can unload an XA-capable engine and replace it with a different XA-capable engine, or simply delete the tc-log file. This check is only designed to prevent user mistakes, not malicious actions.

The upgrade issue related to the sequence engine will be fixed in MDEV-9039.
