[MDEV-27342] MariaDB 10.6 failed to do crash recovery, potential regression in commit 6c39eaeb Created: 2021-12-22  Updated: 2023-11-27  Resolved: 2022-03-15

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6
Fix Version/s: 10.6.8, 10.7.4, 10.8.3

Type: Bug Priority: Major
Reporter: Hugo Wen Assignee: Andrei Elkin
Resolution: Fixed Votes: 2
Labels: None


 Description   

Issue description:

MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1.

MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:

create database testdb;
use testdb;
CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
DELIMITER $$
CREATE PROCEDURE generateredo(count int)
BEGIN
DECLARE counter INT DEFAULT 1;
WHILE counter <= count DO
insert into t1 values(counter,counter);
update t1 set col2=counter+1 where col1=counter;
delete from t1 where col1=counter;
SET counter = counter + 1;
END WHILE;
END$$
DELIMITER ;
 
call testdb.generateredo(10000000);

The mysql error log when recovering:

2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found
2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start:
2021-12-20 18:55:50 0 [Note]   cf=default
2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864
2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864
2021-12-20 18:55:50 0 [Note]   cf=__system__
2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864
2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864
2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized.
2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1
2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO
2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728
2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state.
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266
2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active.
2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267
2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
2021-12-20 18:55:50 0 [Note] Starting table crash recovery...
2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-20 18:55:50 0 [Note] Crash table recovery finished.
2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery
2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
2021-12-20 18:55:50 0 [ERROR] Aborting

The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.

Some investigation:

Based on the implementation of function init_server_components(), during the recovery, ha_recover() function is called twice

  • Stack trace of the first call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292

    sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057]
    sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd]
    sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931]
    sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda]
    sql/mysqld.cc:5289(init_server_components())[0x5599475098ce]
    sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55]
    ??:0(__libc_start_main)[0x14f07bff20ba]
    

  • Stack trace of the second call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298

    sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b]
    sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df]
    sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55]
    ??:0(__libc_start_main)[0x1542b650e0ba]
    ??:0(_start)[0x55e575e4764a]
    

However, in the commit the ha_recover() function call was moved from the end of function TC_LOG_BINLOG::recover to the beginning.

Tried a quick fix

I tried a simple fix of moving the ha_recover() back to the end of recover() then the issue was fixed:

diff --git a/sql/log.cc b/sql/log.cc
index 41eed80ef66..ddda2e5d330 100644
--- a/sql/log.cc
+++ b/sql/log.cc
@@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
   fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error
 
   /* finds xids when root is not NULL */
-  if (do_xa && ha_recover(&xids, &mem_root))
-    goto err1;
+  //if (do_xa && ha_recover(&xids, &mem_root))
+  //  goto err1;
 
   /*
     Scan the binlog for XIDs that need to be committed if still in the
@@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
 
   if (do_xa)
   {
+    if (ha_recover(&xids))
+      goto err2;
     if (binlog_checkpoint_found)
     {
 #ifndef HAVE_REPLICATION

Error log before and after this fix
Before the fix, the error was in the second call of ha_recover():

2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557
2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
2021-12-21 20:07:45 0 [Note] Starting table crash recovery...
2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-21 20:07:45 0 [Note] Crash table recovery finished.
2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
2021-12-21 20:07:45 0 [ERROR] Aborting

After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call:

2021-12-21  1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557
2021-12-21  1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
2021-12-21  1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
2021-12-21  1:00:20 0 [Note] Starting table crash recovery...
2021-12-21  1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-21  1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
2021-12-21  1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-21  1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-21  1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-21  1:00:20 0 [Note] Crash table recovery finished.
2021-12-21  1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'.
2021-12-21  1:00:20 0 [Note] Server socket created on IP: '::'.
2021-12-21  1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221  1:00:20
2021-12-21  1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections.

Could you please investigate whether we should update the function or other conditions need to be considered?



 Comments   
Comment by Otto Kekäläinen [ 2021-12-22 ]

Commit was authored by sujatha.sivakumar and committed by Elkin (latter now assigned to issue). Maybe Andrei can give some pointers of what the original commit tried to achieve so it would be easier to debug what the code is doing?

Comment by Andrei Elkin [ 2021-12-23 ]

otto, No pointers can be made at this point but I am going to look into shortly. It feels a bug. You'll see more details from commits.

Comment by Andrei Elkin [ 2021-12-25 ]

wenhug, thanks for the report! It's not easy to reproduce though. I tried running your SP while kill -9 the server.
Every time there's a trx in prepared state it gets recovered.

First I am wondering if my method is really equivalent. Maybe you could give it a try to prove practically.
Secondly, could you please tell me your server build options and run time options?
Maybe building and running as you did I will have better luck.

Notice you can gain really helpful warnings into the error log with setting log_warning > 2.
See e.g

   if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)
    {
      sql_print_warning("Could not complete %u number of transactions.",
                        count_in_prepare);

Once you'll have more info in the error log I may have some ideas.

If nothing like that will help us, please consider if I could log in your server and examine the recovery
with gdb.

Thank you and Merry Xmas!

Comment by Andrei Elkin [ 2021-12-26 ]

wenhug, to your quick fix,

it's equivalent of executing --tc-heuristic-recover=ROLLBACK. That's why it works out.

After closer look into possible causes of non-decided xids by the 2nd ha_recover() time,
there's a

    if (round == 1)
    {
      if (!binlog_checkpoint_found)
        break;   

escape from the for-loop and after it

  if (do_xa)
  {
    if (binlog_checkpoint_found)
    {
#ifndef HAVE_REPLICATION
      if (ha_recover_complete(&xids))
#else
      if (ctx.complete(this, xids))
#endif

won't be attended.
Could you please examine the last binlog file on whether it contains any Binlog_checkpoint_log_event.
When it does not indeed, your next attempt with log_warnings = 3 obviously won't hit

    if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)
    {
      sql_print_warning("Could not complete %u number of transactions.",
                        count_in_prepare);
      return false; // there's later dry run ha_recover() to error out
    }

which otherwise (Binlog_checkpoint event is in place) it must do.
Also more warnings from inside ha_recover_complete would also display then xid:s of undecided sticky transactions.

Hope this will be helpful.

Comment by Hugo Wen [ 2021-12-26 ]

Elkin Thank you very much for looking into the issue!

The issue occurred when creating a snapshot of the file system. I couldn't reproduce with `kill -9` either using the same binaries. Also probably there were some other SQL commands ran before the snapshot was taken but I don't know for sure.
I'll do some more tests and see if there's a steady way to reproduce it. Or maybe can share the data file if that's allowed by our policy.

I just tried starting mysqld with --log_warnings=3 as you suggested:
(It's on another instance with the same issue, so the trx id is different to the original one)

2021-12-26 19:39:04 0 [Note] Loaded 'ha_rocksdb.so' with offset 0x1550a0e1b000
2021-12-26 19:39:04 0 [Note] RocksDB: 2 column families found
2021-12-26 19:39:04 0 [Note] RocksDB: Column Families at start:
2021-12-26 19:39:04 0 [Note]   cf=default
2021-12-26 19:39:04 0 [Note]     write_buffer_size=67108864
2021-12-26 19:39:04 0 [Note]     target_file_size_base=67108864
2021-12-26 19:39:04 0 [Note]   cf=__system__
2021-12-26 19:39:04 0 [Note]     write_buffer_size=67108864
2021-12-26 19:39:04 0 [Note]     target_file_size_base=67108864
2021-12-26 19:39:04 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
2021-12-26 19:39:04 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
2021-12-26 19:39:04 0 [Note] MyRocks storage engine plugin has been successfully initialized.
2021-12-26 19:39:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2021-12-26 19:39:04 0 [Note] InnoDB: Number of pools: 1
2021-12-26 19:39:04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-12-26 19:39:04 0 [Note] InnoDB: Using Linux native AIO
2021-12-26 19:39:04 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648, chunk size = 134217728
2021-12-26 19:39:04 0 [Note] InnoDB: Completed initialization of buffer pool
2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 was in the XA prepared state.
2021-12-26 19:39:04 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2021-12-26 19:39:04 0 [Note] InnoDB: Trx id counter is 126556
2021-12-26 19:39:04 0 [Note] InnoDB: 128 rollback segments are active.
2021-12-26 19:39:04 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-12-26 19:39:04 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2021-12-26 19:39:04 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2021-12-26 19:39:04 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-12-26 19:39:04 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
2021-12-26 19:39:04 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159285; transaction id 126557
2021-12-26 19:39:04 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
2021-12-26 19:39:04 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
2021-12-26 19:39:04 0 [Note] Starting table crash recovery...
2021-12-26 19:39:04 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
2021-12-26 19:39:04 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-26 19:39:04 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-26 19:39:04 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-26 19:39:04 0 [Note] Crash table recovery finished.
2021-12-26 19:39:04 0 [Warning] Found prepared transaction with xid 118560
2021-12-26 19:39:04 0 [Warning] Could not complete 1 number of transactions.
2021-12-26 19:39:04 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
2021-12-26 19:39:04 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-26 19:39:04 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-26 19:39:04 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-26 19:39:04 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
2021-12-26 19:39:04 0 [ERROR] Aborting

And the binlog files are like this:

[19:23:09][root]$ ls -alt
total 13336
drwxr-xr-x  2 rdsdb rdsdb    4096 Dec 26 19:18 .
-rw-rw-rw-  1 rdsdb rdsdb     343 Dec 26 19:18 mysql-bin-changelog.index
-rw-rw----  1 rdsdb rdsdb   67146 Dec 20 20:25 mysql-bin-changelog.000021
-rw-rw----  1 rdsdb rdsdb 1707199 Dec 20 20:25 mysql-bin-changelog.000020
-rw-rw----  1 rdsdb rdsdb 4792114 Dec 20 20:25 mysql-bin-changelog.000019
-rw-rw-r--  1 rdsdb rdsdb 7051915 Dec 20 20:23 mysql-bin-changelog.000018
-rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:20 mysql-bin-changelog.000017
-rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:15 mysql-bin-changelog.000016
-rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:10 mysql-bin-changelog.000015
drwxr-xr-x 10 rdsdb rdsdb    4096 Dec 20 19:04 ..
[19:23:11][root]$ 
[19:23:12][root]$ /rdsdbbin/mysql/bin/mysqlbinlog mysql-bin-changelog.000021 | grep checkpoint
#211220 20:25:29 server id 1604102129  end_log_pos 352 CRC32 0x1c7cf816 	Binlog checkpoint mysql-bin-changelog.000020
#211220 20:25:29 server id 1604102129  end_log_pos 627 CRC32 0x6509ef3b 	Binlog checkpoint mysql-bin-changelog.000021
[19:23:15][root]$ 

I'll be OOO until January 3rd, later I can do more tests and private builds if needed. Happy holidays and happy new year!

Comment by Andrei Elkin [ 2021-12-27 ]

wenhug,

So it looks very much that the block of static void xarecover_do_commit_or_rollback()

{
  rc= xarecover_decide_to_commit(member, ptr_commit_max) ?
    hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);
 
  /*
    It's fine to have non-zero rc which would be from transaction
    non-participant hton:s.
  */
  DBUG_ASSERT(rc || member->in_engine_prepare > 0);

computes rc >0. It could be XAER_NOTA = -4 or XAER_RMFAIL = -7.
Also consider the latter code is returned "normally" at times:

innobase_commit_by_xid(
/*===================*/
	handlerton*	hton,
	XID*		xid)	/*!< in: X/Open XA transaction identification */
{
	...
	if (high_level_read_only) {
		return(XAER_RMFAIL);
	}

Enjoy your yearend, bloody well deserved & Happy New one ahead!

Andrei

Comment by Hugo Wen [ 2022-01-08 ]

Elkin,

I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing.

However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files?

Thanks,
Hugo

Comment by Hugo Wen [ 2022-01-12 ]

After debugging line by line using GDB, root cause was found.

When doing the recovery, the server generates a list of XIDs to be committed or rolled-back.
Then the callback function of innodb will be called to handle the commit or rollback.

In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code

Failed reason is server id doesn't match. Server ID is saved in xid_t.data[]. The format of xid_t.data[] is:

MySQLXid<Server ID 8 bytes><XID 8 bytes>

Paste the GDB output here:

arg->xid is the input of the function to be searched from the hash.
element->trx->xid is the one exists in the hash which expected to match

Apparently gtrid_length, bqual_length, and transaction id all match but server ids are different.

  • ((ulonglong)(arg->xid->data+8)) = 1049637262 which is the server id of the restored database.
  • ((ulonglong)(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from.

(gdb) p element->id
$2 = 126555
(gdb) p element->trx
$3 = (trx_t *) 0x7fffebcca288
(gdb) p element->trx->is_recovered
$4 = true
(gdb) p element->trx->state
$5 = {m = {static _S_min_alignment = 4, static _S_alignment = 4, _M_i = TRX_STATE_PREPARED}}
(gdb) p arg->xid
$6 = (const XID *) 0x7fffffffad90
(gdb) p element->trx->xid
$7 = (XID *) 0x7ffff68b9258
(gdb) p arg->xid->gtrid_length
$8 = 24
(gdb) p element->trx->xid->gtrid_length
$9 = 24
(gdb) p arg->xid->bqual_length
$10 = 0
(gdb) p element->trx->xid->bqual_length
$11 = 0
(gdb) p arg->xid->data
$12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001", '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023", '\000' <repeats 15 times>, "й\377\377\377\177\000\000\266\071\000\000\000\000\000\000Ю\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000"
(gdb) p element->trx->xid->data
$13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001", '\000' <repeats 108 times>
(gdb) p *((ulonglong*)(arg->xid->data+16))
$14 = 118560
(gdb) p *((ulonglong*)(element->trx->xid->data+16))
$15 = 118560
(gdb) p *((ulonglong*)(arg->xid->data+8))
$16 = 1049637262
(gdb) p *((ulonglong*)(element->trx->xid->data+8))
$17 = 1604102129

Why the server id doesn't match ?

After this commit, the input xid creation becomes using the set function:

  void set(ulonglong xid)
  {
    my_xid tmp;
    formatID= 1;
    set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);
    memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));
    tmp= xid;
    memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
    gtrid_length=MYSQL_XID_GTRID_LEN;
  }

This set function doesn't copy the original server id, instead it copy the new server_id of the new database.

memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id))

Therefore the xid can't be find in trx_sys because the server id doesn't match any more.

Why there's no such issue in 10.5?

In 10.5, the input xid was created directly from original xid : code xid= *trx->xid;

      XID& xid= arg->xid_list[arg->count];
      if (arg->count++ < arg->len)
      {
        trx->state= TRX_STATE_PREPARED_RECOVERED;
        ...
        xid= *trx->xid;
      }

So the server match keeps the same with the old one.

To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID. The issue doesn't exist in previous major versions before 10.6.

How to fix?

Seems a possible fix is to update the logic of creating the input XID object:

Proposed fix:

I've tested following changes fixed the issue. with the fix db recovered successfully.
While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file

If there's no breaking mtr tests and no obvious issues I will create PR to fix it.
Also may worth to add a MTR test, but it's late today I'll work on it tomorrow.

Elkin, could you please help review if the root cause and fix makes sense?

[wenhug@dev-dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb-1335.diff 
diff --git a/sql/handler.cc b/sql/handler.cc
index e4543bde5de..5cad422f45f 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -2414,7 +2414,7 @@ struct xarecover_st
 */
 static xid_recovery_member*
 xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
-                  XID *full_xid_arg)
+                  XID *full_xid_arg, ulong server_id_arg)
 {
   xid_recovery_member *member= (xid_recovery_member *)
     alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
@@ -2428,7 +2428,7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
 
   if (full_xid_arg)
     *xid_full= *full_xid_arg;
-  *member= xid_recovery_member(xid_arg, 1, false, xid_full);
+  *member= xid_recovery_member(xid_arg, 1, false, xid_full, server_id_arg);
 
   return
     my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
@@ -2443,14 +2443,15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
 */
 static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
                                MEM_ROOT *ptr_mem_root,
-                               XID *full_xid_arg)
+                               XID *full_xid_arg,
+                               ulong server_id_arg)
 {
   xid_recovery_member* member;
   if ((member= (xid_recovery_member *)
        my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
     member->in_engine_prepare++;
   else
-    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg);
+    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg,  server_id_arg);
 
   return member == NULL;
 }
@@ -2502,7 +2503,7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton,
   Binlog_offset *ptr_commit_max= arg->binlog_coord;
 
   if (!member->full_xid)
-    x.set(member->xid);
+    x.set(member->xid, member->server_id);
   else
     x= *member->full_xid;
 
@@ -2660,7 +2661,8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
         {
           // remember "full" xid too when it's not in mysql format
           if (xid_member_replace(info->commit_list, x, info->mem_root,
-                                 is_server_xid? NULL : &info->list[i]))
+                                 is_server_xid? NULL : &info->list[i],
+                                 is_server_xid? info->list[i].get_trx_server_id(): server_id))
           {
             info->error= true;
             sql_print_error("Error in memory allocation at xarecover_handlerton");
diff --git a/sql/handler.h b/sql/handler.h
index 0cdaf1dd698..73014093705 100644
--- a/sql/handler.h
+++ b/sql/handler.h
@@ -895,6 +895,18 @@ struct xid_t {
     memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
     gtrid_length=MYSQL_XID_GTRID_LEN;
   }
+
+  void set(ulonglong xid, ulong server_id)
+  {
+    my_xid tmp;
+    formatID= 1;
+    set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);
+    memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));
+    tmp= xid;
+    memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
+    gtrid_length=MYSQL_XID_GTRID_LEN;
+  }
+
   void set(long g, long b, const char *d)
   {
     formatID= 1;
@@ -916,6 +928,12 @@ struct xid_t {
            !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ?
            quick_get_my_xid() : 0;
   }
+  ulong get_trx_server_id()
+  {
+    ulong trx_server_id;
+    memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id));
+    return trx_server_id;
+  }
   uint length()
   {
     return static_cast<uint>(sizeof(formatID)) + key_length();
@@ -957,11 +975,12 @@ struct xid_recovery_member
   bool decided_to_commit;
   Binlog_offset binlog_coord; // semisync recovery binlog offset
   XID *full_xid;           // needed by wsrep or past it recovery
+  ulong server_id;         // server id of original server
 
   xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg,
-                      XID *full_xid_arg)
+                      XID *full_xid_arg, ulong server_id_arg)
     : xid(xid_arg), in_engine_prepare(prepare_arg),
-      decided_to_commit(decided_arg), full_xid(full_xid_arg) {};
+      decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {};
 };
 
 /* for recover() handlerton call */

Comment by Andrei Elkin [ 2022-01-12 ]

wenhug, thanks for digging it out!

Indeed, 6c39eaeb1 made recovery dependent on server_id.
Your diff makes sense, so feel free to go with a PR.
To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)...

Cheers,

Andrei.

Comment by Hugo Wen [ 2022-01-12 ]

Elkin, I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!

Comment by Andrei Elkin [ 2022-03-15 ]

commit dafc5fb9c17637c6233e8a73c890f158d645a37c
Author: Hugo Wen <wenhug@amazon.com>
Date: Fri Feb 4 03:56:08 2022 +0000

is pushed into 10.6.

Generated at Thu Feb 08 09:52:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.