[MDEV-27342] MariaDB 10.6 failed to do crash recovery, potential regression in commit 6c39eaeb - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.8, 10.7.4, 10.8.3
Component/s: Replication
Labels:
None

Description

Issue description:

MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1.

MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:

create database testdb;

use testdb;

CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DELIMITER $$

CREATE PROCEDURE generateredo(count int)

BEGIN

DECLARE counter INT DEFAULT 1;

WHILE counter <= count DO

insert into t1 values(counter,counter);

update t1 set col2=counter+1 where col1=counter;

delete from t1 where col1=counter;

SET counter = counter + 1;

END WHILE;

END$$

DELIMITER ;

call testdb.generateredo(10000000);

The mysql error log when recovering:

2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found

2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start:

2021-12-20 18:55:50 0 [Note]   cf=default

2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864

2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864

2021-12-20 18:55:50 0 [Note]   cf=__system__

2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864

2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864

2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables

2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer

2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized.

2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1

2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO

2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728

2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool

2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state.

2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266

2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active.

2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.

2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267

2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool

2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog

2021-12-20 18:55:50 0 [Note] Starting table crash recovery...

2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery

2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-20 18:55:50 0 [Note] Crash table recovery finished.

2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery

2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.

2021-12-20 18:55:50 0 [ERROR] Aborting

The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.

Some investigation:

Based on the implementation of function init_server_components(), during the recovery, ha_recover() function is called twice

Stack trace of the first call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292

sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057]

sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd]

sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931]

sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda]

sql/mysqld.cc:5289(init_server_components())[0x5599475098ce]

sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55]

??:0(__libc_start_main)[0x14f07bff20ba]

Stack trace of the second call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298

sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b]

sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df]

sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55]

??:0(__libc_start_main)[0x1542b650e0ba]

??:0(_start)[0x55e575e4764a]

However, in the commit the ha_recover() function call was moved from the end of function TC_LOG_BINLOG::recover to the beginning.

Before the commit, in the second call of ha_recover(), no actual recover was done because trx_sys.rw_trx_hash.iterate_no_dups() iterate doesn't have any items - https://github.com/MariaDB/server/blob/10.6/storage/innobase/trx/trx0trx.cc#L1980 .

However after this commit, the second call of ha_recover() also tries to do some recovery which eventually caused the error of:
https://github.com/MariaDB/server/blob/10.6/sql/handler.cc#L2746-L2756

[ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.

Tried a quick fix

I tried a simple fix of moving the ha_recover() back to the end of recover() then the issue was fixed:

diff --git a/sql/log.cc b/sql/log.cc

index 41eed80ef66..ddda2e5d330 100644

--- a/sql/log.cc

+++ b/sql/log.cc

@@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,

   fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error

   /* finds xids when root is not NULL */

-  if (do_xa && ha_recover(&xids, &mem_root))

-    goto err1;

+  //if (do_xa && ha_recover(&xids, &mem_root))

+  //  goto err1;

/*

     Scan the binlog for XIDs that need to be committed if still in the

@@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,

   if (do_xa)

+    if (ha_recover(&xids))

+      goto err2;

     if (binlog_checkpoint_found)

 #ifndef HAVE_REPLICATION

Error log before and after this fix
Before the fix, the error was in the second call of ha_recover():

2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557

2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool

2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled.

2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog

2021-12-21 20:07:45 0 [Note] Starting table crash recovery...

2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery

2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-21 20:07:45 0 [Note] Crash table recovery finished.

2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery

2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.

2021-12-21 20:07:45 0 [ERROR] Aborting

After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call:

2021-12-21  1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557

2021-12-21  1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool

2021-12-21  1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog

2021-12-21  1:00:20 0 [Note] Starting table crash recovery...

2021-12-21  1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-21  1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery

2021-12-21  1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-21  1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-21  1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-21  1:00:20 0 [Note] Crash table recovery finished.

2021-12-21  1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'.

2021-12-21  1:00:20 0 [Note] Server socket created on IP: '::'.

2021-12-21  1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221  1:00:20

2021-12-21  1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections.

Could you please investigate whether we should update the function or other conditions need to be considered?

Attachments

Activity

Ascending order - Click to sort in descending order

Otto Kekäläinen added a comment - 2021-12-22 17:35

Commit was authored by sujatha.sivakumar and committed by Elkin (latter now assigned to issue). Maybe Andrei can give some pointers of what the original commit tried to achieve so it would be easier to debug what the code is doing?

Otto Kekäläinen added a comment - 2021-12-22 17:35 Commit was authored by sujatha.sivakumar and committed by Elkin (latter now assigned to issue). Maybe Andrei can give some pointers of what the original commit tried to achieve so it would be easier to debug what the code is doing?

Andrei Elkin added a comment - 2021-12-23 11:24

otto, No pointers can be made at this point but I am going to look into shortly. It feels a bug. You'll see more details from commits.

Andrei Elkin added a comment - 2021-12-23 11:24 otto , No pointers can be made at this point but I am going to look into shortly. It feels a bug. You'll see more details from commits.

Andrei Elkin added a comment - 2021-12-25 19:34

wenhug, thanks for the report! It's not easy to reproduce though. I tried running your SP while kill -9 the server.
Every time there's a trx in prepared state it gets recovered.

First I am wondering if my method is really equivalent. Maybe you could give it a try to prove practically.
Secondly, could you please tell me your server build options and run time options?
Maybe building and running as you did I will have better luck.

Notice you can gain really helpful warnings into the error log with setting log_warning > 2.
See e.g

   if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)

      sql_print_warning("Could not complete %u number of transactions.",

                        count_in_prepare);

Once you'll have more info in the error log I may have some ideas.

If nothing like that will help us, please consider if I could log in your server and examine the recovery
with gdb.

Thank you and Merry Xmas!

Andrei Elkin added a comment - 2021-12-25 19:34 wenhug , thanks for the report! It's not easy to reproduce though. I tried running your SP while kill -9 the server. Every time there's a trx in prepared state it gets recovered. First I am wondering if my method is really equivalent. Maybe you could give it a try to prove practically. Secondly, could you please tell me your server build options and run time options? Maybe building and running as you did I will have better luck. Notice you can gain really helpful warnings into the error log with setting log_warning > 2 . See e.g if (count_in_prepare > 0 && global_system_variables.log_warnings > 2 ) { sql_print_warning( "Could not complete %u number of transactions." , count_in_prepare); Once you'll have more info in the error log I may have some ideas. If nothing like that will help us, please consider if I could log in your server and examine the recovery with gdb. Thank you and Merry Xmas!

Andrei Elkin added a comment - 2021-12-26 12:58

wenhug, to your quick fix,

it's equivalent of executing --tc-heuristic-recover=ROLLBACK. That's why it works out.

After closer look into possible causes of non-decided xids by the 2nd ha_recover() time,
there's a

    if (round == 1)

      if (!binlog_checkpoint_found)

        break;

escape from the for-loop and after it

  if (do_xa)

    if (binlog_checkpoint_found)

#ifndef HAVE_REPLICATION

      if (ha_recover_complete(&xids))

#else

      if (ctx.complete(this, xids))

#endif

won't be attended.
Could you please examine the last binlog file on whether it contains any Binlog_checkpoint_log_event.
When it does not indeed, your next attempt with log_warnings = 3 obviously won't hit

    if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)

      sql_print_warning("Could not complete %u number of transactions.",

                        count_in_prepare);

      return false; // there's later dry run ha_recover() to error out

which otherwise (Binlog_checkpoint event is in place) it must do.
Also more warnings from inside ha_recover_complete would also display then xid:s of undecided sticky transactions.

Hope this will be helpful.

Andrei Elkin added a comment - 2021-12-26 12:58 wenhug , to your quick fix, it's equivalent of executing --tc-heuristic-recover=ROLLBACK. That's why it works out. After closer look into possible causes of non-decided xids by the 2nd ha_recover() time, there's a if (round == 1 ) { if (!binlog_checkpoint_found) break ; escape from the for-loop and after it if (do_xa) { if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION if (ha_recover_complete(&xids)) # else if (ctx.complete( this , xids)) #endif won't be attended. Could you please examine the last binlog file on whether it contains any Binlog_checkpoint_log_event. When it does not indeed, your next attempt with log_warnings = 3 obviously won't hit if (count_in_prepare > 0 && global_system_variables.log_warnings > 2 ) { sql_print_warning( "Could not complete %u number of transactions." , count_in_prepare); return false ; // there's later dry run ha_recover() to error out } which otherwise (Binlog_checkpoint event is in place) it must do. Also more warnings from inside ha_recover_complete would also display then xid:s of undecided sticky transactions. Hope this will be helpful.

Hugo Wen added a comment - 2021-12-26 19:57 - edited

Elkin Thank you very much for looking into the issue!

The issue occurred when creating a snapshot of the file system. I couldn't reproduce with `kill -9` either using the same binaries. Also probably there were some other SQL commands ran before the snapshot was taken but I don't know for sure.
I'll do some more tests and see if there's a steady way to reproduce it. Or maybe can share the data file if that's allowed by our policy.

I just tried starting mysqld with --log_warnings=3 as you suggested:
(It's on another instance with the same issue, so the trx id is different to the original one)

2021-12-26 19:39:04 0 [Note] Loaded 'ha_rocksdb.so' with offset 0x1550a0e1b000

2021-12-26 19:39:04 0 [Note] RocksDB: 2 column families found

2021-12-26 19:39:04 0 [Note] RocksDB: Column Families at start:

2021-12-26 19:39:04 0 [Note]   cf=default

2021-12-26 19:39:04 0 [Note]     write_buffer_size=67108864

2021-12-26 19:39:04 0 [Note]     target_file_size_base=67108864

2021-12-26 19:39:04 0 [Note]   cf=__system__

2021-12-26 19:39:04 0 [Note]     write_buffer_size=67108864

2021-12-26 19:39:04 0 [Note]     target_file_size_base=67108864

2021-12-26 19:39:04 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables

2021-12-26 19:39:04 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer

2021-12-26 19:39:04 0 [Note] MyRocks storage engine plugin has been successfully initialized.

2021-12-26 19:39:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2021-12-26 19:39:04 0 [Note] InnoDB: Number of pools: 1

2021-12-26 19:39:04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2021-12-26 19:39:04 0 [Note] InnoDB: Using Linux native AIO

2021-12-26 19:39:04 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648, chunk size = 134217728

2021-12-26 19:39:04 0 [Note] InnoDB: Completed initialization of buffer pool

2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 was in the XA prepared state.

2021-12-26 19:39:04 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2021-12-26 19:39:04 0 [Note] InnoDB: Trx id counter is 126556

2021-12-26 19:39:04 0 [Note] InnoDB: 128 rollback segments are active.

2021-12-26 19:39:04 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2021-12-26 19:39:04 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2021-12-26 19:39:04 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2021-12-26 19:39:04 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2021-12-26 19:39:04 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.

2021-12-26 19:39:04 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159285; transaction id 126557

2021-12-26 19:39:04 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool

2021-12-26 19:39:04 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog

2021-12-26 19:39:04 0 [Note] Starting table crash recovery...

2021-12-26 19:39:04 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery

2021-12-26 19:39:04 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-26 19:39:04 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-26 19:39:04 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-26 19:39:04 0 [Note] Crash table recovery finished.

2021-12-26 19:39:04 0 [Warning] Found prepared transaction with xid 118560

2021-12-26 19:39:04 0 [Warning] Could not complete 1 number of transactions.

2021-12-26 19:39:04 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery

2021-12-26 19:39:04 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-26 19:39:04 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-26 19:39:04 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-26 19:39:04 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.

2021-12-26 19:39:04 0 [ERROR] Aborting

And the binlog files are like this:

[19:23:09][root]$ ls -alt

total 13336

drwxr-xr-x  2 rdsdb rdsdb    4096 Dec 26 19:18 .

-rw-rw-rw-  1 rdsdb rdsdb     343 Dec 26 19:18 mysql-bin-changelog.index

-rw-rw----  1 rdsdb rdsdb   67146 Dec 20 20:25 mysql-bin-changelog.000021

-rw-rw----  1 rdsdb rdsdb 1707199 Dec 20 20:25 mysql-bin-changelog.000020

-rw-rw----  1 rdsdb rdsdb 4792114 Dec 20 20:25 mysql-bin-changelog.000019

-rw-rw-r--  1 rdsdb rdsdb 7051915 Dec 20 20:23 mysql-bin-changelog.000018

-rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:20 mysql-bin-changelog.000017

-rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:15 mysql-bin-changelog.000016

-rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:10 mysql-bin-changelog.000015

drwxr-xr-x 10 rdsdb rdsdb    4096 Dec 20 19:04 ..

[19:23:11][root]$

[19:23:12][root]$ /rdsdbbin/mysql/bin/mysqlbinlog mysql-bin-changelog.000021 | grep checkpoint

#211220 20:25:29 server id 1604102129  end_log_pos 352 CRC32 0x1c7cf816 	Binlog checkpoint mysql-bin-changelog.000020

#211220 20:25:29 server id 1604102129  end_log_pos 627 CRC32 0x6509ef3b 	Binlog checkpoint mysql-bin-changelog.000021

[19:23:15][root]$

I'll be OOO until January 3rd, later I can do more tests and private builds if needed. Happy holidays and happy new year!

Hugo Wen added a comment - 2021-12-26 19:57 - edited Elkin Thank you very much for looking into the issue! The issue occurred when creating a snapshot of the file system. I couldn't reproduce with `kill -9` either using the same binaries. Also probably there were some other SQL commands ran before the snapshot was taken but I don't know for sure. I'll do some more tests and see if there's a steady way to reproduce it. Or maybe can share the data file if that's allowed by our policy. I just tried starting mysqld with --log_warnings=3 as you suggested: (It's on another instance with the same issue, so the trx id is different to the original one) 2021 - 12 - 26 19 : 39 : 04 0 [Note] Loaded 'ha_rocksdb.so' with offset 0x1550a0e1b000 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: 2 column families found 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: Column Families at start: 2021 - 12 - 26 19 : 39 : 04 0 [Note] cf= default 2021 - 12 - 26 19 : 39 : 04 0 [Note] write_buffer_size= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] target_file_size_base= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] cf=__system__ 2021 - 12 - 26 19 : 39 : 04 0 [Note] write_buffer_size= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] target_file_size_base= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021 - 12 - 26 19 : 39 : 04 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Compressed tables use zlib 1.2 . 11 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Number of pools: 1 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Using Linux native AIO 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648 , chunk size = 134217728 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Completed initialization of buffer pool 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction 126555 was in the XA prepared state. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Trx id counter is 126556 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 128 rollback segments are active. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 10.6 . 5 started; log sequence number 15159285 ; transaction id 126557 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021 - 12 - 26 19 : 39 : 04 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021 - 12 - 26 19 : 39 : 04 0 [Note] Starting table crash recovery... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Starting recovery for XA transactions... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021 - 12 - 26 19 : 39 : 04 0 [Note] Crash table recovery finished. 2021 - 12 - 26 19 : 39 : 04 0 [Warning] Found prepared transaction with xid 118560 2021 - 12 - 26 19 : 39 : 04 0 [Warning] Could not complete 1 number of transactions. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Starting recovery for XA transactions... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021 - 12 - 26 19 : 39 : 04 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021 - 12 - 26 19 : 39 : 04 0 [ERROR] Aborting And the binlog files are like this: [ 19 : 23 : 09 ][root]$ ls -alt total 13336 drwxr-xr-x 2 rdsdb rdsdb 4096 Dec 26 19 : 18 . -rw-rw-rw- 1 rdsdb rdsdb 343 Dec 26 19 : 18 mysql-bin-changelog.index -rw-rw---- 1 rdsdb rdsdb 67146 Dec 20 20 : 25 mysql-bin-changelog. 000021 -rw-rw---- 1 rdsdb rdsdb 1707199 Dec 20 20 : 25 mysql-bin-changelog. 000020 -rw-rw---- 1 rdsdb rdsdb 4792114 Dec 20 20 : 25 mysql-bin-changelog. 000019 -rw-rw-r-- 1 rdsdb rdsdb 7051915 Dec 20 20 : 23 mysql-bin-changelog. 000018 -rw-rw-r-- 1 rdsdb rdsdb 717 Dec 20 20 : 20 mysql-bin-changelog. 000017 -rw-rw-r-- 1 rdsdb rdsdb 717 Dec 20 20 : 15 mysql-bin-changelog. 000016 -rw-rw-r-- 1 rdsdb rdsdb 717 Dec 20 20 : 10 mysql-bin-changelog. 000015 drwxr-xr-x 10 rdsdb rdsdb 4096 Dec 20 19 : 04 .. [ 19 : 23 : 11 ][root]$ [ 19 : 23 : 12 ][root]$ /rdsdbbin/mysql/bin/mysqlbinlog mysql-bin-changelog. 000021 | grep checkpoint # 211220 20 : 25 : 29 server id 1604102129 end_log_pos 352 CRC32 0x1c7cf816 Binlog checkpoint mysql-bin-changelog. 000020 # 211220 20 : 25 : 29 server id 1604102129 end_log_pos 627 CRC32 0x6509ef3b Binlog checkpoint mysql-bin-changelog. 000021 [ 19 : 23 : 15 ][root]$ I'll be OOO until January 3rd, later I can do more tests and private builds if needed. Happy holidays and happy new year!

Andrei Elkin added a comment - 2021-12-27 11:11

wenhug,

So it looks very much that the block of static void xarecover_do_commit_or_rollback()

  rc= xarecover_decide_to_commit(member, ptr_commit_max) ?

    hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);

/*

    It's fine to have non-zero rc which would be from transaction

    non-participant hton:s.

*/

  DBUG_ASSERT(rc || member->in_engine_prepare > 0);

computes rc >0. It could be XAER_NOTA = -4 or XAER_RMFAIL = -7.
Also consider the latter code is returned "normally" at times:

innobase_commit_by_xid(

/*===================*/

	handlerton*	hton,

	XID*		xid)	/*!< in: X/Open XA transaction identification */

...

	if (high_level_read_only) {

		return(XAER_RMFAIL);

Enjoy your yearend, bloody well deserved & Happy New one ahead!

Andrei

Andrei Elkin added a comment - 2021-12-27 11:11 wenhug , So it looks very much that the block of static void xarecover_do_commit_or_rollback() { rc= xarecover_decide_to_commit(member, ptr_commit_max) ? hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x); /* It's fine to have non-zero rc which would be from transaction non-participant hton:s. */ DBUG_ASSERT(rc || member->in_engine_prepare > 0 ); computes rc >0 . It could be XAER_NOTA = -4 or XAER_RMFAIL = -7 . Also consider the latter code is returned "normally" at times: innobase_commit_by_xid( /*===================*/ handlerton* hton, XID* xid) /*!< in: X/Open XA transaction identification */ { ... if (high_level_read_only) { return (XAER_RMFAIL); } Enjoy your yearend, bloody well deserved & Happy New one ahead! Andrei

Hugo Wen added a comment - 2022-01-08 01:34 - edited

Elkin,

I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing.

However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files?

Thanks,
Hugo

Hugo Wen added a comment - 2022-01-08 01:34 - edited Elkin , I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing. However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files? Thanks, Hugo

Hugo Wen added a comment - 2022-01-12 07:17

After debugging line by line using GDB, root cause was found.

When doing the recovery, the server generates a list of XIDs to be committed or rolled-back.
Then the callback function of innodb will be called to handle the commit or rollback.

In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code

Failed reason is server id doesn't match. Server ID is saved in xid_t.data[]. The format of xid_t.data[] is:

MySQLXid<Server ID 8 bytes><XID 8 bytes>

Paste the GDB output here:

arg->xid is the input of the function to be searched from the hash.
element->trx->xid is the one exists in the hash which expected to match

Apparently gtrid_length, bqual_length, and transaction id all match but server ids are different.

((ulonglong)(arg->xid->data+8)) = 1049637262 which is the server id of the restored database.
((ulonglong)(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from.

(gdb) p element->id

$2 = 126555

(gdb) p element->trx

$3 = (trx_t *) 0x7fffebcca288

(gdb) p element->trx->is_recovered

$4 = true

(gdb) p element->trx->state

$5 = {m = {static _S_min_alignment = 4, static _S_alignment = 4, _M_i = TRX_STATE_PREPARED}}

(gdb) p arg->xid

$6 = (const XID *) 0x7fffffffad90

(gdb) p element->trx->xid

$7 = (XID *) 0x7ffff68b9258

(gdb) p arg->xid->gtrid_length

$8 = 24

(gdb) p element->trx->xid->gtrid_length

$9 = 24

(gdb) p arg->xid->bqual_length

$10 = 0

(gdb) p element->trx->xid->bqual_length

$11 = 0

(gdb) p arg->xid->data

$12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001", '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023", '\000' <repeats 15 times>, "й\377\377\377\177\000\000\266\071\000\000\000\000\000\000Ю\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000"

(gdb) p element->trx->xid->data

$13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001", '\000' <repeats 108 times>

(gdb) p *((ulonglong*)(arg->xid->data+16))

$14 = 118560

(gdb) p *((ulonglong*)(element->trx->xid->data+16))

$15 = 118560

(gdb) p *((ulonglong*)(arg->xid->data+8))

$16 = 1049637262

(gdb) p *((ulonglong*)(element->trx->xid->data+8))

$17 = 1604102129

Why the server id doesn't match ?

After this commit, the input xid creation becomes using the set function:

  void set(ulonglong xid)

    my_xid tmp;

    formatID= 1;

    set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);

    memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));

    tmp= xid;

    memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));

    gtrid_length=MYSQL_XID_GTRID_LEN;

This set function doesn't copy the original server id, instead it copy the new server_id of the new database.

memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id))

Therefore the xid can't be find in trx_sys because the server id doesn't match any more.

Why there's no such issue in 10.5?

In 10.5, the input xid was created directly from original xid : code xid= *trx->xid;

      XID& xid= arg->xid_list[arg->count];

      if (arg->count++ < arg->len)

        trx->state= TRX_STATE_PREPARED_RECOVERED;

...

        xid= *trx->xid;

So the server match keeps the same with the old one.

To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID. The issue doesn't exist in previous major versions before 10.6.

How to fix?

Seems a possible fix is to update the logic of creating the input XID object:

Add original server id in struct xid_recovery_member.
Then update the xarecover_do_commit_or_rollback function to set the original server id properly.

Proposed fix:

I've tested following changes fixed the issue. with the fix db recovered successfully.
While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file

If there's no breaking mtr tests and no obvious issues I will create PR to fix it.
Also may worth to add a MTR test, but it's late today I'll work on it tomorrow.

Elkin, could you please help review if the root cause and fix makes sense?

[wenhug@dev-dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb-1335.diff

diff --git a/sql/handler.cc b/sql/handler.cc

index e4543bde5de..5cad422f45f 100644

--- a/sql/handler.cc

+++ b/sql/handler.cc

@@ -2414,7 +2414,7 @@ struct xarecover_st

*/

 static xid_recovery_member*

 xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,

-                  XID *full_xid_arg)

+                  XID *full_xid_arg, ulong server_id_arg)

   xid_recovery_member *member= (xid_recovery_member *)

     alloc_root(ptr_mem_root, sizeof(xid_recovery_member));

@@ -2428,7 +2428,7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,

   if (full_xid_arg)

     *xid_full= *full_xid_arg;

-  *member= xid_recovery_member(xid_arg, 1, false, xid_full);

+  *member= xid_recovery_member(xid_arg, 1, false, xid_full, server_id_arg);

   return

     my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;

@@ -2443,14 +2443,15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,

*/

 static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,

                                MEM_ROOT *ptr_mem_root,

-                               XID *full_xid_arg)

+                               XID *full_xid_arg,

+                               ulong server_id_arg)

   xid_recovery_member* member;

   if ((member= (xid_recovery_member *)

        my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))

     member->in_engine_prepare++;

   else

-    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg);

+    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg,  server_id_arg);

   return member == NULL;

@@ -2502,7 +2503,7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton,

   Binlog_offset *ptr_commit_max= arg->binlog_coord;

   if (!member->full_xid)

-    x.set(member->xid);

+    x.set(member->xid, member->server_id);

   else

     x= *member->full_xid;

@@ -2660,7 +2661,8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,

           // remember "full" xid too when it's not in mysql format

           if (xid_member_replace(info->commit_list, x, info->mem_root,

-                                 is_server_xid? NULL : &info->list[i]))

+                                 is_server_xid? NULL : &info->list[i],

+                                 is_server_xid? info->list[i].get_trx_server_id(): server_id))

             info->error= true;

             sql_print_error("Error in memory allocation at xarecover_handlerton");

diff --git a/sql/handler.h b/sql/handler.h

index 0cdaf1dd698..73014093705 100644

--- a/sql/handler.h

+++ b/sql/handler.h

@@ -895,6 +895,18 @@ struct xid_t {

     memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));

     gtrid_length=MYSQL_XID_GTRID_LEN;

+  void set(ulonglong xid, ulong server_id)

+  {

+    my_xid tmp;

+    formatID= 1;

+    set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);

+    memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));

+    tmp= xid;

+    memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));

+    gtrid_length=MYSQL_XID_GTRID_LEN;

+  }

   void set(long g, long b, const char *d)

     formatID= 1;

@@ -916,6 +928,12 @@ struct xid_t {

            !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ?

            quick_get_my_xid() : 0;

+  ulong get_trx_server_id()

+  {

+    ulong trx_server_id;

+    memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id));

+    return trx_server_id;

+  }

   uint length()

     return static_cast<uint>(sizeof(formatID)) + key_length();

@@ -957,11 +975,12 @@ struct xid_recovery_member

   bool decided_to_commit;

   Binlog_offset binlog_coord; // semisync recovery binlog offset

   XID *full_xid;           // needed by wsrep or past it recovery

+  ulong server_id;         // server id of original server

   xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg,

-                      XID *full_xid_arg)

+                      XID *full_xid_arg, ulong server_id_arg)

     : xid(xid_arg), in_engine_prepare(prepare_arg),

-      decided_to_commit(decided_arg), full_xid(full_xid_arg) {};

+      decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {};

};

 /* for recover() handlerton call */

Hugo Wen added a comment - 2022-01-12 07:17 After debugging line by line using GDB, root cause was found. When doing the recovery, the server generates a list of XIDs to be committed or rolled-back. Then the callback function of innodb will be called to handle the commit or rollback. In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code Failed reason is server id doesn't match . Server ID is saved in xid_t.data[] . The format of xid_t.data[] is: MySQLXid<Server ID 8 bytes><XID 8 bytes> Paste the GDB output here: arg->xid is the input of the function to be searched from the hash. element->trx->xid is the one exists in the hash which expected to match Apparently gtrid_length , bqual_length , and transaction id all match but server ids are different. ((ulonglong )(arg->xid->data+8)) = 1049637262 which is the server id of the restored database. ((ulonglong )(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from. (gdb) p element->id $ 2 = 126555 (gdb) p element->trx $ 3 = (trx_t *) 0x7fffebcca288 (gdb) p element->trx->is_recovered $ 4 = true (gdb) p element->trx->state $ 5 = {m = { static _S_min_alignment = 4 , static _S_alignment = 4 , _M_i = TRX_STATE_PREPARED}} (gdb) p arg->xid $ 6 = ( const XID *) 0x7fffffffad90 (gdb) p element->trx->xid $ 7 = (XID *) 0x7ffff68b9258 (gdb) p arg->xid->gtrid_length $ 8 = 24 (gdb) p element->trx->xid->gtrid_length $ 9 = 24 (gdb) p arg->xid->bqual_length $ 10 = 0 (gdb) p element->trx->xid->bqual_length $ 11 = 0 (gdb) p arg->xid->data $ 12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001" , '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023" , '\000' <repeats 15 times>, "й\377\377\377\177\000\000\266\071\000\000\000\000\000\000Ю\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000" (gdb) p element->trx->xid->data $ 13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001" , '\000' <repeats 108 times> (gdb) p *((ulonglong*)(arg->xid->data+ 16 )) $ 14 = 118560 (gdb) p *((ulonglong*)(element->trx->xid->data+ 16 )) $ 15 = 118560 (gdb) p *((ulonglong*)(arg->xid->data+ 8 )) $ 16 = 1049637262 (gdb) p *((ulonglong*)(element->trx->xid->data+ 8 )) $ 17 = 1604102129 Why the server id doesn't match ? After this commit , the input xid creation becomes using the set function: void set(ulonglong xid) { my_xid tmp; formatID= 1 ; set(MYSQL_XID_PREFIX_LEN, 0 , MYSQL_XID_PREFIX); memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)); tmp= xid; memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); gtrid_length=MYSQL_XID_GTRID_LEN; } This set function doesn't copy the original server id, instead it copy the new server_id of the new database. memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)) Therefore the xid can't be find in trx_sys because the server id doesn't match any more. Why there's no such issue in 10.5? In 10.5, the input xid was created directly from original xid : code xid= *trx->xid; XID& xid= arg->xid_list[arg->count]; if (arg->count++ < arg->len) { trx->state= TRX_STATE_PREPARED_RECOVERED; ... xid= *trx->xid; } So the server match keeps the same with the old one. To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID . The issue doesn't exist in previous major versions before 10.6. How to fix? Seems a possible fix is to update the logic of creating the input XID object: Add original server id in struct xid_recovery_member . Then update the xarecover_do_commit_or_rollback function to set the original server id properly. Proposed fix: I've tested following changes fixed the issue. with the fix db recovered successfully. While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file If there's no breaking mtr tests and no obvious issues I will create PR to fix it. Also may worth to add a MTR test, but it's late today I'll work on it tomorrow. Elkin , could you please help review if the root cause and fix makes sense? [wenhug @dev -dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb- 1335 .diff diff --git a/sql/handler.cc b/sql/handler.cc index e4543bde5de..5cad422f45f 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ - 2414 , 7 + 2414 , 7 @@ struct xarecover_st */ static xid_recovery_member* xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, - XID *full_xid_arg) + XID *full_xid_arg, ulong server_id_arg) { xid_recovery_member *member= (xid_recovery_member *) alloc_root(ptr_mem_root, sizeof(xid_recovery_member)); @@ - 2428 , 7 + 2428 , 7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, if (full_xid_arg) *xid_full= *full_xid_arg; - *member= xid_recovery_member(xid_arg, 1 , false , xid_full); + *member= xid_recovery_member(xid_arg, 1 , false , xid_full, server_id_arg); return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member; @@ - 2443 , 14 + 2443 , 15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, */ static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, - XID *full_xid_arg) + XID *full_xid_arg, + ulong server_id_arg) { xid_recovery_member* member; if ((member= (xid_recovery_member *) my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg)))) member->in_engine_prepare++; else - member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg); + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg, server_id_arg); return member == NULL; } @@ - 2502 , 7 + 2503 , 7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton, Binlog_offset *ptr_commit_max= arg->binlog_coord; if (!member->full_xid) - x.set(member->xid); + x.set(member->xid, member->server_id); else x= *member->full_xid; @@ - 2660 , 7 + 2661 , 8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin, { // remember "full" xid too when it's not in mysql format if (xid_member_replace(info->commit_list, x, info->mem_root, - is_server_xid? NULL : &info->list[i])) + is_server_xid? NULL : &info->list[i], + is_server_xid? info->list[i].get_trx_server_id(): server_id)) { info->error= true ; sql_print_error( "Error in memory allocation at xarecover_handlerton" ); diff --git a/sql/handler.h b/sql/handler.h index 0cdaf1dd698.. 73014093705 100644 --- a/sql/handler.h +++ b/sql/handler.h @@ - 895 , 6 + 895 , 18 @@ struct xid_t { memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); gtrid_length=MYSQL_XID_GTRID_LEN; } + + void set(ulonglong xid, ulong server_id) + { + my_xid tmp; + formatID= 1 ; + set(MYSQL_XID_PREFIX_LEN, 0 , MYSQL_XID_PREFIX); + memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)); + tmp= xid; + memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); + gtrid_length=MYSQL_XID_GTRID_LEN; + } + void set( long g, long b, const char *d) { formatID= 1 ; @@ - 916 , 6 + 928 , 12 @@ struct xid_t { !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ? quick_get_my_xid() : 0 ; } + ulong get_trx_server_id() + { + ulong trx_server_id; + memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id)); + return trx_server_id; + } uint length() { return static_cast<uint>(sizeof(formatID)) + key_length(); @@ - 957 , 11 + 975 , 12 @@ struct xid_recovery_member bool decided_to_commit; Binlog_offset binlog_coord; // semisync recovery binlog offset XID *full_xid; // needed by wsrep or past it recovery + ulong server_id; // server id of original server xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg, - XID *full_xid_arg) + XID *full_xid_arg, ulong server_id_arg) : xid(xid_arg), in_engine_prepare(prepare_arg), - decided_to_commit(decided_arg), full_xid(full_xid_arg) {}; + decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {}; }; /* for recover() handlerton call */

Andrei Elkin added a comment - 2022-01-12 15:28

wenhug, thanks for digging it out!

Indeed, 6c39eaeb1 made recovery dependent on server_id.
Your diff makes sense, so feel free to go with a PR.
To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)...

Cheers,

Andrei.

Andrei Elkin added a comment - 2022-01-12 15:28 wenhug , thanks for digging it out! Indeed, 6c39eaeb1 made recovery dependent on server_id. Your diff makes sense, so feel free to go with a PR. To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)... Cheers, Andrei.

Hugo Wen added a comment - 2022-01-12 21:48

Elkin, I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!

Hugo Wen added a comment - 2022-01-12 21:48 Elkin , I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!

Andrei Elkin added a comment - 2022-03-15 17:27

commit dafc5fb9c17637c6233e8a73c890f158d645a37c
Author: Hugo Wen <wenhug@amazon.com>
Date: Fri Feb 4 03:56:08 2022 +0000

is pushed into 10.6.

Andrei Elkin added a comment - 2022-03-15 17:27 commit dafc5fb9c17637c6233e8a73c890f158d645a37c Author: Hugo Wen <wenhug@amazon.com> Date: Fri Feb 4 03:56:08 2022 +0000 is pushed into 10.6.

People

Assignee:: Andrei Elkin

Reporter:: Hugo Wen

Votes:: 2 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2021-12-22 00:44

Updated:: 2025-03-08 10:42

Resolved:: 2022-03-15 17:26

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server