Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27342

MariaDB 10.6 failed to do crash recovery, potential regression in commit 6c39eaeb

Details

    Description

      Issue description:

      MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1.

      MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:

      create database testdb;
      use testdb;
      CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      DELIMITER $$
      CREATE PROCEDURE generateredo(count int)
      BEGIN
      DECLARE counter INT DEFAULT 1;
      WHILE counter <= count DO
      insert into t1 values(counter,counter);
      update t1 set col2=counter+1 where col1=counter;
      delete from t1 where col1=counter;
      SET counter = counter + 1;
      END WHILE;
      END$$
      DELIMITER ;
       
      call testdb.generateredo(10000000);
      

      The mysql error log when recovering:

      2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found
      2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start:
      2021-12-20 18:55:50 0 [Note]   cf=default
      2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864
      2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864
      2021-12-20 18:55:50 0 [Note]   cf=__system__
      2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864
      2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864
      2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
      2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
      2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized.
      2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1
      2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO
      2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728
      2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state.
      2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
      2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266
      2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active.
      2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
      2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed
      2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267
      2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
      2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
      2021-12-20 18:55:50 0 [Note] Starting table crash recovery...
      2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-20 18:55:50 0 [Note] Crash table recovery finished.
      2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
      2021-12-20 18:55:50 0 [ERROR] Aborting
      

      The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.

      Some investigation:

      Based on the implementation of function init_server_components(), during the recovery, ha_recover() function is called twice

      • Stack trace of the first call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292

        sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057]
        sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd]
        sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931]
        sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda]
        sql/mysqld.cc:5289(init_server_components())[0x5599475098ce]
        sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55]
        ??:0(__libc_start_main)[0x14f07bff20ba]
        

      • Stack trace of the second call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298

        sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b]
        sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df]
        sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55]
        ??:0(__libc_start_main)[0x1542b650e0ba]
        ??:0(_start)[0x55e575e4764a]
        

      However, in the commit the ha_recover() function call was moved from the end of function TC_LOG_BINLOG::recover to the beginning.

      Tried a quick fix

      I tried a simple fix of moving the ha_recover() back to the end of recover() then the issue was fixed:

      diff --git a/sql/log.cc b/sql/log.cc
      index 41eed80ef66..ddda2e5d330 100644
      --- a/sql/log.cc
      +++ b/sql/log.cc
      @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
         fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error
       
         /* finds xids when root is not NULL */
      -  if (do_xa && ha_recover(&xids, &mem_root))
      -    goto err1;
      +  //if (do_xa && ha_recover(&xids, &mem_root))
      +  //  goto err1;
       
         /*
           Scan the binlog for XIDs that need to be committed if still in the
      @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
       
         if (do_xa)
         {
      +    if (ha_recover(&xids))
      +      goto err2;
           if (binlog_checkpoint_found)
           {
       #ifndef HAVE_REPLICATION
      

      Error log before and after this fix
      Before the fix, the error was in the second call of ha_recover():

      2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557
      2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
      2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled.
      2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
      2021-12-21 20:07:45 0 [Note] Starting table crash recovery...
      2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-21 20:07:45 0 [Note] Crash table recovery finished.
      2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
      2021-12-21 20:07:45 0 [ERROR] Aborting

      After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call:

      2021-12-21  1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557
      2021-12-21  1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
      2021-12-21  1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
      2021-12-21  1:00:20 0 [Note] Starting table crash recovery...
      2021-12-21  1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-21  1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
      2021-12-21  1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-21  1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-21  1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-21  1:00:20 0 [Note] Crash table recovery finished.
      2021-12-21  1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'.
      2021-12-21  1:00:20 0 [Note] Server socket created on IP: '::'.
      2021-12-21  1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221  1:00:20
      2021-12-21  1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections.
      

      Could you please investigate whether we should update the function or other conditions need to be considered?

      Attachments

        Activity

          Commit was authored by sujatha.sivakumar and committed by Elkin (latter now assigned to issue). Maybe Andrei can give some pointers of what the original commit tried to achieve so it would be easier to debug what the code is doing?

          otto Otto Kekäläinen added a comment - Commit was authored by sujatha.sivakumar and committed by Elkin (latter now assigned to issue). Maybe Andrei can give some pointers of what the original commit tried to achieve so it would be easier to debug what the code is doing?
          Elkin Andrei Elkin added a comment -

          otto, No pointers can be made at this point but I am going to look into shortly. It feels a bug. You'll see more details from commits.

          Elkin Andrei Elkin added a comment - otto , No pointers can be made at this point but I am going to look into shortly. It feels a bug. You'll see more details from commits.
          Elkin Andrei Elkin added a comment -

          wenhug, thanks for the report! It's not easy to reproduce though. I tried running your SP while kill -9 the server.
          Every time there's a trx in prepared state it gets recovered.

          First I am wondering if my method is really equivalent. Maybe you could give it a try to prove practically.
          Secondly, could you please tell me your server build options and run time options?
          Maybe building and running as you did I will have better luck.

          Notice you can gain really helpful warnings into the error log with setting log_warning > 2.
          See e.g

             if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)
              {
                sql_print_warning("Could not complete %u number of transactions.",
                                  count_in_prepare);
          
          

          Once you'll have more info in the error log I may have some ideas.

          If nothing like that will help us, please consider if I could log in your server and examine the recovery
          with gdb.

          Thank you and Merry Xmas!

          Elkin Andrei Elkin added a comment - wenhug , thanks for the report! It's not easy to reproduce though. I tried running your SP while kill -9 the server. Every time there's a trx in prepared state it gets recovered. First I am wondering if my method is really equivalent. Maybe you could give it a try to prove practically. Secondly, could you please tell me your server build options and run time options? Maybe building and running as you did I will have better luck. Notice you can gain really helpful warnings into the error log with setting log_warning > 2 . See e.g if (count_in_prepare > 0 && global_system_variables.log_warnings > 2 ) { sql_print_warning( "Could not complete %u number of transactions." , count_in_prepare); Once you'll have more info in the error log I may have some ideas. If nothing like that will help us, please consider if I could log in your server and examine the recovery with gdb. Thank you and Merry Xmas!
          Elkin Andrei Elkin added a comment -

          wenhug, to your quick fix,

          it's equivalent of executing --tc-heuristic-recover=ROLLBACK. That's why it works out.

          After closer look into possible causes of non-decided xids by the 2nd ha_recover() time,
          there's a

              if (round == 1)
              {
                if (!binlog_checkpoint_found)
                  break;   
          

          escape from the for-loop and after it

            if (do_xa)
            {
              if (binlog_checkpoint_found)
              {
          #ifndef HAVE_REPLICATION
                if (ha_recover_complete(&xids))
          #else
                if (ctx.complete(this, xids))
          #endif
          

          won't be attended.
          Could you please examine the last binlog file on whether it contains any Binlog_checkpoint_log_event.
          When it does not indeed, your next attempt with log_warnings = 3 obviously won't hit

              if (count_in_prepare > 0 && global_system_variables.log_warnings > 2)
              {
                sql_print_warning("Could not complete %u number of transactions.",
                                  count_in_prepare);
                return false; // there's later dry run ha_recover() to error out
              }
          

          which otherwise (Binlog_checkpoint event is in place) it must do.
          Also more warnings from inside ha_recover_complete would also display then xid:s of undecided sticky transactions.

          Hope this will be helpful.

          Elkin Andrei Elkin added a comment - wenhug , to your quick fix, it's equivalent of executing --tc-heuristic-recover=ROLLBACK. That's why it works out. After closer look into possible causes of non-decided xids by the 2nd ha_recover() time, there's a if (round == 1 ) { if (!binlog_checkpoint_found) break ; escape from the for-loop and after it if (do_xa) { if (binlog_checkpoint_found) { #ifndef HAVE_REPLICATION if (ha_recover_complete(&xids)) # else if (ctx.complete( this , xids)) #endif won't be attended. Could you please examine the last binlog file on whether it contains any Binlog_checkpoint_log_event. When it does not indeed, your next attempt with log_warnings = 3 obviously won't hit if (count_in_prepare > 0 && global_system_variables.log_warnings > 2 ) { sql_print_warning( "Could not complete %u number of transactions." , count_in_prepare); return false ; // there's later dry run ha_recover() to error out } which otherwise (Binlog_checkpoint event is in place) it must do. Also more warnings from inside ha_recover_complete would also display then xid:s of undecided sticky transactions. Hope this will be helpful.
          wenhug Hugo Wen added a comment - - edited

          Elkin Thank you very much for looking into the issue!

          The issue occurred when creating a snapshot of the file system. I couldn't reproduce with `kill -9` either using the same binaries. Also probably there were some other SQL commands ran before the snapshot was taken but I don't know for sure.
          I'll do some more tests and see if there's a steady way to reproduce it. Or maybe can share the data file if that's allowed by our policy.

          I just tried starting mysqld with --log_warnings=3 as you suggested:
          (It's on another instance with the same issue, so the trx id is different to the original one)

          2021-12-26 19:39:04 0 [Note] Loaded 'ha_rocksdb.so' with offset 0x1550a0e1b000
          2021-12-26 19:39:04 0 [Note] RocksDB: 2 column families found
          2021-12-26 19:39:04 0 [Note] RocksDB: Column Families at start:
          2021-12-26 19:39:04 0 [Note]   cf=default
          2021-12-26 19:39:04 0 [Note]     write_buffer_size=67108864
          2021-12-26 19:39:04 0 [Note]     target_file_size_base=67108864
          2021-12-26 19:39:04 0 [Note]   cf=__system__
          2021-12-26 19:39:04 0 [Note]     write_buffer_size=67108864
          2021-12-26 19:39:04 0 [Note]     target_file_size_base=67108864
          2021-12-26 19:39:04 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
          2021-12-26 19:39:04 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
          2021-12-26 19:39:04 0 [Note] MyRocks storage engine plugin has been successfully initialized.
          2021-12-26 19:39:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
          2021-12-26 19:39:04 0 [Note] InnoDB: Number of pools: 1
          2021-12-26 19:39:04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
          2021-12-26 19:39:04 0 [Note] InnoDB: Using Linux native AIO
          2021-12-26 19:39:04 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648, chunk size = 134217728
          2021-12-26 19:39:04 0 [Note] InnoDB: Completed initialization of buffer pool
          2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 was in the XA prepared state.
          2021-12-26 19:39:04 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
          2021-12-26 19:39:04 0 [Note] InnoDB: Trx id counter is 126556
          2021-12-26 19:39:04 0 [Note] InnoDB: 128 rollback segments are active.
          2021-12-26 19:39:04 0 [Note] InnoDB: Creating shared tablespace for temporary tables
          2021-12-26 19:39:04 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
          2021-12-26 19:39:04 0 [Note] InnoDB: Rollback of non-prepared transactions completed
          2021-12-26 19:39:04 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
          2021-12-26 19:39:04 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
          2021-12-26 19:39:04 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159285; transaction id 126557
          2021-12-26 19:39:04 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
          2021-12-26 19:39:04 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
          2021-12-26 19:39:04 0 [Note] Starting table crash recovery...
          2021-12-26 19:39:04 0 [Note] InnoDB: Starting recovery for XA transactions...
          2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
          2021-12-26 19:39:04 0 [Note] InnoDB: Transaction contains changes to 1 rows
          2021-12-26 19:39:04 0 [Note] InnoDB: 1 transactions in prepared state after recovery
          2021-12-26 19:39:04 0 [Note] Found 1 prepared transaction(s) in InnoDB
          2021-12-26 19:39:04 0 [Note] Crash table recovery finished.
          2021-12-26 19:39:04 0 [Warning] Found prepared transaction with xid 118560
          2021-12-26 19:39:04 0 [Warning] Could not complete 1 number of transactions.
          2021-12-26 19:39:04 0 [Note] InnoDB: Starting recovery for XA transactions...
          2021-12-26 19:39:04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
          2021-12-26 19:39:04 0 [Note] InnoDB: Transaction contains changes to 1 rows
          2021-12-26 19:39:04 0 [Note] InnoDB: 1 transactions in prepared state after recovery
          2021-12-26 19:39:04 0 [Note] Found 1 prepared transaction(s) in InnoDB
          2021-12-26 19:39:04 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
          2021-12-26 19:39:04 0 [ERROR] Aborting
          

          And the binlog files are like this:

          [19:23:09][root]$ ls -alt
          total 13336
          drwxr-xr-x  2 rdsdb rdsdb    4096 Dec 26 19:18 .
          -rw-rw-rw-  1 rdsdb rdsdb     343 Dec 26 19:18 mysql-bin-changelog.index
          -rw-rw----  1 rdsdb rdsdb   67146 Dec 20 20:25 mysql-bin-changelog.000021
          -rw-rw----  1 rdsdb rdsdb 1707199 Dec 20 20:25 mysql-bin-changelog.000020
          -rw-rw----  1 rdsdb rdsdb 4792114 Dec 20 20:25 mysql-bin-changelog.000019
          -rw-rw-r--  1 rdsdb rdsdb 7051915 Dec 20 20:23 mysql-bin-changelog.000018
          -rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:20 mysql-bin-changelog.000017
          -rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:15 mysql-bin-changelog.000016
          -rw-rw-r--  1 rdsdb rdsdb     717 Dec 20 20:10 mysql-bin-changelog.000015
          drwxr-xr-x 10 rdsdb rdsdb    4096 Dec 20 19:04 ..
          [19:23:11][root]$ 
          [19:23:12][root]$ /rdsdbbin/mysql/bin/mysqlbinlog mysql-bin-changelog.000021 | grep checkpoint
          #211220 20:25:29 server id 1604102129  end_log_pos 352 CRC32 0x1c7cf816 	Binlog checkpoint mysql-bin-changelog.000020
          #211220 20:25:29 server id 1604102129  end_log_pos 627 CRC32 0x6509ef3b 	Binlog checkpoint mysql-bin-changelog.000021
          [19:23:15][root]$ 
          

          I'll be OOO until January 3rd, later I can do more tests and private builds if needed. Happy holidays and happy new year!

          wenhug Hugo Wen added a comment - - edited Elkin Thank you very much for looking into the issue! The issue occurred when creating a snapshot of the file system. I couldn't reproduce with `kill -9` either using the same binaries. Also probably there were some other SQL commands ran before the snapshot was taken but I don't know for sure. I'll do some more tests and see if there's a steady way to reproduce it. Or maybe can share the data file if that's allowed by our policy. I just tried starting mysqld with --log_warnings=3 as you suggested: (It's on another instance with the same issue, so the trx id is different to the original one) 2021 - 12 - 26 19 : 39 : 04 0 [Note] Loaded 'ha_rocksdb.so' with offset 0x1550a0e1b000 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: 2 column families found 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: Column Families at start: 2021 - 12 - 26 19 : 39 : 04 0 [Note] cf= default 2021 - 12 - 26 19 : 39 : 04 0 [Note] write_buffer_size= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] target_file_size_base= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] cf=__system__ 2021 - 12 - 26 19 : 39 : 04 0 [Note] write_buffer_size= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] target_file_size_base= 67108864 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables 2021 - 12 - 26 19 : 39 : 04 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer 2021 - 12 - 26 19 : 39 : 04 0 [Note] MyRocks storage engine plugin has been successfully initialized. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Compressed tables use zlib 1.2 . 11 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Number of pools: 1 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Using Linux native AIO 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648 , chunk size = 134217728 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Completed initialization of buffer pool 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction 126555 was in the XA prepared state. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Trx id counter is 126556 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 128 rollback segments are active. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Starting in background the rollback of recovered transactions 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Rollback of non-prepared transactions completed 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 10.6 . 5 started; log sequence number 15159285 ; transaction id 126557 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool 2021 - 12 - 26 19 : 39 : 04 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog 2021 - 12 - 26 19 : 39 : 04 0 [Note] Starting table crash recovery... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Starting recovery for XA transactions... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021 - 12 - 26 19 : 39 : 04 0 [Note] Crash table recovery finished. 2021 - 12 - 26 19 : 39 : 04 0 [Warning] Found prepared transaction with xid 118560 2021 - 12 - 26 19 : 39 : 04 0 [Warning] Could not complete 1 number of transactions. 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Starting recovery for XA transactions... 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: Transaction contains changes to 1 rows 2021 - 12 - 26 19 : 39 : 04 0 [Note] InnoDB: 1 transactions in prepared state after recovery 2021 - 12 - 26 19 : 39 : 04 0 [Note] Found 1 prepared transaction(s) in InnoDB 2021 - 12 - 26 19 : 39 : 04 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions. 2021 - 12 - 26 19 : 39 : 04 0 [ERROR] Aborting And the binlog files are like this: [ 19 : 23 : 09 ][root]$ ls -alt total 13336 drwxr-xr-x 2 rdsdb rdsdb 4096 Dec 26 19 : 18 . -rw-rw-rw- 1 rdsdb rdsdb 343 Dec 26 19 : 18 mysql-bin-changelog.index -rw-rw---- 1 rdsdb rdsdb 67146 Dec 20 20 : 25 mysql-bin-changelog. 000021 -rw-rw---- 1 rdsdb rdsdb 1707199 Dec 20 20 : 25 mysql-bin-changelog. 000020 -rw-rw---- 1 rdsdb rdsdb 4792114 Dec 20 20 : 25 mysql-bin-changelog. 000019 -rw-rw-r-- 1 rdsdb rdsdb 7051915 Dec 20 20 : 23 mysql-bin-changelog. 000018 -rw-rw-r-- 1 rdsdb rdsdb 717 Dec 20 20 : 20 mysql-bin-changelog. 000017 -rw-rw-r-- 1 rdsdb rdsdb 717 Dec 20 20 : 15 mysql-bin-changelog. 000016 -rw-rw-r-- 1 rdsdb rdsdb 717 Dec 20 20 : 10 mysql-bin-changelog. 000015 drwxr-xr-x 10 rdsdb rdsdb 4096 Dec 20 19 : 04 .. [ 19 : 23 : 11 ][root]$ [ 19 : 23 : 12 ][root]$ /rdsdbbin/mysql/bin/mysqlbinlog mysql-bin-changelog. 000021 | grep checkpoint # 211220 20 : 25 : 29 server id 1604102129 end_log_pos 352 CRC32 0x1c7cf816 Binlog checkpoint mysql-bin-changelog. 000020 # 211220 20 : 25 : 29 server id 1604102129 end_log_pos 627 CRC32 0x6509ef3b Binlog checkpoint mysql-bin-changelog. 000021 [ 19 : 23 : 15 ][root]$ I'll be OOO until January 3rd, later I can do more tests and private builds if needed. Happy holidays and happy new year!
          Elkin Andrei Elkin added a comment -

          wenhug,

          So it looks very much that the block of static void xarecover_do_commit_or_rollback()

          {
            rc= xarecover_decide_to_commit(member, ptr_commit_max) ?
              hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);
           
            /*
              It's fine to have non-zero rc which would be from transaction
              non-participant hton:s.
            */
            DBUG_ASSERT(rc || member->in_engine_prepare > 0);
          
          

          computes rc >0. It could be XAER_NOTA = -4 or XAER_RMFAIL = -7.
          Also consider the latter code is returned "normally" at times:

          innobase_commit_by_xid(
          /*===================*/
          	handlerton*	hton,
          	XID*		xid)	/*!< in: X/Open XA transaction identification */
          {
          	...
          	if (high_level_read_only) {
          		return(XAER_RMFAIL);
          	}
          

          Enjoy your yearend, bloody well deserved & Happy New one ahead!

          Andrei

          Elkin Andrei Elkin added a comment - wenhug , So it looks very much that the block of static void xarecover_do_commit_or_rollback() { rc= xarecover_decide_to_commit(member, ptr_commit_max) ? hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);   /* It's fine to have non-zero rc which would be from transaction non-participant hton:s. */ DBUG_ASSERT(rc || member->in_engine_prepare > 0 ); computes rc >0 . It could be XAER_NOTA = -4 or XAER_RMFAIL = -7 . Also consider the latter code is returned "normally" at times: innobase_commit_by_xid( /*===================*/ handlerton* hton, XID* xid) /*!< in: X/Open XA transaction identification */ { ... if (high_level_read_only) { return (XAER_RMFAIL); } Enjoy your yearend, bloody well deserved & Happy New one ahead! Andrei
          wenhug Hugo Wen added a comment - - edited

          Elkin,

          I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing.

          However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files?

          Thanks,
          Hugo

          wenhug Hugo Wen added a comment - - edited Elkin , I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing. However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files? Thanks, Hugo
          wenhug Hugo Wen added a comment -

          After debugging line by line using GDB, root cause was found.

          When doing the recovery, the server generates a list of XIDs to be committed or rolled-back.
          Then the callback function of innodb will be called to handle the commit or rollback.

          In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code

          Failed reason is server id doesn't match. Server ID is saved in xid_t.data[]. The format of xid_t.data[] is:

          MySQLXid<Server ID 8 bytes><XID 8 bytes>
          

          Paste the GDB output here:

          arg->xid is the input of the function to be searched from the hash.
          element->trx->xid is the one exists in the hash which expected to match

          Apparently gtrid_length, bqual_length, and transaction id all match but server ids are different.

          • ((ulonglong)(arg->xid->data+8)) = 1049637262 which is the server id of the restored database.
          • ((ulonglong)(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from.

          (gdb) p element->id
          $2 = 126555
          (gdb) p element->trx
          $3 = (trx_t *) 0x7fffebcca288
          (gdb) p element->trx->is_recovered
          $4 = true
          (gdb) p element->trx->state
          $5 = {m = {static _S_min_alignment = 4, static _S_alignment = 4, _M_i = TRX_STATE_PREPARED}}
          (gdb) p arg->xid
          $6 = (const XID *) 0x7fffffffad90
          (gdb) p element->trx->xid
          $7 = (XID *) 0x7ffff68b9258
          (gdb) p arg->xid->gtrid_length
          $8 = 24
          (gdb) p element->trx->xid->gtrid_length
          $9 = 24
          (gdb) p arg->xid->bqual_length
          $10 = 0
          (gdb) p element->trx->xid->bqual_length
          $11 = 0
          (gdb) p arg->xid->data
          $12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001", '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023", '\000' <repeats 15 times>, "й\377\377\377\177\000\000\266\071\000\000\000\000\000\000Ю\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000"
          (gdb) p element->trx->xid->data
          $13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001", '\000' <repeats 108 times>
          (gdb) p *((ulonglong*)(arg->xid->data+16))
          $14 = 118560
          (gdb) p *((ulonglong*)(element->trx->xid->data+16))
          $15 = 118560
          (gdb) p *((ulonglong*)(arg->xid->data+8))
          $16 = 1049637262
          (gdb) p *((ulonglong*)(element->trx->xid->data+8))
          $17 = 1604102129
          
          

          Why the server id doesn't match ?

          After this commit, the input xid creation becomes using the set function:

            void set(ulonglong xid)
            {
              my_xid tmp;
              formatID= 1;
              set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);
              memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));
              tmp= xid;
              memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
              gtrid_length=MYSQL_XID_GTRID_LEN;
            }
          

          This set function doesn't copy the original server id, instead it copy the new server_id of the new database.

          memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id))

          Therefore the xid can't be find in trx_sys because the server id doesn't match any more.

          Why there's no such issue in 10.5?

          In 10.5, the input xid was created directly from original xid : code xid= *trx->xid;

                XID& xid= arg->xid_list[arg->count];
                if (arg->count++ < arg->len)
                {
                  trx->state= TRX_STATE_PREPARED_RECOVERED;
                  ...
                  xid= *trx->xid;
                }
          

          So the server match keeps the same with the old one.

          To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID. The issue doesn't exist in previous major versions before 10.6.

          How to fix?

          Seems a possible fix is to update the logic of creating the input XID object:

          Proposed fix:

          I've tested following changes fixed the issue. with the fix db recovered successfully.
          While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file

          If there's no breaking mtr tests and no obvious issues I will create PR to fix it.
          Also may worth to add a MTR test, but it's late today I'll work on it tomorrow.

          Elkin, could you please help review if the root cause and fix makes sense?

          [wenhug@dev-dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb-1335.diff 
          diff --git a/sql/handler.cc b/sql/handler.cc
          index e4543bde5de..5cad422f45f 100644
          --- a/sql/handler.cc
          +++ b/sql/handler.cc
          @@ -2414,7 +2414,7 @@ struct xarecover_st
           */
           static xid_recovery_member*
           xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
          -                  XID *full_xid_arg)
          +                  XID *full_xid_arg, ulong server_id_arg)
           {
             xid_recovery_member *member= (xid_recovery_member *)
               alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
          @@ -2428,7 +2428,7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
           
             if (full_xid_arg)
               *xid_full= *full_xid_arg;
          -  *member= xid_recovery_member(xid_arg, 1, false, xid_full);
          +  *member= xid_recovery_member(xid_arg, 1, false, xid_full, server_id_arg);
           
             return
               my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
          @@ -2443,14 +2443,15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
           */
           static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
                                          MEM_ROOT *ptr_mem_root,
          -                               XID *full_xid_arg)
          +                               XID *full_xid_arg,
          +                               ulong server_id_arg)
           {
             xid_recovery_member* member;
             if ((member= (xid_recovery_member *)
                  my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
               member->in_engine_prepare++;
             else
          -    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg);
          +    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg,  server_id_arg);
           
             return member == NULL;
           }
          @@ -2502,7 +2503,7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton,
             Binlog_offset *ptr_commit_max= arg->binlog_coord;
           
             if (!member->full_xid)
          -    x.set(member->xid);
          +    x.set(member->xid, member->server_id);
             else
               x= *member->full_xid;
           
          @@ -2660,7 +2661,8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
                   {
                     // remember "full" xid too when it's not in mysql format
                     if (xid_member_replace(info->commit_list, x, info->mem_root,
          -                                 is_server_xid? NULL : &info->list[i]))
          +                                 is_server_xid? NULL : &info->list[i],
          +                                 is_server_xid? info->list[i].get_trx_server_id(): server_id))
                     {
                       info->error= true;
                       sql_print_error("Error in memory allocation at xarecover_handlerton");
          diff --git a/sql/handler.h b/sql/handler.h
          index 0cdaf1dd698..73014093705 100644
          --- a/sql/handler.h
          +++ b/sql/handler.h
          @@ -895,6 +895,18 @@ struct xid_t {
               memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
               gtrid_length=MYSQL_XID_GTRID_LEN;
             }
          +
          +  void set(ulonglong xid, ulong server_id)
          +  {
          +    my_xid tmp;
          +    formatID= 1;
          +    set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);
          +    memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));
          +    tmp= xid;
          +    memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
          +    gtrid_length=MYSQL_XID_GTRID_LEN;
          +  }
          +
             void set(long g, long b, const char *d)
             {
               formatID= 1;
          @@ -916,6 +928,12 @@ struct xid_t {
                      !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ?
                      quick_get_my_xid() : 0;
             }
          +  ulong get_trx_server_id()
          +  {
          +    ulong trx_server_id;
          +    memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id));
          +    return trx_server_id;
          +  }
             uint length()
             {
               return static_cast<uint>(sizeof(formatID)) + key_length();
          @@ -957,11 +975,12 @@ struct xid_recovery_member
             bool decided_to_commit;
             Binlog_offset binlog_coord; // semisync recovery binlog offset
             XID *full_xid;           // needed by wsrep or past it recovery
          +  ulong server_id;         // server id of original server
           
             xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg,
          -                      XID *full_xid_arg)
          +                      XID *full_xid_arg, ulong server_id_arg)
               : xid(xid_arg), in_engine_prepare(prepare_arg),
          -      decided_to_commit(decided_arg), full_xid(full_xid_arg) {};
          +      decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {};
           };
           
           /* for recover() handlerton call */
          
          

          wenhug Hugo Wen added a comment - After debugging line by line using GDB, root cause was found. When doing the recovery, the server generates a list of XIDs to be committed or rolled-back. Then the callback function of innodb will be called to handle the commit or rollback. In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code Failed reason is server id doesn't match . Server ID is saved in xid_t.data[] . The format of xid_t.data[] is: MySQLXid<Server ID 8 bytes><XID 8 bytes> Paste the GDB output here: arg->xid is the input of the function to be searched from the hash. element->trx->xid is the one exists in the hash which expected to match Apparently gtrid_length , bqual_length , and transaction id all match but server ids are different. ((ulonglong )(arg->xid->data+8)) = 1049637262 which is the server id of the restored database. ((ulonglong )(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from. (gdb) p element->id $ 2 = 126555 (gdb) p element->trx $ 3 = (trx_t *) 0x7fffebcca288 (gdb) p element->trx->is_recovered $ 4 = true (gdb) p element->trx->state $ 5 = {m = { static _S_min_alignment = 4 , static _S_alignment = 4 , _M_i = TRX_STATE_PREPARED}} (gdb) p arg->xid $ 6 = ( const XID *) 0x7fffffffad90 (gdb) p element->trx->xid $ 7 = (XID *) 0x7ffff68b9258 (gdb) p arg->xid->gtrid_length $ 8 = 24 (gdb) p element->trx->xid->gtrid_length $ 9 = 24 (gdb) p arg->xid->bqual_length $ 10 = 0 (gdb) p element->trx->xid->bqual_length $ 11 = 0 (gdb) p arg->xid->data $ 12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001" , '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023" , '\000' <repeats 15 times>, "й\377\377\377\177\000\000\266\071\000\000\000\000\000\000Ю\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000" (gdb) p element->trx->xid->data $ 13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001" , '\000' <repeats 108 times> (gdb) p *((ulonglong*)(arg->xid->data+ 16 )) $ 14 = 118560 (gdb) p *((ulonglong*)(element->trx->xid->data+ 16 )) $ 15 = 118560 (gdb) p *((ulonglong*)(arg->xid->data+ 8 )) $ 16 = 1049637262 (gdb) p *((ulonglong*)(element->trx->xid->data+ 8 )) $ 17 = 1604102129 Why the server id doesn't match ? After this commit , the input xid creation becomes using the set function: void set(ulonglong xid) { my_xid tmp; formatID= 1 ; set(MYSQL_XID_PREFIX_LEN, 0 , MYSQL_XID_PREFIX); memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)); tmp= xid; memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); gtrid_length=MYSQL_XID_GTRID_LEN; } This set function doesn't copy the original server id, instead it copy the new server_id of the new database. memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)) Therefore the xid can't be find in trx_sys because the server id doesn't match any more. Why there's no such issue in 10.5? In 10.5, the input xid was created directly from original xid : code xid= *trx->xid; XID& xid= arg->xid_list[arg->count]; if (arg->count++ < arg->len) { trx->state= TRX_STATE_PREPARED_RECOVERED; ... xid= *trx->xid; } So the server match keeps the same with the old one. To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID . The issue doesn't exist in previous major versions before 10.6. How to fix? Seems a possible fix is to update the logic of creating the input XID object: Add original server id in struct xid_recovery_member . Then update the xarecover_do_commit_or_rollback function to set the original server id properly. Proposed fix: I've tested following changes fixed the issue. with the fix db recovered successfully. While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file If there's no breaking mtr tests and no obvious issues I will create PR to fix it. Also may worth to add a MTR test, but it's late today I'll work on it tomorrow. Elkin , could you please help review if the root cause and fix makes sense? [wenhug @dev -dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb- 1335 .diff diff --git a/sql/handler.cc b/sql/handler.cc index e4543bde5de..5cad422f45f 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ - 2414 , 7 + 2414 , 7 @@ struct xarecover_st */ static xid_recovery_member* xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, - XID *full_xid_arg) + XID *full_xid_arg, ulong server_id_arg) { xid_recovery_member *member= (xid_recovery_member *) alloc_root(ptr_mem_root, sizeof(xid_recovery_member)); @@ - 2428 , 7 + 2428 , 7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, if (full_xid_arg) *xid_full= *full_xid_arg; - *member= xid_recovery_member(xid_arg, 1 , false , xid_full); + *member= xid_recovery_member(xid_arg, 1 , false , xid_full, server_id_arg); return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member; @@ - 2443 , 14 + 2443 , 15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, */ static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, - XID *full_xid_arg) + XID *full_xid_arg, + ulong server_id_arg) { xid_recovery_member* member; if ((member= (xid_recovery_member *) my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg)))) member->in_engine_prepare++; else - member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg); + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg, server_id_arg); return member == NULL; } @@ - 2502 , 7 + 2503 , 7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton, Binlog_offset *ptr_commit_max= arg->binlog_coord; if (!member->full_xid) - x.set(member->xid); + x.set(member->xid, member->server_id); else x= *member->full_xid; @@ - 2660 , 7 + 2661 , 8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin, { // remember "full" xid too when it's not in mysql format if (xid_member_replace(info->commit_list, x, info->mem_root, - is_server_xid? NULL : &info->list[i])) + is_server_xid? NULL : &info->list[i], + is_server_xid? info->list[i].get_trx_server_id(): server_id)) { info->error= true ; sql_print_error( "Error in memory allocation at xarecover_handlerton" ); diff --git a/sql/handler.h b/sql/handler.h index 0cdaf1dd698.. 73014093705 100644 --- a/sql/handler.h +++ b/sql/handler.h @@ - 895 , 6 + 895 , 18 @@ struct xid_t { memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); gtrid_length=MYSQL_XID_GTRID_LEN; } + + void set(ulonglong xid, ulong server_id) + { + my_xid tmp; + formatID= 1 ; + set(MYSQL_XID_PREFIX_LEN, 0 , MYSQL_XID_PREFIX); + memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)); + tmp= xid; + memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); + gtrid_length=MYSQL_XID_GTRID_LEN; + } + void set( long g, long b, const char *d) { formatID= 1 ; @@ - 916 , 6 + 928 , 12 @@ struct xid_t { !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ? quick_get_my_xid() : 0 ; } + ulong get_trx_server_id() + { + ulong trx_server_id; + memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id)); + return trx_server_id; + } uint length() { return static_cast<uint>(sizeof(formatID)) + key_length(); @@ - 957 , 11 + 975 , 12 @@ struct xid_recovery_member bool decided_to_commit; Binlog_offset binlog_coord; // semisync recovery binlog offset XID *full_xid; // needed by wsrep or past it recovery + ulong server_id; // server id of original server xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg, - XID *full_xid_arg) + XID *full_xid_arg, ulong server_id_arg) : xid(xid_arg), in_engine_prepare(prepare_arg), - decided_to_commit(decided_arg), full_xid(full_xid_arg) {}; + decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {}; }; /* for recover() handlerton call */
          Elkin Andrei Elkin added a comment -

          wenhug, thanks for digging it out!

          Indeed, 6c39eaeb1 made recovery dependent on server_id.
          Your diff makes sense, so feel free to go with a PR.
          To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)...

          Cheers,

          Andrei.

          Elkin Andrei Elkin added a comment - wenhug , thanks for digging it out! Indeed, 6c39eaeb1 made recovery dependent on server_id. Your diff makes sense, so feel free to go with a PR. To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)... Cheers, Andrei.
          wenhug Hugo Wen added a comment -

          Elkin, I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!

          wenhug Hugo Wen added a comment - Elkin , I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!
          Elkin Andrei Elkin added a comment -

          commit dafc5fb9c17637c6233e8a73c890f158d645a37c
          Author: Hugo Wen <wenhug@amazon.com>
          Date: Fri Feb 4 03:56:08 2022 +0000

          is pushed into 10.6.

          Elkin Andrei Elkin added a comment - commit dafc5fb9c17637c6233e8a73c890f158d645a37c Author: Hugo Wen <wenhug@amazon.com> Date: Fri Feb 4 03:56:08 2022 +0000 is pushed into 10.6.

          People

            Elkin Andrei Elkin
            wenhug Hugo Wen
            Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.