Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27342

MariaDB 10.6 failed to do crash recovery, potential regression in commit 6c39eaeb

Details

    Description

      Issue description:

      MariaDB is designed to be crash-safe and recent versions of MariaDB (tested with 10.5.12) are perfectly capable of starting and recovering if the /var/lib/mysql data directory is based on a filesystem level snapshot and there is a transaction in progress. However in 10.6.5 this is no longer the case and mariadbd fails to start and recover when the data directory is from a filesystem snapshot. Based on my investigation is seems to be a regression introduced in 6c39eaeb1.

      MariaDB 10.6.5 failed to recover the database from a data directory snapshot which was taken while following transaction is running:

      create database testdb;
      use testdb;
      CREATE TABLE `t1` (`col1` int(11) NOT NULL,`col2` int(11) DEFAULT NULL,PRIMARY KEY (`col1`)) ENGINE=InnoDB DEFAULT CHARSET=latin1;
      DELIMITER $$
      CREATE PROCEDURE generateredo(count int)
      BEGIN
      DECLARE counter INT DEFAULT 1;
      WHILE counter <= count DO
      insert into t1 values(counter,counter);
      update t1 set col2=counter+1 where col1=counter;
      delete from t1 where col1=counter;
      SET counter = counter + 1;
      END WHILE;
      END$$
      DELIMITER ;
       
      call testdb.generateredo(10000000);
      

      The mysql error log when recovering:

      2021-12-20 18:55:50 0 [Note] RocksDB: 2 column families found
      2021-12-20 18:55:50 0 [Note] RocksDB: Column Families at start:
      2021-12-20 18:55:50 0 [Note]   cf=default
      2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864
      2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864
      2021-12-20 18:55:50 0 [Note]   cf=__system__
      2021-12-20 18:55:50 0 [Note]     write_buffer_size=67108864
      2021-12-20 18:55:50 0 [Note]     target_file_size_base=67108864
      2021-12-20 18:55:50 0 [Note] RocksDB: Table_store: loaded DDL data for 0 tables
      2021-12-20 18:55:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
      2021-12-20 18:55:50 0 [Note] MyRocks storage engine plugin has been successfully initialized.
      2021-12-20 18:55:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2021-12-20 18:55:50 0 [Note] InnoDB: Number of pools: 1
      2021-12-20 18:55:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2021-12-20 18:55:50 0 [Note] InnoDB: Using Linux native AIO
      2021-12-20 18:55:50 0 [Note] InnoDB: Initializing buffer pool, total size = 49392123904, chunk size = 134217728
      2021-12-20 18:55:50 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 was in the XA prepared state.
      2021-12-20 18:55:50 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
      2021-12-20 18:55:50 0 [Note] InnoDB: Trx id counter is 210266
      2021-12-20 18:55:50 0 [Note] InnoDB: 128 rollback segments are active.
      2021-12-20 18:55:50 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2021-12-20 18:55:50 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2021-12-20 18:55:50 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
      2021-12-20 18:55:50 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2021-12-20 18:55:50 0 [Note] InnoDB: Rollback of non-prepared transactions completed
      2021-12-20 18:55:50 0 [Note] InnoDB: 10.6.5 started; log sequence number 25054305; transaction id 210267
      2021-12-20 18:55:50 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
      2021-12-20 18:55:50 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
      2021-12-20 18:55:50 0 [Note] Starting table crash recovery...
      2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-20 18:55:50 0 [Note] Crash table recovery finished.
      2021-12-20 18:55:50 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction 210265 in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-20 18:55:50 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-20 18:55:50 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-20 18:55:50 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
      2021-12-20 18:55:50 0 [ERROR] Aborting
      

      The issue is regarded as a regression because the same data files are able to recover with MariaDB 10.5.12 binaries.

      Some investigation:

      Based on the implementation of function init_server_components(), during the recovery, ha_recover() function is called twice

      • Stack trace of the first call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5292

        sql/handler.cc:2706(ha_recover(st_hash*, st_mem_root*))[0x5599477cf057]
        sql/log.cc:11169(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x5599478a80dd]
        sql/log.cc:11502(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x5599478a8931]
        sql/log.cc:10151(MYSQL_BIN_LOG::open(char const*))[0x5599478aceda]
        sql/mysqld.cc:5289(init_server_components())[0x5599475098ce]
        sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55994750ed55]
        ??:0(__libc_start_main)[0x14f07bff20ba]
        

      • Stack trace of the second call: https://github.com/MariaDB/server/blob/10.6/sql/mysqld.cc#L5298

        sql/handler.cc:2710(ha_recover(st_hash*, st_mem_root*))[0x55e57611247b]
        sql/mysqld.cc:5295(init_server_components())[0x55e575e4c8df]
        sql/mysqld.cc:5655(mysqld_main(int, char**))[0x55e575e51d55]
        ??:0(__libc_start_main)[0x1542b650e0ba]
        ??:0(_start)[0x55e575e4764a]
        

      However, in the commit the ha_recover() function call was moved from the end of function TC_LOG_BINLOG::recover to the beginning.

      Tried a quick fix

      I tried a simple fix of moving the ha_recover() back to the end of recover() then the issue was fixed:

      diff --git a/sql/log.cc b/sql/log.cc
      index 41eed80ef66..ddda2e5d330 100644
      --- a/sql/log.cc
      +++ b/sql/log.cc
      @@ -11166,8 +11166,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
         fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error
       
         /* finds xids when root is not NULL */
      -  if (do_xa && ha_recover(&xids, &mem_root))
      -    goto err1;
      +  //if (do_xa && ha_recover(&xids, &mem_root))
      +  //  goto err1;
       
         /*
           Scan the binlog for XIDs that need to be committed if still in the
      @@ -11371,6 +11374,8 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
       
         if (do_xa)
         {
      +    if (ha_recover(&xids))
      +      goto err2;
           if (binlog_checkpoint_found)
           {
       #ifndef HAVE_REPLICATION
      

      Error log before and after this fix
      Before the fix, the error was in the second call of ha_recover():

      2021-12-21 20:07:45 0 [Note] InnoDB: 10.6.5 started; log sequence number 15159225; transaction id 126557
      2021-12-21 20:07:45 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
      2021-12-21 20:07:45 0 [Note] Plugin 'FEEDBACK' is disabled.
      2021-12-21 20:07:45 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
      2021-12-21 20:07:45 0 [Note] Starting table crash recovery...
      2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-21 20:07:45 0 [Note] Crash table recovery finished.
      2021-12-21 20:07:45 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-21 20:07:45 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-21 20:07:45 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-21 20:07:45 0 [ERROR] Found 1 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
      2021-12-21 20:07:45 0 [ERROR] Aborting

      After the fix, ha_recover() was still called twice(verified by adding some private log) but no error/note logs of the second call:

      2021-12-21  1:00:20 0 [Note] InnoDB: 10.6.5 started; log sequence number 15158841; transaction id 126557
      2021-12-21  1:00:20 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
      2021-12-21  1:00:20 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
      2021-12-21  1:00:20 0 [Note] Starting table crash recovery...
      2021-12-21  1:00:20 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-21  1:00:20 0 [Note] InnoDB: Transaction 126555 in prepared state after recovery
      2021-12-21  1:00:20 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-21  1:00:20 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-21  1:00:20 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-21  1:00:20 0 [Note] Crash table recovery finished.
      2021-12-21  1:00:20 0 [Note] Server socket created on IP: '0.0.0.0'.
      2021-12-21  1:00:20 0 [Note] Server socket created on IP: '::'.
      2021-12-21  1:00:20 0 [Note] InnoDB: Buffer pool(s) load completed at 211221  1:00:20
      2021-12-21  1:00:20 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections.
      

      Could you please investigate whether we should update the function or other conditions need to be considered?

      Attachments

        Activity

          wenhug Hugo Wen added a comment - - edited

          Elkin,

          I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing.

          However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files?

          Thanks,
          Hugo

          wenhug Hugo Wen added a comment - - edited Elkin , I tried several ways to reproduce the issue but seems it's not easy to reproduce with normal pkill commands. Probably requires a file system snapshot or something, I'm still testing. However I created a tar file of the crashed data directory and I just sent to you in a separate email. Would you be able to help debugging with those data files? Thanks, Hugo
          wenhug Hugo Wen added a comment -

          After debugging line by line using GDB, root cause was found.

          When doing the recovery, the server generates a list of XIDs to be committed or rolled-back.
          Then the callback function of innodb will be called to handle the commit or rollback.

          In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code

          Failed reason is server id doesn't match. Server ID is saved in xid_t.data[]. The format of xid_t.data[] is:

          MySQLXid<Server ID 8 bytes><XID 8 bytes>
          

          Paste the GDB output here:

          arg->xid is the input of the function to be searched from the hash.
          element->trx->xid is the one exists in the hash which expected to match

          Apparently gtrid_length, bqual_length, and transaction id all match but server ids are different.

          • ((ulonglong)(arg->xid->data+8)) = 1049637262 which is the server id of the restored database.
          • ((ulonglong)(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from.

          (gdb) p element->id
          $2 = 126555
          (gdb) p element->trx
          $3 = (trx_t *) 0x7fffebcca288
          (gdb) p element->trx->is_recovered
          $4 = true
          (gdb) p element->trx->state
          $5 = {m = {static _S_min_alignment = 4, static _S_alignment = 4, _M_i = TRX_STATE_PREPARED}}
          (gdb) p arg->xid
          $6 = (const XID *) 0x7fffffffad90
          (gdb) p element->trx->xid
          $7 = (XID *) 0x7ffff68b9258
          (gdb) p arg->xid->gtrid_length
          $8 = 24
          (gdb) p element->trx->xid->gtrid_length
          $9 = 24
          (gdb) p arg->xid->bqual_length
          $10 = 0
          (gdb) p element->trx->xid->bqual_length
          $11 = 0
          (gdb) p arg->xid->data
          $12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001", '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023", '\000' <repeats 15 times>, "Đą\377\377\377\177\000\000\266\071\000\000\000\000\000\000Đ®\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000"
          (gdb) p element->trx->xid->data
          $13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001", '\000' <repeats 108 times>
          (gdb) p *((ulonglong*)(arg->xid->data+16))
          $14 = 118560
          (gdb) p *((ulonglong*)(element->trx->xid->data+16))
          $15 = 118560
          (gdb) p *((ulonglong*)(arg->xid->data+8))
          $16 = 1049637262
          (gdb) p *((ulonglong*)(element->trx->xid->data+8))
          $17 = 1604102129
          
          

          Why the server id doesn't match ?

          After this commit, the input xid creation becomes using the set function:

            void set(ulonglong xid)
            {
              my_xid tmp;
              formatID= 1;
              set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);
              memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));
              tmp= xid;
              memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
              gtrid_length=MYSQL_XID_GTRID_LEN;
            }
          

          This set function doesn't copy the original server id, instead it copy the new server_id of the new database.

          memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id))

          Therefore the xid can't be find in trx_sys because the server id doesn't match any more.

          Why there's no such issue in 10.5?

          In 10.5, the input xid was created directly from original xid : code xid= *trx->xid;

                XID& xid= arg->xid_list[arg->count];
                if (arg->count++ < arg->len)
                {
                  trx->state= TRX_STATE_PREPARED_RECOVERED;
                  ...
                  xid= *trx->xid;
                }
          

          So the server match keeps the same with the old one.

          To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID. The issue doesn't exist in previous major versions before 10.6.

          How to fix?

          Seems a possible fix is to update the logic of creating the input XID object:

          Proposed fix:

          I've tested following changes fixed the issue. with the fix db recovered successfully.
          While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file

          If there's no breaking mtr tests and no obvious issues I will create PR to fix it.
          Also may worth to add a MTR test, but it's late today I'll work on it tomorrow.

          Elkin, could you please help review if the root cause and fix makes sense?

          [wenhug@dev-dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb-1335.diff 
          diff --git a/sql/handler.cc b/sql/handler.cc
          index e4543bde5de..5cad422f45f 100644
          --- a/sql/handler.cc
          +++ b/sql/handler.cc
          @@ -2414,7 +2414,7 @@ struct xarecover_st
           */
           static xid_recovery_member*
           xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
          -                  XID *full_xid_arg)
          +                  XID *full_xid_arg, ulong server_id_arg)
           {
             xid_recovery_member *member= (xid_recovery_member *)
               alloc_root(ptr_mem_root, sizeof(xid_recovery_member));
          @@ -2428,7 +2428,7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
           
             if (full_xid_arg)
               *xid_full= *full_xid_arg;
          -  *member= xid_recovery_member(xid_arg, 1, false, xid_full);
          +  *member= xid_recovery_member(xid_arg, 1, false, xid_full, server_id_arg);
           
             return
               my_hash_insert(hash_arg, (uchar*) member) ? NULL : member;
          @@ -2443,14 +2443,15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root,
           */
           static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg,
                                          MEM_ROOT *ptr_mem_root,
          -                               XID *full_xid_arg)
          +                               XID *full_xid_arg,
          +                               ulong server_id_arg)
           {
             xid_recovery_member* member;
             if ((member= (xid_recovery_member *)
                  my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg))))
               member->in_engine_prepare++;
             else
          -    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg);
          +    member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg,  server_id_arg);
           
             return member == NULL;
           }
          @@ -2502,7 +2503,7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton,
             Binlog_offset *ptr_commit_max= arg->binlog_coord;
           
             if (!member->full_xid)
          -    x.set(member->xid);
          +    x.set(member->xid, member->server_id);
             else
               x= *member->full_xid;
           
          @@ -2660,7 +2661,8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,
                   {
                     // remember "full" xid too when it's not in mysql format
                     if (xid_member_replace(info->commit_list, x, info->mem_root,
          -                                 is_server_xid? NULL : &info->list[i]))
          +                                 is_server_xid? NULL : &info->list[i],
          +                                 is_server_xid? info->list[i].get_trx_server_id(): server_id))
                     {
                       info->error= true;
                       sql_print_error("Error in memory allocation at xarecover_handlerton");
          diff --git a/sql/handler.h b/sql/handler.h
          index 0cdaf1dd698..73014093705 100644
          --- a/sql/handler.h
          +++ b/sql/handler.h
          @@ -895,6 +895,18 @@ struct xid_t {
               memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
               gtrid_length=MYSQL_XID_GTRID_LEN;
             }
          +
          +  void set(ulonglong xid, ulong server_id)
          +  {
          +    my_xid tmp;
          +    formatID= 1;
          +    set(MYSQL_XID_PREFIX_LEN, 0, MYSQL_XID_PREFIX);
          +    memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id));
          +    tmp= xid;
          +    memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp));
          +    gtrid_length=MYSQL_XID_GTRID_LEN;
          +  }
          +
             void set(long g, long b, const char *d)
             {
               formatID= 1;
          @@ -916,6 +928,12 @@ struct xid_t {
                      !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ?
                      quick_get_my_xid() : 0;
             }
          +  ulong get_trx_server_id()
          +  {
          +    ulong trx_server_id;
          +    memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id));
          +    return trx_server_id;
          +  }
             uint length()
             {
               return static_cast<uint>(sizeof(formatID)) + key_length();
          @@ -957,11 +975,12 @@ struct xid_recovery_member
             bool decided_to_commit;
             Binlog_offset binlog_coord; // semisync recovery binlog offset
             XID *full_xid;           // needed by wsrep or past it recovery
          +  ulong server_id;         // server id of original server
           
             xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg,
          -                      XID *full_xid_arg)
          +                      XID *full_xid_arg, ulong server_id_arg)
               : xid(xid_arg), in_engine_prepare(prepare_arg),
          -      decided_to_commit(decided_arg), full_xid(full_xid_arg) {};
          +      decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {};
           };
           
           /* for recover() handlerton call */
          
          

          wenhug Hugo Wen added a comment - After debugging line by line using GDB, root cause was found. When doing the recovery, the server generates a list of XIDs to be committed or rolled-back. Then the callback function of innodb will be called to handle the commit or rollback. In this case, the innodb transaction needs to be rolled-back. However it failed when searching the XID from trx_sys in function trx_get_trx_by_xid_callback . To be specific, failed when comparing xid data in line-2014. Code Failed reason is server id doesn't match . Server ID is saved in xid_t.data[] . The format of xid_t.data[] is: MySQLXid<Server ID 8 bytes><XID 8 bytes> Paste the GDB output here: arg->xid is the input of the function to be searched from the hash. element->trx->xid is the one exists in the hash which expected to match Apparently gtrid_length , bqual_length , and transaction id all match but server ids are different. ((ulonglong )(arg->xid->data+8)) = 1049637262 which is the server id of the restored database. ((ulonglong )(element->trx->xid->data+8)) = 1604102129 which is the server id of the original database that the snapshot taken from. (gdb) p element->id $ 2 = 126555 (gdb) p element->trx $ 3 = (trx_t *) 0x7fffebcca288 (gdb) p element->trx->is_recovered $ 4 = true (gdb) p element->trx->state $ 5 = {m = { static _S_min_alignment = 4 , static _S_alignment = 4 , _M_i = TRX_STATE_PREPARED}} (gdb) p arg->xid $ 6 = ( const XID *) 0x7fffffffad90 (gdb) p element->trx->xid $ 7 = (XID *) 0x7ffff68b9258 (gdb) p arg->xid->gtrid_length $ 8 = 24 (gdb) p element->trx->xid->gtrid_length $ 9 = 24 (gdb) p arg->xid->bqual_length $ 10 = 0 (gdb) p element->trx->xid->bqual_length $ 11 = 0 (gdb) p arg->xid->data $ 12 = "MySQLXid\216\061\220>\000\000\000\000 \317\001" , '\000' <repeats 22 times>, "\002\000\000\000\000\000\000\000\277;W\031\000\000\000`\256\377\377\377\177\000\000J\006\001\000\000\000\000\000\023" , '\000' <repeats 15 times>, "Đą\377\377\377\177\000\000\266\071\000\000\000\000\000\000Đ®\377\377\377\177\000\000\337\361wVUU\000\000\230\000\000\000\000\000\000" (gdb) p element->trx->xid->data $ 13 = "MySQLXid\361\247\234_\000\000\000\000 \317\001" , '\000' <repeats 108 times> (gdb) p *((ulonglong*)(arg->xid->data+ 16 )) $ 14 = 118560 (gdb) p *((ulonglong*)(element->trx->xid->data+ 16 )) $ 15 = 118560 (gdb) p *((ulonglong*)(arg->xid->data+ 8 )) $ 16 = 1049637262 (gdb) p *((ulonglong*)(element->trx->xid->data+ 8 )) $ 17 = 1604102129 Why the server id doesn't match ? After this commit , the input xid creation becomes using the set function: void set(ulonglong xid) { my_xid tmp; formatID= 1 ; set(MYSQL_XID_PREFIX_LEN, 0 , MYSQL_XID_PREFIX); memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)); tmp= xid; memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); gtrid_length=MYSQL_XID_GTRID_LEN; } This set function doesn't copy the original server id, instead it copy the new server_id of the new database. memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)) Therefore the xid can't be find in trx_sys because the server id doesn't match any more. Why there's no such issue in 10.5? In 10.5, the input xid was created directly from original xid : code xid= *trx->xid; XID& xid= arg->xid_list[arg->count]; if (arg->count++ < arg->len) { trx->state= TRX_STATE_PREPARED_RECOVERED; ... xid= *trx->xid; } So the server match keeps the same with the old one. To make it short, the issue could happen when restoring a new instance from original crashed data directory USING NEW SERVER ID . The issue doesn't exist in previous major versions before 10.6. How to fix? Seems a possible fix is to update the logic of creating the input XID object: Add original server id in struct xid_recovery_member . Then update the xarecover_do_commit_or_rollback function to set the original server id properly. Proposed fix: I've tested following changes fixed the issue. with the fix db recovered successfully. While I'm still running the full mtr tests against the clean upstream commits to make sure it doesn't break anything, I paste it here as a raw diff file If there's no breaking mtr tests and no obvious issues I will create PR to fix it. Also may worth to add a MTR test, but it's late today I'll work on it tomorrow. Elkin , could you please help review if the root cause and fix makes sense? [wenhug @dev -dsk-wenhug-2b-f388f48d RDSMariaDB]$ cat brazil-build/patches/mariadb- 1335 .diff diff --git a/sql/handler.cc b/sql/handler.cc index e4543bde5de..5cad422f45f 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ - 2414 , 7 + 2414 , 7 @@ struct xarecover_st */ static xid_recovery_member* xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, - XID *full_xid_arg) + XID *full_xid_arg, ulong server_id_arg) { xid_recovery_member *member= (xid_recovery_member *) alloc_root(ptr_mem_root, sizeof(xid_recovery_member)); @@ - 2428 , 7 + 2428 , 7 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, if (full_xid_arg) *xid_full= *full_xid_arg; - *member= xid_recovery_member(xid_arg, 1 , false , xid_full); + *member= xid_recovery_member(xid_arg, 1 , false , xid_full, server_id_arg); return my_hash_insert(hash_arg, (uchar*) member) ? NULL : member; @@ - 2443 , 14 + 2443 , 15 @@ xid_member_insert(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, */ static bool xid_member_replace(HASH *hash_arg, my_xid xid_arg, MEM_ROOT *ptr_mem_root, - XID *full_xid_arg) + XID *full_xid_arg, + ulong server_id_arg) { xid_recovery_member* member; if ((member= (xid_recovery_member *) my_hash_search(hash_arg, (uchar *)& xid_arg, sizeof(xid_arg)))) member->in_engine_prepare++; else - member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg); + member= xid_member_insert(hash_arg, xid_arg, ptr_mem_root, full_xid_arg, server_id_arg); return member == NULL; } @@ - 2502 , 7 + 2503 , 7 @@ static void xarecover_do_commit_or_rollback(handlerton *hton, Binlog_offset *ptr_commit_max= arg->binlog_coord; if (!member->full_xid) - x.set(member->xid); + x.set(member->xid, member->server_id); else x= *member->full_xid; @@ - 2660 , 7 + 2661 , 8 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin, { // remember "full" xid too when it's not in mysql format if (xid_member_replace(info->commit_list, x, info->mem_root, - is_server_xid? NULL : &info->list[i])) + is_server_xid? NULL : &info->list[i], + is_server_xid? info->list[i].get_trx_server_id(): server_id)) { info->error= true ; sql_print_error( "Error in memory allocation at xarecover_handlerton" ); diff --git a/sql/handler.h b/sql/handler.h index 0cdaf1dd698.. 73014093705 100644 --- a/sql/handler.h +++ b/sql/handler.h @@ - 895 , 6 + 895 , 18 @@ struct xid_t { memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); gtrid_length=MYSQL_XID_GTRID_LEN; } + + void set(ulonglong xid, ulong server_id) + { + my_xid tmp; + formatID= 1 ; + set(MYSQL_XID_PREFIX_LEN, 0 , MYSQL_XID_PREFIX); + memcpy(data+MYSQL_XID_PREFIX_LEN, &server_id, sizeof(server_id)); + tmp= xid; + memcpy(data+MYSQL_XID_OFFSET, &tmp, sizeof(tmp)); + gtrid_length=MYSQL_XID_GTRID_LEN; + } + void set( long g, long b, const char *d) { formatID= 1 ; @@ - 916 , 6 + 928 , 12 @@ struct xid_t { !memcmp(data, MYSQL_XID_PREFIX, MYSQL_XID_PREFIX_LEN) ? quick_get_my_xid() : 0 ; } + ulong get_trx_server_id() + { + ulong trx_server_id; + memcpy(&trx_server_id, data+MYSQL_XID_PREFIX_LEN, sizeof(server_id)); + return trx_server_id; + } uint length() { return static_cast<uint>(sizeof(formatID)) + key_length(); @@ - 957 , 11 + 975 , 12 @@ struct xid_recovery_member bool decided_to_commit; Binlog_offset binlog_coord; // semisync recovery binlog offset XID *full_xid; // needed by wsrep or past it recovery + ulong server_id; // server id of original server xid_recovery_member(my_xid xid_arg, uint prepare_arg, bool decided_arg, - XID *full_xid_arg) + XID *full_xid_arg, ulong server_id_arg) : xid(xid_arg), in_engine_prepare(prepare_arg), - decided_to_commit(decided_arg), full_xid(full_xid_arg) {}; + decided_to_commit(decided_arg), full_xid(full_xid_arg) , server_id(server_id_arg) {}; }; /* for recover() handlerton call */
          Elkin Andrei Elkin added a comment -

          wenhug, thanks for digging it out!

          Indeed, 6c39eaeb1 made recovery dependent on server_id.
          Your diff makes sense, so feel free to go with a PR.
          To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)...

          Cheers,

          Andrei.

          Elkin Andrei Elkin added a comment - wenhug , thanks for digging it out! Indeed, 6c39eaeb1 made recovery dependent on server_id. Your diff makes sense, so feel free to go with a PR. To a lower level, having only browsed it, I am not sure if a new full-blown SET method is necessary (while we only need server_id to overwrite)... Cheers, Andrei.
          wenhug Hugo Wen added a comment -

          Elkin, I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!

          wenhug Hugo Wen added a comment - Elkin , I created a PR for the issue https://github.com/MariaDB/server/pull/1983 . Could you help review it ? Thanks a lot!
          Elkin Andrei Elkin added a comment -

          commit dafc5fb9c17637c6233e8a73c890f158d645a37c
          Author: Hugo Wen <wenhug@amazon.com>
          Date: Fri Feb 4 03:56:08 2022 +0000

          is pushed into 10.6.

          Elkin Andrei Elkin added a comment - commit dafc5fb9c17637c6233e8a73c890f158d645a37c Author: Hugo Wen <wenhug@amazon.com> Date: Fri Feb 4 03:56:08 2022 +0000 is pushed into 10.6.

          People

            Elkin Andrei Elkin
            wenhug Hugo Wen
            Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.