[MDEV-15772] Potential list overrun during XA recovery Created: 2018-04-04  Updated: 2019-05-08  Resolved: 2019-04-24

Status: Closed
Project: MariaDB Server
Component/s: XA
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3
Fix Version/s: 10.2.24, 5.5.64, 10.1.39, 10.3.15, 10.4.5

Type: Bug Priority: Critical
Reporter: Sergey Vojtovich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-19408 Assertion `trx->state == TRX_STATE_AC... Closed

 Description   

In ha_recover() array of XA transactions is allocated as following:

#define MAX_XID_LIST_SIZE  (1024*128)
#define MIN_XID_LIST_SIZE  128
 
  for (info.len= MAX_XID_LIST_SIZE ;
       info.list==0 && info.len > MIN_XID_LIST_SIZE; info.len/=2)
  {
    info.list=(XID *)my_malloc(info.len*sizeof(XID), MYF(0));
  }
  if (!info.list)
  {
    sql_print_error(ER(ER_OUTOFMEMORY),
                    static_cast<int>(info.len*sizeof(XID)));
    DBUG_RETURN(1);
  }

Then each storage engine fills this array. However at least InnoDB (trx_recover_for_mysql()) doesn't check for boundaries and may overrun this array.



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2018-06-07 ]

MAX_XID_LIST_SIZE is 128 *1024. It is because 128 rollback segment and each 1024 slots for 16k page size.

128 rollback segment is constant irrespective of page size.

Page size Number of transaction supported per rollback segment
4k 128
8k 256
16k 512
32k 1024
64k 2048

For page size 64k, there is a possibility of overrun the list of XA transaction.
From mariadb-10.1 only, InnoDB supports 32k and 64k page size.
I would like to change affected version to 10.1

Comment by Sergey Vojtovich [ 2018-06-26 ]

The idea of recovery is such that it doesn't recover all transactions at once, but rather in batches of info.len size.

And in fact there is a check for boundaries:

UNIV_INTERN
int
trx_recover_for_mysql(
/*==================*/
        XID*    xid_list,       /*!< in/out: prepared transactions */
        ulint   len)            /*!< in: number of slots in xid_list */
{
  ...
  while (trx) {
    ...
    if (count == len) {
      break;
    }
    ...
  }
  ...
}

So, no overrun and no bug here, right?

Comment by Thirunarayanan Balathandayuthapani [ 2018-06-26 ]

There is no bug for binlog driven recovery. InnoDB can return the same list again and again if the xa transaction given from external transaction manager.

 
UNIV_INTERN
int
trx_recover_for_mysql(
/*==================*/
        XID*    xid_list,       /*!< in/out: prepared transactions */
        ulint   len)            /*!< in: number of slots in xid_list */
{
 
           for (trx = UT_LIST_GET_FIRST(trx_sys->rw_trx_list);
             trx != NULL;
             trx = UT_LIST_GET_NEXT(trx_list, trx)) {
 
           }

I think, we can fix it inside innodb itself by identifying whether transaction is already part of XA_list.

Comment by Sergey Vojtovich [ 2018-06-26 ]

Our current API goes along with XA standard. And it defines that RM must advance it's cursor past last returned transaction.

In other words InnoDB is responsible for not returning same transactions multiple times.

E.g. see https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.14/gtpc2/cpp_xa_recover.html

Comment by Marko Mäkelä [ 2018-07-25 ]

svoj, do you mean that multiple successive invocations of XA RECOVER should not keep returning the same data? Currently, it should; only XA COMMIT or XA ROLLBACK will ‘consume’ recovered XIDs.

Comment by Sergey Vojtovich [ 2018-07-25 ]

marko, not XA RECOVER, which is user interface and iterates xid_cache and returns full list. But rather xa_recover(), which is handler::recover() in our case, that is internal interface.

Comment by Thirunarayanan Balathandayuthapani [ 2019-04-22 ]

Following patch sets the recover list length to the minimum value.

diff --git a/sql/handler.cc b/sql/handler.cc
index ab4d9fd..8c6270a 100644
--- a/sql/handler.cc
+++ b/sql/handler.cc
@@ -1740,6 +1740,7 @@ int ha_recover(HASH *commit_list)
   for (info.len= MAX_XID_LIST_SIZE ; 
        info.list==0 && info.len > MIN_XID_LIST_SIZE; info.len/=2)
   {
+    DBUG_EXECUTE_IF("min_xa_len", info.len = 16;);
     info.list=(XID *)my_malloc(info.len*sizeof(XID), MYF(0));
   }
   if (!info.list)

Below test case hangs with the above patch:

 
-- source include/have_innodb.inc
-- source include/not_embedded.inc
create table t1 (a int) engine=innodb;
insert into t1 values(1);
 
let $trial = 50;
while ($trial)
{
--connect (con$trial, localhost, root,,)
let $st_pre = `select concat('test', $trial)`;
echo $st_pre;
eval xa start '$st_pre';
insert into t1 values(1);
eval xa end '$st_pre';
eval xa prepare '$st_pre';
dec $trial;
}
 
connection default;
# Kill and restart the server.
-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
-- shutdown_server 0
-- source include/wait_until_disconnected.inc
 
-- exec echo "restart:--debug_dbug=+d,min_xa_len" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
-- enable_reconnect
-- source include/wait_until_connected_again.inc
-- disable_reconnect
 
drop table t1;

Basically, InnoDB returns the same transaction again and again. So it hangs during the restart of the server.

Comment by Thirunarayanan Balathandayuthapani [ 2019-04-22 ]

Patch is in bb-5.5-thiru

Comment by Marko Mäkelä [ 2019-04-24 ]

I revised thiru’s fix, splitting the PREPARED state into two, and not introducing a Boolean flag. This should ease the merge to 10.2, which allocates transaction objects from a pool.

svoj, the intention was that InnoDB would return the list of XA PREPARE transactions only once. But, it turns out that init_server_components() is invoking ha_recover() both directly and via tc_log->open(). To ensure that the second traversal will get a list of all transactions instead of getting an empty list, we will reset the state of each transaction at the end of the traversal. It would be nice to remove this work-around.

Generated at Thu Feb 08 08:23:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.