[MDEV-15772] Potential list overrun during XA recovery - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Fix Version/s: 10.2.24, 5.5.64, 10.1.39, 10.3.15, 10.4.5
Component/s: XA
Labels:
None

Description

In ha_recover() array of XA transactions is allocated as following:

#define MAX_XID_LIST_SIZE  (1024*128)

#define MIN_XID_LIST_SIZE  128

  for (info.len= MAX_XID_LIST_SIZE ;

       info.list==0 && info.len > MIN_XID_LIST_SIZE; info.len/=2)

    info.list=(XID *)my_malloc(info.len*sizeof(XID), MYF(0));

  if (!info.list)

    sql_print_error(ER(ER_OUTOFMEMORY),

                    static_cast<int>(info.len*sizeof(XID)));

    DBUG_RETURN(1);

Then each storage engine fills this array. However at least InnoDB (trx_recover_for_mysql()) doesn't check for boundaries and may overrun this array.

Attachments

Issue Links

relates to

MDEV-19408 Assertion `trx->state == TRX_STATE_ACTIVE || trx->state == TRX_STATE_PREPARED' failed in ReadView::copy_trx_ids, innodb.xa_debug, innodb.innodb fail in buildbot

Closed

Activity

Ascending order - Click to sort in descending order

View 4 older comments

Marko Mäkelä added a comment - 2018-07-25 09:46

svoj, do you mean that multiple successive invocations of XA RECOVER should not keep returning the same data? Currently, it should; only XA COMMIT or XA ROLLBACK will ‘consume’ recovered XIDs.

Marko Mäkelä added a comment - 2018-07-25 09:46 svoj , do you mean that multiple successive invocations of XA RECOVER should not keep returning the same data? Currently, it should; only XA COMMIT or XA ROLLBACK will ‘consume’ recovered XIDs.

Sergey Vojtovich added a comment - 2018-07-25 16:00

marko, not XA RECOVER, which is user interface and iterates xid_cache and returns full list. But rather xa_recover(), which is handler::recover() in our case, that is internal interface.

Sergey Vojtovich added a comment - 2018-07-25 16:00 marko , not XA RECOVER, which is user interface and iterates xid_cache and returns full list. But rather xa_recover(), which is handler::recover() in our case, that is internal interface.

Thirunarayanan Balathandayuthapani added a comment - 2019-04-22 17:25

Following patch sets the recover list length to the minimum value.

diff --git a/sql/handler.cc b/sql/handler.cc

index ab4d9fd..8c6270a 100644

--- a/sql/handler.cc

+++ b/sql/handler.cc

@@ -1740,6 +1740,7 @@ int ha_recover(HASH *commit_list)

   for (info.len= MAX_XID_LIST_SIZE ;

        info.list==0 && info.len > MIN_XID_LIST_SIZE; info.len/=2)

+    DBUG_EXECUTE_IF("min_xa_len", info.len = 16;);

     info.list=(XID *)my_malloc(info.len*sizeof(XID), MYF(0));

   if (!info.list)

Below test case hangs with the above patch:

-- source include/have_innodb.inc

-- source include/not_embedded.inc

create table t1 (a int) engine=innodb;

insert into t1 values(1);

let $trial = 50;

while ($trial)

--connect (con$trial, localhost, root,,)

let $st_pre = `select concat('test', $trial)`;

echo $st_pre;

eval xa start '$st_pre';

insert into t1 values(1);

eval xa end '$st_pre';

eval xa prepare '$st_pre';

dec $trial;

connection default;

# Kill and restart the server.

-- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect

-- shutdown_server 0

-- source include/wait_until_disconnected.inc

-- exec echo "restart:--debug_dbug=+d,min_xa_len" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect

-- enable_reconnect

-- source include/wait_until_connected_again.inc

-- disable_reconnect

drop table t1;

Basically, InnoDB returns the same transaction again and again. So it hangs during the restart of the server.

Thirunarayanan Balathandayuthapani added a comment - 2019-04-22 17:25 Following patch sets the recover list length to the minimum value. diff --git a/sql/handler.cc b/sql/handler.cc index ab4d9fd..8c6270a 100644 --- a/sql/handler.cc +++ b/sql/handler.cc @@ -1740,6 +1740,7 @@ int ha_recover(HASH *commit_list) for (info.len= MAX_XID_LIST_SIZE ; info.list==0 && info.len > MIN_XID_LIST_SIZE; info.len/=2) { + DBUG_EXECUTE_IF("min_xa_len", info.len = 16;); info.list=(XID *)my_malloc(info.len*sizeof(XID), MYF(0)); } if (!info.list) Below test case hangs with the above patch: -- source include/have_innodb.inc -- source include/not_embedded.inc create table t1 (a int) engine=innodb; insert into t1 values(1); let $trial = 50; while ($trial) { --connect (con$trial, localhost, root,,) let $st_pre = `select concat('test', $trial)`; echo $st_pre; eval xa start '$st_pre'; insert into t1 values(1); eval xa end '$st_pre'; eval xa prepare '$st_pre'; dec $trial; } connection default; # Kill and restart the server. -- exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect -- shutdown_server 0 -- source include/wait_until_disconnected.inc -- exec echo "restart:--debug_dbug=+d,min_xa_len" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect -- enable_reconnect -- source include/wait_until_connected_again.inc -- disable_reconnect drop table t1; Basically, InnoDB returns the same transaction again and again. So it hangs during the restart of the server.

Thirunarayanan Balathandayuthapani added a comment - 2019-04-22 18:03

Patch is in bb-5.5-thiru

Thirunarayanan Balathandayuthapani added a comment - 2019-04-22 18:03 Patch is in bb-5.5-thiru

Marko Mäkelä added a comment - 2019-04-24 09:20

I revised thiru’s fix, splitting the PREPARED state into two, and not introducing a Boolean flag. This should ease the merge to 10.2, which allocates transaction objects from a pool.

svoj, the intention was that InnoDB would return the list of XA PREPARE transactions only once. But, it turns out that init_server_components() is invoking ha_recover() both directly and via tc_log->open(). To ensure that the second traversal will get a list of all transactions instead of getting an empty list, we will reset the state of each transaction at the end of the traversal. It would be nice to remove this work-around.

Marko Mäkelä added a comment - 2019-04-24 09:20 I revised thiru ’s fix, splitting the PREPARED state into two, and not introducing a Boolean flag. This should ease the merge to 10.2, which allocates transaction objects from a pool. svoj , the intention was that InnoDB would return the list of XA PREPARE transactions only once. But, it turns out that init_server_components() is invoking ha_recover() both directly and via tc_log->open() . To ensure that the second traversal will get a list of all transactions instead of getting an empty list, we will reset the state of each transaction at the end of the traversal. It would be nice to remove this work-around.

People

Assignee:: Marko Mäkelä

Reporter:: Sergey Vojtovich

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2018-04-04 08:48

Updated:: 2019-05-08 04:01

Resolved:: 2019-04-24 09:20

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server