[MDEV-23536] Race condition between KILL and transaction commit - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5(EOL), 10.6
Fix Version/s: 10.2.37, 10.3.28, 10.4.18, 10.5.9, 10.6.0
Component/s: Locking, Server
Labels:
- ASAN
- race

Description

A race condition may occur between the execution of transaction commit, and an execution of a KILL statement that would attempt to abort that transaction.

~~MDEV-17092~~ worked around this race condition by modifying InnoDB code. After that issue was closed, svoj pointed out that this race condition would better be fixed above the storage engine layer:

If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion:

He is quoting this code of THD::~THD() in 10.5:

/*

    Other threads may have a lock on LOCK_thd_kill to ensure that this

    THD is not deleted while they access it. The following mutex_lock

    ensures that no one else is using this THD and it's now safe to delete

*/

  if (WSREP_NNULL(this)) mysql_mutex_lock(&LOCK_thd_data);

  mysql_mutex_lock(&LOCK_thd_kill);

  mysql_mutex_unlock(&LOCK_thd_kill);

  if (WSREP_NNULL(this)) mysql_mutex_unlock(&LOCK_thd_data);

  if (!free_connection_done)

    free_connection();

And he seems to suggest that the empty critical section should be moved to THD::free_connection(). Note: in 10.2 and 10.3, that code is slightly different:

  mysql_mutex_lock(&LOCK_thd_kill);

  mysql_mutex_unlock(&LOCK_thd_kill);

#ifdef WITH_WSREP

  delete wsrep_rgi;

#endif

  if (!free_connection_done)

    free_connection();

Nevertheless, it seems that we might want to do something like

#if MYSQL_VERSION_ID >= 100400

  if (WSREP_NNULL(this)) mysql_mutex_lock(&LOCK_thd_data);

#endif

  if (!free_connection_done)

    free_connection();

  else

    mysql_mutex_lock(&LOCK_thd_kill);

    mysql_mutex_unlock(&LOCK_thd_kill);

#if MYSQL_VERSION_ID >= 100400

  if (WSREP_NNULL(this)) mysql_mutex_unlock(&LOCK_thd_data);

#elif defined WITH_WSREP

  delete wsrep_rgi;

#endif

It might turn out that the else branch is not needed. The empty lock/unlock pair would of course be added to THD::free_connection().

It might also turn out that all the Galera-specific changes need to be done in THD::free_connection(). (In that case, we would likely want to assign wsrep_rgi= NULL).

As part of this fix, the trx_t::free() instrumentation that was modified in ~~MDEV-22782~~ should be tightened: trx_t::mysql_thd and trx_t::state must be poisoned, because innobase_kill_connection() should no longer be invoked on a freed transaction of a freed connection. This should of course be validated with an RQG run similar to the one that reproduced ~~MDEV-17092~~.

Attachments

Issue Links

blocks

MDEV-23328 Server hang due to Galera lock conflict resolution

Closed

relates to

MDEV-24624 main.lock_tables_lost_commit failed in buildbot with timeout, Trying to lock mutex at sql/sql_base.cc, line 958, when the mutex was already locked

Closed

MDEV-26014 Race condition between KILL and client disconnect

Closed

MDEV-17092 ERROR: AddressSanitizer: use-after-poison around lock_trx_handle_wait_low

Closed

MDEV-22782 SUMMARY: AddressSanitizer: unknown-crash storage/innobase/trx/trx0trx.cc:566 in trx_t::commit_state()

Closed

MDEV-23328 Server hang due to Galera lock conflict resolution

Closed

MDEV-24561 rpl.rpl_semi_sync_uninstall_plugin fails in bb with "Found wrong usage of mutex 'LOCK_thd_kill' and 'LOCK_show_status'"

Closed

MDEV-29368 Assertion `trx->mysql_thd == thd' failed in innobase_kill_query from process_timers/timer_handler and use-after-poison in innobase_kill_query

Closed

(3 relates to)

Activity

People

Assignee:: Jan Lindström (Inactive)

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 2020-08-22 09:52

Updated:: 2024-07-07 21:57

Resolved:: 2021-01-08 15:23

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.