[MDEV-18464] Port kill_one_trx fixes from 10.4 to 10.1 Created: 2019-02-04  Updated: 2023-06-26  Resolved: 2020-03-11

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Jan Lindström (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Won't Do Votes: 2
Labels: Test_disabled, test_disabled_ES

Issue Links:
Blocks
blocks MDEV-12009 Allow to force kill user threads/quer... Closed
Relates
relates to MDEV-17063 server locks up and is killed after i... Closed
relates to MDEV-19803 Long semaphore wait error on galera.M... Closed
relates to MDEV-23328 Server hang due to Galera lock confli... Closed
relates to MDEV-18874 Galera test MW-286 causes Mutex = TTA... Closed
relates to MDEV-21910 KIlling thread on Galera could cause ... Closed
relates to MDEV-23851 Galera assertion at lock0lock.cc line... Closed

 Description   

There following issues here:

  • Whenever Galera BF (brute force) transaction decides to abort conflicting transaction it will kill that thread using thd::awake()
  • Whenever replication selects a thread as a victim it will call thd::awake()
  • User KILL [QUERY|CONNECTION] ... for a thread it will also call thd::awake()

Whenever one of these actions is executed we will hold number of InnoDB internal mutexes and thd mutexes.
Sometimes these mutexes are taken in different order causing mutex deadlock (see one detailed case below).

Refs



 Comments   
Comment by Jan Lindström (Inactive) [ 2019-02-05 ]

No need to review Galera only parts, but could you please review kill and lock changes from below:

https://github.com/MariaDB/server/commit/c889c8de46dfa1216a46656700e17ba5e1adfdc1

Comment by Marko Mäkelä [ 2019-03-18 ]

I posted some initial review comments. It is very hard to see what the actual change is, because the commit includes many unrelated changes. Can you please submit a minimal commit for review?

While you are doing this, I would suggest replacing trx_t::abort_type with bool replication_abort.

Comment by Jan Lindström (Inactive) [ 2019-03-22 ]

http://lists.askmonty.org/pipermail/commits/2019-March/013579.html

Comment by Marko Mäkelä [ 2019-03-26 ]

Please address my review comments and push to buildbot for final validation.

Comment by Marko Mäkelä [ 2019-03-28 ]

I am afraid that this code is prone to race conditions:

UNIV_INTERN
dberr_t
lock_trx_handle_wait(
/*=================*/
	trx_t*	trx)	/*!< in/out: trx lock state */
{
	if (!trx->victim) {
		lock_mutex_enter();
		trx_mutex_enter(trx);
	}
 
	dberr_t err = lock_trx_handle_wait_low(trx);
 
	if (!trx->victim) {
		lock_mutex_exit();
		trx_mutex_exit(trx);
	}
 
	return err;
}

We cannot safely read trx->victim without holding one of the mutexes. (This new field is documented to be protected by the two mutexes.)

An easy fix ought to be to do lock_mutex_enter() before reading the field, but we cannot do that if we happen to hold the mutex already. It could be cleaner to pass a parameter to this function to tell that the mutexes have already been acquired.

Here are a few other suggested fixes:

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index 33563a03f1a..a35c4623a1e 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -7956,6 +7956,7 @@ lock_trx_release_locks(
 	the is_recovered flag. */
 
 	trx->is_recovered = FALSE;
+	trx->victim = false;
 
 	trx_mutex_exit(trx);
 
diff --git a/storage/innobase/trx/trx0roll.cc b/storage/innobase/trx/trx0roll.cc
index 127e834335d..fa2bced5ce0 100644
--- a/storage/innobase/trx/trx0roll.cc
+++ b/storage/innobase/trx/trx0roll.cc
@@ -370,7 +370,6 @@ trx_rollback_to_savepoint_for_mysql_low(
 	trx_mark_sql_stat_end(trx);
 
 	trx->op_info = "";
-	trx->victim = false;
 
 	return(err);
 }
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 98ac08a00e7..5359e51c17c 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -1339,7 +1339,6 @@ trx_commit_in_memory(
 	ut_ad(!trx->in_ro_trx_list);
 	ut_ad(!trx->in_rw_trx_list);
 
-	trx->victim = false;
 	trx->dict_operation = TRX_DICT_OP_NONE;
 
 	trx->error_state = DB_SUCCESS;

I think that the correct place to reset the flag would be in lock_trx_release_locks() when marking the transaction committed. The resetting in trx_rollback_to_savepoint_for_mysql_low() looks definitely wrong. Not only it is not holding either mutex, but it is also resetting the flag while performing a partial rollback (rolling back the current row operation, the current statement, or executing ROLLBACK TO SAVEPOINT).

I think that this fix needs to be tested thoroughly, once it has passed my review.

Comment by Marko Mäkelä [ 2019-03-28 ]

I reverted the change, because I believe that it could cause a regression for replication when Galera is not being used (in addition to suffering from some race conditions with Galera).

I think that it would be cleanest to rename trx->lock.was_chosen_as_deadlock_victim to cover both replication and Galera, and to add a parameter to handlerton::kill_query() that informs innobase_kill_query() that the abort was initiated from InnoDB.

This new parameter probably should not be a Boolean, but instead const handlerton*. InnoDB would pass it as innodb_hton when it is the one that initiated abort (and is holding the InnoDB mutexes).

We should consider a multi-storage-engine scenario. A storage engine should not initiate asynchronous rollback for a transaction if it was already initiated by some other storage engine.

Instead of adding a parameter to handlerton::kill_query(), it might also work to add a data member, something like const handlerton* THD::abort_initiator.

Comment by Sergey Vojtovich [ 2019-06-10 ]

Replication issues should've been fixed in 10.2. I'd leave it up to the replication team to get this backported if needed.

commit cc49f00994aa9fc4698d1ca88414e533258b5cf3
Author: Kristian Nielsen <knielsen@knielsen-hq.org>
Date:   Mon Oct 17 12:52:14 2016 +0200
 
    Move InnoDB/XtraDB to async deadlock kill for parallel replication.
 
    In 10.2, use the thd_rpl_deadlock_check() API. This way, all the
    locking hacks around thd_report_wait_for() can be removed. Now
    parallel replication deadlock kill happens asynchroneously, from the
    slave background thread.
 
    In InnoDB, remove also the buffering of wait reports, to simplify the
    code, as this is no longer needed when the locking issues are gone.
 
    In XtraDB, the buffering is kept for now. This is just because
    presumably XtraDB will eventually be updated to MySQL 5.7-based InnoDB
    as well, so there is little need to modify the existing code only for
    clean-up purposes.
 
    The old synchronous function thd_report_wait_for() is no longer used
    and removed in this patch.
 
    Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

Comment by Sergey Vojtovich [ 2019-06-11 ]

jplindst, I can suggest 3 options:
1. kill from another thread, similarly to replication
2. postpone kill until after mutex is released and before thread goes asleep
3. refactor innobase_kill_query() so that it awakes pending threads without a mutex

Otherwise I wouldn't dare refactoring THD::awake() in 10.1. The only idea I have is to introduce THD::ref_count, which (I feel) is going to be quite intrusive.

Comment by Marko Mäkelä [ 2019-06-19 ]

Does this bug really affect MariaDB 10.4? Why is this now a Task and not a Bug? What are the affected versions?
Is any change needed for 10.4? At one point, the title was "Port kill_one_trx fixes from 10.4 to 10.1".

Comment by Jan Lindström (Inactive) [ 2019-06-19 ]

Affected versions are 10.1, 10.2, 10.3. Yes, this was originally "Port kill_one_trx fixes from 10.4 to 10.1". However, while I was doing that after first review I found more critical problem on same area of the code. It seems that 10.4 is not effected.

Comment by Jan Lindström (Inactive) [ 2019-06-20 ]

10.4 contains dirty read to trx->lock.was_chosen_as_wsrep_victim variable suspect to race

lock_trx_handle_wait(
/*=================*/
	trx_t*	trx)	/*!< in/out: trx lock state */
{
#ifdef WITH_WSREP
	/* We already own mutexes */
	if (trx->lock.was_chosen_as_wsrep_victim) {
		return lock_trx_handle_wait_low(trx);
	}
#endif /* WITH_WSREP */
	lock_mutex_enter();
	trx_mutex_enter(trx);
	dberr_t err = lock_trx_handle_wait_low(trx);
	lock_mutex_exit();
	trx_mutex_exit(trx);
	return err;
}

Note that LOCK thd_data protects from two concurrent KILLs.

Comment by Jan Lindström (Inactive) [ 2020-01-16 ]

https://github.com/MariaDB/server/commit/82c44f8298831ad167130741b6e48e7316ef8e47

This should fix issue on 10.2/10.3. 10.4 fix will have same idea.

Comment by Marko Mäkelä [ 2020-03-02 ]

Given that there already was a commit in 10.1.39, 10.2.24, 10.3.14, 10.4.4, 10.5.0 that references this ticket, I think that to avoid further confusion, a separate ticket should be filed for the remaining fix and the original title of this ticket should be restored.

Furthermore, given that 10.4 introduced Galera 4 and that there are significant differences around trx_sys between 10.2 and 10.3, I think that for a proper review, we would need thoroughly tested fixes for all of 10.2, 10.3, 10.4.

Comment by Julien Fritsch [ 2020-12-03 ]

Found in disabled.def that it was disabled, also I'm adding the missing label.

Comment by Julien Fritsch [ 2020-12-03 ]

Found in disabled.def for ES that it was disabled, also I'm adding the missing label.

Generated at Thu Feb 08 08:44:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.