A race condition may occur between the execution of a transaction commit and the execution of a KILL statement that attempts to abort that transaction.
MDEV-17092 worked around this race condition by modifying InnoDB code. After that issue was closed, Sergey Vojtovich pointed out that this race condition would be better fixed above the storage engine layer:
If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion:
He is quoting this code of THD::~THD() in 10.5:
And he seems to suggest that the empty critical section should be moved to THD::free_connection(). Note: in 10.2 and 10.3, that code is slightly different:
Nevertheless, it seems that we might want to do something like
It might turn out that the else branch is not needed. The empty lock/unlock pair would of course be added to THD::free_connection().
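To illustrate why an empty lock/unlock pair is enough to wait for a concurrent KILL, here is a minimal, self-contained sketch. It is not MariaDB code: `Session`, `Registry`, `kill_session()`, `kill_mutex` and `list_mutex` are hypothetical stand-ins, and only `free_connection()` and `awake()` borrow their names from the THD members discussed above. The sketch assumes that the killer looks the session up under a list mutex and acquires the kill mutex before releasing the list mutex, and that the connection is unregistered before its resources are released.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for THD; not the MariaDB class.
struct Session {
  std::mutex kill_mutex;                         // held by the killer around awake()
  bool engine_state_valid = true;                // cleared once resources are released
  std::atomic<bool> killed{false};
  std::atomic<bool> touched_freed_state{false};  // set if awake() runs too late

  // The caller must hold kill_mutex for the whole call.
  void awake(const std::unique_lock<std::mutex>& held) {
    assert(held.owns_lock());
    if (!engine_state_valid) touched_freed_state = true;
    killed = true;
  }
};

// Hypothetical stand-in for the server-wide list of connections.
struct Registry {
  std::mutex list_mutex;
  Session* current = nullptr;
};

// KILL path: find the session, pin it with kill_mutex, then drop list_mutex.
bool kill_session(Registry& reg) {
  std::unique_lock<std::mutex> list_guard(reg.list_mutex);
  Session* s = reg.current;
  if (s == nullptr) return false;                // already unregistered
  std::unique_lock<std::mutex> kill_guard(s->kill_mutex);
  list_guard.unlock();
  s->awake(kill_guard);
  return true;
}

// Teardown path: unregister, wait for any KILL in flight, then release.
void free_connection(Registry& reg, Session& s) {
  {
    std::lock_guard<std::mutex> guard(reg.list_mutex);
    reg.current = nullptr;                       // no new killer can find s
  }
  // Empty critical section: blocks until a killer that already
  // holds kill_mutex has returned from awake().
  s.kill_mutex.lock();
  s.kill_mutex.unlock();
  s.engine_state_valid = false;                  // now safe to release resources
}

// Races one killer against one teardown per round; returns true if
// awake() never observed released state.
bool run_demo(int rounds = 200) {
  for (int i = 0; i < rounds; ++i) {
    Registry reg;
    Session s;
    reg.current = &s;
    std::thread killer([&reg] { kill_session(reg); });
    free_connection(reg, s);
    killer.join();
    if (s.touched_freed_state) return false;
  }
  return true;
}
```

Calling `run_demo()` from a `main()` (built with thread support, e.g. `-pthread`) exercises the race. In this sketch, dropping the empty lock/unlock pair from `free_connection()` would allow `awake()` to run after `engine_state_valid` has been cleared, which is the kind of interleaving described above.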
It might also turn out that all the Galera-specific changes need to be done in THD::free_connection(). (In that case, we would likely want to assign wsrep_rgi= NULL).
As part of this fix, the trx_t::free() instrumentation that was modified in MDEV-22782 should be tightened: trx_t::mysql_thd and trx_t::state must be poisoned, because innobase_kill_connection() should no longer be invoked on a freed transaction of a freed connection. This should of course be validated with an RQG run similar to the one that reproduced MDEV-17092.