[MDEV-24985] Shutdown fails to abort current InnoDB lock waits Created: 2021-02-25  Updated: 2021-06-05  Resolved: 2021-06-05

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.6
Fix Version/s: 10.6.2

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: regression-10.6

Issue Links:
Problem/Incident
is caused by MDEV-24671 Assertion failure in lock_wait_table_... Closed

 Description   

After MDEV-24671, a test is consuming considerably more time:

10.6 7f1ab8f742ac42e82dd39a6e11390254cb72319c

innodb.xa_recovery 'innodb'              [ pass ]   4439

10.6 e71e6133535da8d5eab86e504f0b116a03680780

innodb.xa_recovery 'innodb'              [ pass ]  53591

The time difference is explained by the following: MDEV-24671 typically saves 1 second of each lock wait timeout, and the default innodb_lock_wait_timeout is 50 seconds. If you invoke the test like

/mtr --mysqld=--innodb-lock-wait-timeout=2 innodb.xa_recovery

then it would complete 48 seconds sooner (5782ms for my attempt).

I determined that the time is being spent in the main thread, in close_connections():

  for (int i= 0; (thread_count - binlog_dump_thread_count) && i < 1000; i++)
    my_sleep(20000);

During this time, the lock_wait() for the following is being waited for:

connect (con1,localhost,root);
--send SELECT * FROM t1 LOCK IN SHARE MODE

Only after the lock wait times out, the close_connections() will proceed to actually invoke innobase_end(). So, until that time, InnoDB is not given any signal that a shutdown has been requested.

If InnoDB knew about the shutdown, it could abort the lock wait earlier.

I initially suspected that this bug is not really caused by MDEV-24671, that is, if a transaction was already waiting for a lock, it would remain in that state. But, it looks like even with a patch like this:

diff --git a/mysql-test/suite/innodb/t/xa_recovery.test b/mysql-test/suite/innodb/t/xa_recovery.test
index bb8e3316860..2317bd3f122 100644
--- a/mysql-test/suite/innodb/t/xa_recovery.test
+++ b/mysql-test/suite/innodb/t/xa_recovery.test
@@ -40,6 +40,7 @@ let $wait_condition=
 --source include/wait_condition.inc
 
 DROP TABLE t2;
+sleep 2;
 
 --source include/restart_mysqld.inc
 

the test before MDEV-24671 would finish only 2 seconds slower, and not end up waiting for innodb_lock_wait_timeout.

Before MDEV-24671, the wait is terminated by lock_wait_timeout_task(), which will wake up once per second and wake up any waiting threads:

10.6 7f1ab8f742ac42e82dd39a6e11390254cb72319c

Thread 6 hit Breakpoint 2, lock_wait_timeout_task ()
    at /mariadb/10.6/storage/innobase/lock/lock0wait.cc:475
475	  mysql_mutex_lock(&lock_sys.wait_mutex);
(rr) thread apply 1 backtrace
#9  0x000055a4d8ae1170 in my_sleep (m_seconds=<optimized out>) at /mariadb/10.6/mysys/my_sleep.c:29
#10 0x000055a4d8048fc8 in close_connections () at /mariadb/10.6/sql/mysqld.cc:1769

wlad, can you help with this? You solved similar problems in MDEV-16264. Could we have an innobase_hton->kill_level_changed(THD*) that is always invoked when thd_kill_level() would change?
I am pretty sure that we do want to give non-waiting transactions a chance to complete execution; invoking innobase_hton->close_connection or innobase_hton->kill_query in their current form ought to be out of the question.



 Comments   
Comment by Marko Mäkelä [ 2021-02-25 ]

The following patch to the test suggests that shutdown is missing something that would properly emulate the execution of KILL CONNECTION on every active connection:

diff --git a/mysql-test/suite/innodb/t/xa_recovery.test b/mysql-test/suite/innodb/t/xa_recovery.test
index bb8e3316860..8619958ebc7 100644
--- a/mysql-test/suite/innodb/t/xa_recovery.test
+++ b/mysql-test/suite/innodb/t/xa_recovery.test
@@ -30,6 +30,7 @@ connection default;
 disconnect con1;
 disconnect con2;
 connect (con1,localhost,root);
+let $connection_id= `select connection_id()`;
 --send SELECT * FROM t1 LOCK IN SHARE MODE
 
 connection default;
@@ -40,6 +41,9 @@ let $wait_condition=
 --source include/wait_condition.inc
 
 DROP TABLE t2;
+--disable_query_log
+eval KILL CONNECTION $connection_id;
+--enable_query_log
 
 --source include/restart_mysqld.inc
 

With this patch, the test will complete in 3536ms for me. Without the patch, an additional innodb_lock_wait_timeout will be consumed.

Generated at Thu Feb 08 09:34:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.