MariaDB Server
MDEV-24985

Shutdown fails to abort current InnoDB lock waits


Details

    Description

      After MDEV-24671, a test is consuming considerably more time:

      10.6 7f1ab8f742ac42e82dd39a6e11390254cb72319c

      innodb.xa_recovery 'innodb'              [ pass ]   4439
      

      10.6 e71e6133535da8d5eab86e504f0b116a03680780

      innodb.xa_recovery 'innodb'              [ pass ]  53591
      

      The time difference is explained as follows: MDEV-24671 typically saves up to 1 second of each lock wait timeout, and the default innodb_lock_wait_timeout is 50 seconds. If you invoke the test like

      ./mtr --mysqld=--innodb-lock-wait-timeout=2 innodb.xa_recovery
      

      then it would complete 48 seconds sooner (5782ms for my attempt).

      I determined that the time is being spent in the main thread, in close_connections():

        for (int i= 0; (thread_count - binlog_dump_thread_count) && i < 1000; i++)
          my_sleep(20000);
      

      During this time, the main thread is waiting for the lock_wait() initiated by the following to finish:

      connect (con1,localhost,root);
      --send SELECT * FROM t1 LOCK IN SHARE MODE
      

      Only after the lock wait times out will close_connections() proceed to actually invoke innobase_end(). Until then, InnoDB is given no signal that a shutdown has been requested.

      If InnoDB knew about the shutdown, it could abort the lock wait earlier.
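
      The idea can be illustrated with a minimal standalone model (this is an illustration only, not the actual lock_sys code; all names here are invented): a waiter blocks with a timeout, and the shutdown path raises a flag and signals the condition variable so the wait ends immediately instead of running out the full innodb_lock_wait_timeout.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <chrono>
      #include <condition_variable>
      #include <mutex>
      #include <thread>

      static std::mutex wait_mutex;               // stands in for lock_sys.wait_mutex
      static std::condition_variable wait_cond;   // signalled on shutdown
      static std::atomic<bool> shutdown_requested{false};

      // Simulate a lock wait.  Returns true if the full timeout elapsed
      // (the wait timed out), false if shutdown cut the wait short.
      bool simulated_lock_wait(std::chrono::milliseconds timeout)
      {
        std::unique_lock<std::mutex> lk(wait_mutex);
        // Wakes as soon as shutdown_requested becomes true, or on timeout.
        return !wait_cond.wait_for(lk, timeout,
                                   [] { return shutdown_requested.load(); });
      }

      // Called from the shutdown path: raise the flag and wake all waiters.
      void abort_lock_waits_for_shutdown()
      {
        {
          std::lock_guard<std::mutex> lk(wait_mutex);
          shutdown_requested.store(true);
        }
        wait_cond.notify_all();
      }
      ```

      With this shape, a 50-second wait ends within milliseconds of the shutdown request rather than after innodb_lock_wait_timeout.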

      I initially suspected that this regression was not really caused by MDEV-24671; that is, that a transaction which was already waiting for a lock would have remained stuck in that state even before the change. But it turns out that even with a patch like this:

      diff --git a/mysql-test/suite/innodb/t/xa_recovery.test b/mysql-test/suite/innodb/t/xa_recovery.test
      index bb8e3316860..2317bd3f122 100644
      --- a/mysql-test/suite/innodb/t/xa_recovery.test
      +++ b/mysql-test/suite/innodb/t/xa_recovery.test
      @@ -40,6 +40,7 @@ let $wait_condition=
       --source include/wait_condition.inc
       
       DROP TABLE t2;
      +sleep 2;
       
       --source include/restart_mysqld.inc
       
      

      the test before MDEV-24671 would finish only about 2 seconds slower, and would not end up waiting for the full innodb_lock_wait_timeout.

      Before MDEV-24671, the wait was terminated by lock_wait_timeout_task(), which runs once per second and wakes up any waiting threads:

      10.6 7f1ab8f742ac42e82dd39a6e11390254cb72319c

      Thread 6 hit Breakpoint 2, lock_wait_timeout_task ()
          at /mariadb/10.6/storage/innobase/lock/lock0wait.cc:475
      475	  mysql_mutex_lock(&lock_sys.wait_mutex);
      (rr) thread apply 1 backtrace
      #9  0x000055a4d8ae1170 in my_sleep (m_seconds=<optimized out>) at /mariadb/10.6/mysys/my_sleep.c:29
      #10 0x000055a4d8048fc8 in close_connections () at /mariadb/10.6/sql/mysqld.cc:1769
      
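
      That pre-MDEV-24671 behaviour can be modelled by a tiny standalone sketch (illustrative only; the names and structure are invented, not the real lock0wait.cc code): a per-second task scans the registered waiters and releases any whose deadline has passed, so no wait overshoots its timeout by more than about a second.

      ```cpp
      #include <cassert>
      #include <chrono>
      #include <mutex>
      #include <vector>

      using Clock = std::chrono::steady_clock;

      struct Waiter
      {
        Clock::time_point deadline;  // when this wait's timeout expires
        bool woken = false;          // set when the reaper releases the waiter
      };

      static std::mutex wait_mutex;         // stands in for lock_sys.wait_mutex
      static std::vector<Waiter*> waiters;  // threads currently in a lock wait

      // One iteration of the once-per-second task: wake every waiter whose
      // timeout has expired.  A real implementation would signal a condition
      // variable; here we merely flip a flag.
      void lock_wait_timeout_task_once(Clock::time_point now)
      {
        std::lock_guard<std::mutex> lk(wait_mutex);
        for (Waiter *w : waiters)
          if (!w->woken && now >= w->deadline)
            w->woken = true;
      }
      ```

      The side effect of this polling, as noted above, was that shutdown never had to wait more than roughly one second past a waiter's deadline; removing the polling removed that safety net.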

      wlad, can you help with this? You solved similar problems in MDEV-16264. Could we have an innobase_hton->kill_level_changed(THD*) that is always invoked when thd_kill_level() would change?
      I am pretty sure that we do want to give non-waiting transactions a chance to complete execution; invoking innobase_hton->close_connection or innobase_hton->kill_query in their current form ought to be out of the question.
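
      A hedged sketch of what such a hook could look like (the hook name comes from the suggestion above; THD, the enum values, and the implementation body are stand-ins for illustration, not the real server definitions):

      ```cpp
      #include <cassert>

      struct THD {};  // opaque connection descriptor in this sketch

      enum thd_kill_levels { THD_IS_NOT_KILLED, THD_ABORT_SOFTLY, THD_ABORT_ASAP };

      // Hypothetical handlerton extension: the server would invoke this
      // whenever thd_kill_level(thd) would change, e.g. when shutdown marks
      // a connection as killed.
      struct handlerton_sketch
      {
        void (*kill_level_changed)(THD *thd, thd_kill_levels level);
      };

      // What an InnoDB implementation might do: note the new level and, if
      // the session is being killed, wake it out of any internal lock wait.
      static bool lock_wait_was_aborted = false;

      static void innobase_kill_level_changed(THD*, thd_kill_levels level)
      {
        if (level != THD_IS_NOT_KILLED)
          lock_wait_was_aborted = true;  // real code would signal lock_sys here
      }

      static handlerton_sketch innobase_hton_sketch = { innobase_kill_level_changed };
      ```

      Unlike close_connection or kill_query, a hook of this shape would only nudge the engine's wait machinery, leaving non-waiting transactions free to complete.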


      People

        wlad Vladislav Vaintroub
        marko Marko Mäkelä

        Votes: 0
        Watchers: 1

