MariaDB Server
MDEV-24985

Shutdown fails to abort current InnoDB lock waits


Details

    Description

      After MDEV-24671, a test is consuming considerably more time:

      10.6 7f1ab8f742ac42e82dd39a6e11390254cb72319c

      innodb.xa_recovery 'innodb'              [ pass ]   4439
      

      10.6 e71e6133535da8d5eab86e504f0b116a03680780

      innodb.xa_recovery 'innodb'              [ pass ]  53591
      

      The time difference is explained as follows: MDEV-24671 typically saves up to 1 second of each lock wait timeout, and the default innodb_lock_wait_timeout is 50 seconds. If you invoke the test like

      ./mtr --mysqld=--innodb-lock-wait-timeout=2 innodb.xa_recovery
      

      then it would complete 48 seconds sooner (5782ms for my attempt).

      I determined that the time is being spent in the main thread, in close_connections():

        for (int i= 0; (thread_count - binlog_dump_thread_count) && i < 1000; i++)
          my_sleep(20000);
      

      During this time, the main thread is waiting for the lock_wait() initiated by the following to finish:

      connect (con1,localhost,root);
      --send SELECT * FROM t1 LOCK IN SHARE MODE
      

      Only after the lock wait times out will close_connections() proceed to actually invoke innobase_end(). Until then, InnoDB is given no signal that a shutdown has been requested.

      If InnoDB knew about the shutdown, it could abort the lock wait earlier.
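
      The idea can be illustrated with a minimal standalone model (this is an illustration only, not the actual lock_sys code; all names here are invented): a waiter blocks with a timeout, and the shutdown path raises a flag and signals the condition variable so the wait ends immediately instead of running out the full innodb_lock_wait_timeout.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <chrono>
      #include <condition_variable>
      #include <mutex>
      #include <thread>

      static std::mutex wait_mutex;               // stands in for lock_sys.wait_mutex
      static std::condition_variable wait_cond;   // signalled on shutdown
      static std::atomic<bool> shutdown_requested{false};

      // Simulate a lock wait.  Returns true if the full timeout elapsed
      // (the wait timed out), false if shutdown cut the wait short.
      bool simulated_lock_wait(std::chrono::milliseconds timeout)
      {
        std::unique_lock<std::mutex> lk(wait_mutex);
        // Wakes as soon as shutdown_requested becomes true, or on timeout.
        return !wait_cond.wait_for(lk, timeout,
                                   [] { return shutdown_requested.load(); });
      }

      // Called from the shutdown path: raise the flag and wake all waiters.
      void abort_lock_waits_for_shutdown()
      {
        {
          std::lock_guard<std::mutex> lk(wait_mutex);
          shutdown_requested.store(true);
        }
        wait_cond.notify_all();
      }
      ```

      With this shape, a 50-second wait ends within milliseconds of the shutdown request rather than after innodb_lock_wait_timeout.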

      I initially suspected that this regression was not really caused by MDEV-24671; that is, that a transaction which was already waiting for a lock would have remained stuck in that state even before the change. But it turns out that even with a patch like this:

      diff --git a/mysql-test/suite/innodb/t/xa_recovery.test b/mysql-test/suite/innodb/t/xa_recovery.test
      index bb8e3316860..2317bd3f122 100644
      --- a/mysql-test/suite/innodb/t/xa_recovery.test
      +++ b/mysql-test/suite/innodb/t/xa_recovery.test
      @@ -40,6 +40,7 @@ let $wait_condition=
       --source include/wait_condition.inc
       
       DROP TABLE t2;
      +sleep 2;
       
       --source include/restart_mysqld.inc
       
      

      the test before MDEV-24671 would finish only about 2 seconds slower, and would not end up waiting for the full innodb_lock_wait_timeout.

      Before MDEV-24671, the wait was terminated by lock_wait_timeout_task(), which runs once per second and wakes up any waiting threads:

      10.6 7f1ab8f742ac42e82dd39a6e11390254cb72319c

      Thread 6 hit Breakpoint 2, lock_wait_timeout_task ()
          at /mariadb/10.6/storage/innobase/lock/lock0wait.cc:475
      475	  mysql_mutex_lock(&lock_sys.wait_mutex);
      (rr) thread apply 1 backtrace
      #9  0x000055a4d8ae1170 in my_sleep (m_seconds=<optimized out>) at /mariadb/10.6/mysys/my_sleep.c:29
      #10 0x000055a4d8048fc8 in close_connections () at /mariadb/10.6/sql/mysqld.cc:1769
      
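
      That pre-MDEV-24671 behaviour can be modelled by a tiny standalone sketch (illustrative only; the names and structure are invented, not the real lock0wait.cc code): a per-second task scans the registered waiters and releases any whose deadline has passed, so no wait overshoots its timeout by more than about a second.

      ```cpp
      #include <cassert>
      #include <chrono>
      #include <mutex>
      #include <vector>

      using Clock = std::chrono::steady_clock;

      struct Waiter
      {
        Clock::time_point deadline;  // when this wait's timeout expires
        bool woken = false;          // set when the reaper releases the waiter
      };

      static std::mutex wait_mutex;         // stands in for lock_sys.wait_mutex
      static std::vector<Waiter*> waiters;  // threads currently in a lock wait

      // One iteration of the once-per-second task: wake every waiter whose
      // timeout has expired.  A real implementation would signal a condition
      // variable; here we merely flip a flag.
      void lock_wait_timeout_task_once(Clock::time_point now)
      {
        std::lock_guard<std::mutex> lk(wait_mutex);
        for (Waiter *w : waiters)
          if (!w->woken && now >= w->deadline)
            w->woken = true;
      }
      ```

      The side effect of this polling, as noted above, was that shutdown never had to wait more than roughly one second past a waiter's deadline; removing the polling removed that safety net.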

      wlad, can you help with this? You solved similar problems in MDEV-16264. Could we have an innobase_hton->kill_level_changed(THD*) that is always invoked when thd_kill_level() would change?
      I am pretty sure that we do want to give non-waiting transactions a chance to complete execution; invoking innobase_hton->close_connection or innobase_hton->kill_query in their current form ought to be out of the question.
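
      A hedged sketch of what such a hook could look like (the hook name comes from the suggestion above; THD, the enum values, and the implementation body are stand-ins for illustration, not the real server definitions):

      ```cpp
      #include <cassert>

      struct THD {};  // opaque connection descriptor in this sketch

      enum thd_kill_levels { THD_IS_NOT_KILLED, THD_ABORT_SOFTLY, THD_ABORT_ASAP };

      // Hypothetical handlerton extension: the server would invoke this
      // whenever thd_kill_level(thd) would change, e.g. when shutdown marks
      // a connection as killed.
      struct handlerton_sketch
      {
        void (*kill_level_changed)(THD *thd, thd_kill_levels level);
      };

      // What an InnoDB implementation might do: note the new level and, if
      // the session is being killed, wake it out of any internal lock wait.
      static bool lock_wait_was_aborted = false;

      static void innobase_kill_level_changed(THD*, thd_kill_levels level)
      {
        if (level != THD_IS_NOT_KILLED)
          lock_wait_was_aborted = true;  // real code would signal lock_sys here
      }

      static handlerton_sketch innobase_hton_sketch = { innobase_kill_level_changed };
      ```

      Unlike close_connection or kill_query, a hook of this shape would only nudge the engine's wait machinery, leaving non-waiting transactions free to complete.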


      People

        wlad Vladislav Vaintroub
        marko Marko Mäkelä

        Votes: 0
        Watchers: 1

