Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-35567

Unexpected table metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors

Details

    Description

      In the scenario below, simple DML is performed on an InnoDB table and the transaction is committed. Occasionally there is a table metadata lock held by a system thread after that.

      Note that the table is created with InnoDB persistent statistics turned off.

      The behavior apparently changed after this commit in 10.5.1

      commit ea37b144094a0c2ebfc6774047fd473c1b2a8658
      Author: Marko Mäkelä
      Date:   Tue Dec 10 15:42:50 2019 +0200
       
          MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks
      

      The lock can be long enough to cause, for example, ER_LOCK_WAIT_TIMEOUT on non-waiting operations, even though from the user's point of view it is a single-threaded action. Possibly in more complicated concurrent scenarios unexpected deadlocks or at least delays (causing performance issues) can also occur.

      The test cases are highly non-deterministic and need to be run with --repeat=N.

      The test case with metadata lock info needs a result file to be recorded (with the empty result set upon SELECT from I_S), so that --repeat=N fails when a lock is found. It fails reasonably quickly for me on 10.5 (within ~5 tries). It takes much longer on 10.6, several hundred attempts, even longer on 10.11, although I got it a couple times there, and I've never seen it on 11.4+ so far (although given the probability trend, it doesn't mean much).

      The test case with non-waiting drop table doesn't need a result file, but it fails more rarely.

      METADATA_LOCK_INFO

      --source include/have_innodb.inc
       
      install soname 'metadata_lock_info';
       
      create table t (a int) engine=InnoDB stats_persistent=0;
       
      start transaction;
      insert into t set a = 1;
      select * from t;
      commit;
       
      select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;
       
      drop table t;
      uninstall soname 'metadata_lock_info';
      

      10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4

       commit;
       select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;
       thread_id	lock_mode	lock_type	table_name
      +0	MDL_SHARED	Table metadata lock	t
       drop table t;
       uninstall soname 'metadata_lock_info';
       
      Result length mismatch
      

      Timeout upon DROP TABLE

      --source include/have_innodb.inc
       
      create table t (a int) engine=InnoDB stats_persistent=0;
      set lock_wait_timeout=0, innodb_lock_wait_timeout=0;
       
      start transaction;
      insert into t set a = 1;
      select * from t;
      commit;
       
      drop table t;
      

      bug.t1                                   [ 20 fail ]
              Test ended at 2024-12-04 14:29:07
       
      CURRENT_TEST: bug.t1
      mysqltest: At line 11: query 'drop table t' failed: 1205: Lock wait timeout exceeded; try restarting transaction
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Description In the scenario below, simple DML is performed on an InnoDB table and the transaction is committed. Occasionally there is a table metadata lock held by a system thread after that.

            Note that the table is created with InnoDB persistent statistics turned off.

            The behavior apparently changed after this commit in 10.5.1
            {noformat}
            commit ea37b144094a0c2ebfc6774047fd473c1b2a8658
            Author: Marko Mäkelä
            Date: Tue Dec 10 15:42:50 2019 +0200

                MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks
            {noformat}

            The lock can be long enough to cause, for example, ER_LOCK_WAIT_TIMEOUT on non-waiting operations, even though from the user's point of view it is a single-threaded action. Possibly in more complicated concurrent scenarios unexpected deadlocks or at least delays (causing performance issues) can also occur.

            The test cases are highly non-deterministic and need to be run with --repeat=N.

            The test case with metadata lock info needs a result file to be recorded (with the empty result set upon SELECT from I_S), so that --repeat=N fails when a lock is found. It fails reasonably quickly for me on 10.5 (within ~5 tries). It takes *much* longer on 10.6, several hundred attempts, even longer on 10.11, although I got it a couple times there, and I've never seen it on 11.4+ so far (although given the probability trend, it doesn't mean much).

            The test case with non-waiting drop table doesn't need a result file, but it fails more rarely.

            {code:sql|title=METADATA_LOCK_INFO}
            --source include/have_innodb.inc

            install soname 'metadata_lock_info';

            create table t (a int) engine=InnoDB stats_persistent=0;

            start transaction;
            insert into t set a = 1;
            select * from t;
            commit;

            select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;

            drop table t;
            uninstall soname 'metadata_lock_info';
            {code}

            {code:sql|title=10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4}
             commit;
             select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;
             thread_id lock_mode lock_type table_name
            +0 MDL_SHARED Table metadata lock t
             drop table t;
             uninstall soname 'metadata_lock_info';

            Result length mismatch
            {code}

            {code:sql|title=Timeout upon DROP TABLE}
            --source include/have_innodb.inc

            create table t (a int) engine=InnoDB stats_persistent=0;
            set lock_wait_timeout=0, innodb_lock_wait_timeout=0;

            start transaction;
            insert into t set a = 1;
            select * from t;
            commit;

            drop table t;
            {code}
            {code:sql}
            bug.t1 [ 20 fail ]
                    Test ended at 2024-12-04 14:29:07

            CURRENT_TEST: bug.t1
            mysqltest: At line 11: query 'drop table t' failed: 1205: Lock wait timeout exceeded; try restarting transaction
            {code}
            In the scenario below, simple DML is performed on an InnoDB table and the transaction is committed. Occasionally there is a table metadata lock held by a system thread after that.

            Note that the table is created with *InnoDB persistent statistics turned off*.

            The behavior apparently changed after this commit in 10.5.1
            {noformat}
            commit ea37b144094a0c2ebfc6774047fd473c1b2a8658
            Author: Marko Mäkelä
            Date: Tue Dec 10 15:42:50 2019 +0200

                MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks
            {noformat}

            The lock can be long enough to cause, for example, ER_LOCK_WAIT_TIMEOUT on non-waiting operations, even though from the user's point of view it is a single-threaded action. Possibly in more complicated concurrent scenarios unexpected deadlocks or at least delays (causing performance issues) can also occur.

            The test cases are highly non-deterministic and need to be run with --repeat=N.

            The test case with metadata lock info needs a result file to be recorded (with the empty result set upon SELECT from I_S), so that --repeat=N fails when a lock is found. It fails reasonably quickly for me on 10.5 (within ~5 tries). It takes *much* longer on 10.6, several hundred attempts, even longer on 10.11, although I got it a couple times there, and I've never seen it on 11.4+ so far (although given the probability trend, it doesn't mean much).

            The test case with non-waiting drop table doesn't need a result file, but it fails more rarely.

            {code:sql|title=METADATA_LOCK_INFO}
            --source include/have_innodb.inc

            install soname 'metadata_lock_info';

            create table t (a int) engine=InnoDB stats_persistent=0;

            start transaction;
            insert into t set a = 1;
            select * from t;
            commit;

            select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;

            drop table t;
            uninstall soname 'metadata_lock_info';
            {code}

            {code:sql|title=10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4}
             commit;
             select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;
             thread_id lock_mode lock_type table_name
            +0 MDL_SHARED Table metadata lock t
             drop table t;
             uninstall soname 'metadata_lock_info';

            Result length mismatch
            {code}

            {code:sql|title=Timeout upon DROP TABLE}
            --source include/have_innodb.inc

            create table t (a int) engine=InnoDB stats_persistent=0;
            set lock_wait_timeout=0, innodb_lock_wait_timeout=0;

            start transaction;
            insert into t set a = 1;
            select * from t;
            commit;

            drop table t;
            {code}
            {code:sql}
            bug.t1 [ 20 fail ]
                    Test ended at 2024-12-04 14:29:07

            CURRENT_TEST: bug.t1
            mysqltest: At line 11: query 'drop table t' failed: 1205: Lock wait timeout exceeded; try restarting transaction
            {code}
            elenst Elena Stepanova made changes -
            Summary Unexpected able metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors Unexpected table metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors
            elenst Elena Stepanova made changes -

            The error could be reported due an InnoDB persistent statistics table name, or due to the name of the being-dropped table itself. If stats_persistent=0 had been explicitly specified for the table, we could streamline ha_innobase::delete_table() and avoid opening and locking the persistent statistics tables so that any entries could be deleted from them. If no stats_persistent is specified, then we should still assume that innodb_stats_persistent had been enabled at some point in the history, and try to delete the persistent statistics.

            marko Marko Mäkelä added a comment - The error could be reported due an InnoDB persistent statistics table name, or due to the name of the being-dropped table itself. If stats_persistent=0 had been explicitly specified for the table, we could streamline ha_innobase::delete_table() and avoid opening and locking the persistent statistics tables so that any entries could be deleted from them. If no stats_persistent is specified, then we should still assume that innodb_stats_persistent had been enabled at some point in the history, and try to delete the persistent statistics.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.