[MDEV-35567] Unexpected table metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Minor
Resolution: Unresolved
Affects Version/s: 10.5, 10.6, 10.11
Fix Version/s: 10.5, 10.6, 10.11
Component/s: Locking, Storage Engine - InnoDB
Labels:
None

Description

In the scenario below, simple DML is performed on an InnoDB table and the transaction is committed. Occasionally there is a table metadata lock held by a system thread after that.

Note that the table is created with InnoDB persistent statistics turned off.

The behavior apparently changed after this commit in 10.5.1

commit ea37b144094a0c2ebfc6774047fd473c1b2a8658

Author: Marko Mäkelä

Date:   Tue Dec 10 15:42:50 2019 +0200

    MDEV-16678 Prefer MDL to dict_sys.latch for innodb background tasks

The lock can be long enough to cause, for example, ER_LOCK_WAIT_TIMEOUT on non-waiting operations, even though from the user's point of view it is a single-threaded action. Possibly in more complicated concurrent scenarios unexpected deadlocks or at least delays (causing performance issues) can also occur.

The test cases are highly non-deterministic and need to be run with --repeat=N.

The test case with metadata lock info needs a result file to be recorded (with the empty result set upon SELECT from I_S), so that --repeat=N fails when a lock is found. It fails reasonably quickly for me on 10.5 (within ~5 tries). It takes much longer on 10.6, several hundred attempts, even longer on 10.11, although I got it a couple times there, and I've never seen it on 11.4+ so far (although given the probability trend, it doesn't mean much).

The test case with non-waiting drop table doesn't need a result file, but it fails more rarely.

METADATA_LOCK_INFO
--source include/have_innodb.inc

install soname 'metadata_lock_info';

create table t (a int) engine=InnoDB stats_persistent=0;

start transaction;
insert into t set a = 1;
select * from t;
commit;

select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;

drop table t;
uninstall soname 'metadata_lock_info';

10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4
commit;
select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info;
thread_id lock_mode lock_type table_name
+0 MDL_SHARED Table metadata lock t
drop table t;
uninstall soname 'metadata_lock_info';

Result length mismatch

Timeout upon DROP TABLE
--source include/have_innodb.inc

create table t (a int) engine=InnoDB stats_persistent=0;
set lock_wait_timeout=0, innodb_lock_wait_timeout=0;

start transaction;
insert into t set a = 1;
select * from t;
commit;

drop table t;

bug.t1                                   [ 20 fail ]

        Test ended at 2024-12-04 14:29:07

CURRENT_TEST: bug.t1

mysqltest: At line 11: query 'drop table t' failed: 1205: Lock wait timeout exceeded; try restarting transaction

Attachments

Issue Links

is caused by

MDEV-16678 Use MDL for innodb background threads instead of dict_operation_lock

Closed

Activity

Ascending order - Click to sort in descending order

Elena Stepanova created issue - 2024-12-04 12:33

Elena Stepanova made changes - 2024-12-04 12:33

Field	Original Value	New Value
Description	In the scenario below, simple DML is performed on an InnoDB table and the transaction is committed. Occasionally there is a table metadata lock held by a system thread after that. Note that the table is created with InnoDB persistent statistics turned off. The behavior apparently changed after this commit in 10.5.1 {noformat} commit ea37b144094a0c2ebfc6774047fd473c1b2a8658 Author: Marko Mäkelä Date: Tue Dec 10 15:42:50 2019 +0200 ~~MDEV-16678~~ Prefer MDL to dict_sys.latch for innodb background tasks {noformat} The lock can be long enough to cause, for example, ER_LOCK_WAIT_TIMEOUT on non-waiting operations, even though from the user's point of view it is a single-threaded action. Possibly in more complicated concurrent scenarios unexpected deadlocks or at least delays (causing performance issues) can also occur. The test cases are highly non-deterministic and need to be run with --repeat=N. The test case with metadata lock info needs a result file to be recorded (with the empty result set upon SELECT from I_S), so that --repeat=N fails when a lock is found. It fails reasonably quickly for me on 10.5 (within ~5 tries). It takes much longer on 10.6, several hundred attempts, even longer on 10.11, although I got it a couple times there, and I've never seen it on 11.4+ so far (although given the probability trend, it doesn't mean much). The test case with non-waiting drop table doesn't need a result file, but it fails more rarely. {code:sql\|title=METADATA_LOCK_INFO} --source include/have_innodb.inc install soname 'metadata_lock_info'; create table t (a int) engine=InnoDB stats_persistent=0; start transaction; insert into t set a = 1; select * from t; commit; select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info; drop table t; uninstall soname 'metadata_lock_info'; {code} {code:sql\|title=10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4} commit; select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info; thread_id lock_mode lock_type table_name +0 MDL_SHARED Table metadata lock t drop table t; uninstall soname 'metadata_lock_info'; Result length mismatch {code} {code:sql\|title=Timeout upon DROP TABLE} --source include/have_innodb.inc create table t (a int) engine=InnoDB stats_persistent=0; set lock_wait_timeout=0, innodb_lock_wait_timeout=0; start transaction; insert into t set a = 1; select * from t; commit; drop table t; {code} {code:sql} bug.t1 [ 20 fail ] Test ended at 2024-12-04 14:29:07 CURRENT_TEST: bug.t1 mysqltest: At line 11: query 'drop table t' failed: 1205: Lock wait timeout exceeded; try restarting transaction {code}	In the scenario below, simple DML is performed on an InnoDB table and the transaction is committed. Occasionally there is a table metadata lock held by a system thread after that. Note that the table is created with InnoDB persistent statistics turned off. The behavior apparently changed after this commit in 10.5.1 {noformat} commit ea37b144094a0c2ebfc6774047fd473c1b2a8658 Author: Marko Mäkelä Date: Tue Dec 10 15:42:50 2019 +0200 ~~MDEV-16678~~ Prefer MDL to dict_sys.latch for innodb background tasks {noformat} The lock can be long enough to cause, for example, ER_LOCK_WAIT_TIMEOUT on non-waiting operations, even though from the user's point of view it is a single-threaded action. Possibly in more complicated concurrent scenarios unexpected deadlocks or at least delays (causing performance issues) can also occur. The test cases are highly non-deterministic and need to be run with --repeat=N. The test case with metadata lock info needs a result file to be recorded (with the empty result set upon SELECT from I_S), so that --repeat=N fails when a lock is found. It fails reasonably quickly for me on 10.5 (within ~5 tries). It takes much longer on 10.6, several hundred attempts, even longer on 10.11, although I got it a couple times there, and I've never seen it on 11.4+ so far (although given the probability trend, it doesn't mean much). The test case with non-waiting drop table doesn't need a result file, but it fails more rarely. {code:sql\|title=METADATA_LOCK_INFO} --source include/have_innodb.inc install soname 'metadata_lock_info'; create table t (a int) engine=InnoDB stats_persistent=0; start transaction; insert into t set a = 1; select * from t; commit; select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info; drop table t; uninstall soname 'metadata_lock_info'; {code} {code:sql\|title=10.5 fdb6db6b47f1825eabffde76c29d9b94545f1ef4} commit; select thread_id, lock_mode, lock_type, table_name from information_schema.metadata_lock_info; thread_id lock_mode lock_type table_name +0 MDL_SHARED Table metadata lock t drop table t; uninstall soname 'metadata_lock_info'; Result length mismatch {code} {code:sql\|title=Timeout upon DROP TABLE} --source include/have_innodb.inc create table t (a int) engine=InnoDB stats_persistent=0; set lock_wait_timeout=0, innodb_lock_wait_timeout=0; start transaction; insert into t set a = 1; select * from t; commit; drop table t; {code} {code:sql} bug.t1 [ 20 fail ] Test ended at 2024-12-04 14:29:07 CURRENT_TEST: bug.t1 mysqltest: At line 11: query 'drop table t' failed: 1205: Lock wait timeout exceeded; try restarting transaction {code}

Elena Stepanova made changes - 2024-12-04 12:33

Summary

Unexpected able metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors

Unexpected table metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors

Elena Stepanova made changes - 2024-12-04 12:36

Link

This issue is caused by ~~MDEV-16678~~ [ ~~MDEV-16678~~ ]

Marko Mäkelä added a comment - 2024-12-04 12:56

The error could be reported due an InnoDB persistent statistics table name, or due to the name of the being-dropped table itself. If stats_persistent=0 had been explicitly specified for the table, we could streamline ha_innobase::delete_table() and avoid opening and locking the persistent statistics tables so that any entries could be deleted from them. If no stats_persistent is specified, then we should still assume that innodb_stats_persistent had been enabled at some point in the history, and try to delete the persistent statistics.

Marko Mäkelä added a comment - 2024-12-04 12:56 The error could be reported due an InnoDB persistent statistics table name, or due to the name of the being-dropped table itself. If stats_persistent=0 had been explicitly specified for the table, we could streamline ha_innobase::delete_table() and avoid opening and locking the persistent statistics tables so that any entries could be deleted from them. If no stats_persistent is specified, then we should still assume that innodb_stats_persistent had been enabled at some point in the history, and try to delete the persistent statistics.

MariaDB Server

Unexpected table metadata locks held by background operations can cause phantom ER_LOCK_WAIT_TIMEOUT and possibly other errors

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration