[MDEV-29456] Spider hangs in 'Waiting for table metadata lock' state on CREATE TABLE after LOCK TABLES Created: 2022-09-03  Updated: 2023-11-22  Resolved: 2023-11-17

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - Spider
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.1
Fix Version/s: 10.4.33, 10.5.24, 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Yuchen Pei
Resolution: Fixed Votes: 0
Labels: hang, locking, thread_hang

Issue Links:
Blocks
is blocked by MDEV-30014 heap-use-after-free in ha_spider::lo... Closed
PartOf
is part of MDEV-30014 heap-use-after-free in ha_spider::lo... Closed
Relates
relates to MDEV-29484 Assertion `!trx_free || !trx->locked_... Closed

 Description   

This is a serious locking bug where Spider fails to see that table t was already created previously (and not locked?).
Affects optimized and debug builds. Issue is not sporadic.

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY 'PWD123';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD 'PWD123');
CREATE TABLE t (c INT) ENGINE=InnoDB;
CREATE TABLE t1 (c INT) ENGINE=Spider;
CREATE TABLE t2 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
LOCK TABLES t2 WRITE;
LOCK TABLES t1 READ;
CREATE TABLE t (c INT) ENGINE=Spider;

Leads to:

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized)

10.11.0-dbg>LOCK TABLES t1 READ;
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
10.11.0-dbg>CREATE TABLE t(c INT) ENGINE=Spider;
# Hang

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized)

10.11.0-opt>SHOW FULL PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------------------------------------+-------------------------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                              | Info                                | Progress |
+----+-------------+-----------+------+---------+------+----------------------------------------------------+-------------------------------------+----------+
|  5 | root        | localhost | test | Query   |   77 | Waiting for table metadata lock                    | CREATE TABLE t(c INT) ENGINE=Spider |    0.000 |
|  6 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
|  7 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
|  8 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
|  9 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 10 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 11 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 12 | system user |           | NULL | Sleep   |   77 | Reset for next command                             | NULL                                |    0.000 |
| 13 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 14 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 17 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 18 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 20 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 19 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 24 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 15 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 22 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 16 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                |    0.000 |
| 21 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 23 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 25 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                |    0.000 |
| 26 | spider      | localhost | test | Sleep   |   77 |                                                    | NULL                                |    0.000 |
| 27 | root        | localhost | test | Query   |    0 | starting                                           | SHOW FULL PROCESSLIST               |    0.000 |
+----+-------------+-----------+------+---------+------+----------------------------------------------------+-------------------------------------+----------+
23 rows in set (0.001 sec)



 Comments   
Comment by Roel Van de Paar [ 2022-09-03 ]

Removing only table t1 references (which should make no difference) leads to this odd result (it seems that instead the server should report that table t already exists):

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Optimized)

10.11.0-opt>INSTALL PLUGIN Spider SONAME 'ha_spider.so';
Query OK, 0 rows affected, 1 warning (0.008 sec)
 
10.11.0-opt>CREATE USER Spider@localhost IDENTIFIED BY 'PWD123';
Query OK, 0 rows affected (0.027 sec)
 
10.11.0-opt>CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD 'PWD123');
Query OK, 0 rows affected (0.001 sec)
 
10.11.0-opt>CREATE TABLE t (c INT) ENGINE=InnoDB;
Query OK, 0 rows affected (0.022 sec)
 
10.11.0-opt>CREATE TABLE t2 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
Query OK, 0 rows affected (0.380 sec)
 
10.11.0-opt>LOCK TABLES t2 WRITE;
Query OK, 0 rows affected (0.013 sec)
 
10.11.0-opt>CREATE TABLE t (c INT) ENGINE=Spider;
ERROR 1100 (HY000): Table 't' was not locked with LOCK TABLES

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-05 ]

The table t is locked by LOCK TABLES t2 WRITE because it is referenced by t2 and the lock is kept even the failing LOCK TABLES t1 READ is executed. The table metadata lock is acquired before the table existence check, and thus, it is natural for CREATE TABLE t (c INT) ENGINE=Spider to result in 'Waiting for table metadata lock.'

One thing to be noted here is that all the table locks on the root user connection are released by LOCK TABLES t1 READ, but the ones on the Spider user connection are not released.

The table locks on the Spider user connection might be better to be released by the failing LOCK TABLES t1 READ too, but I think this wouldn't be a serious bug.

Comment by Roel Van de Paar [ 2022-09-06 ]

nayuta-yanagisawa Thank you for the analysis. I would like to understand better and have three questions. The last question also has three parts.

1. If I create a similar setup in InnoDB;

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

10.11.0-dbg>CREATE TABLE t (c INT) ENGINE=InnoDB;
Query OK, 0 rows affected (0.019 sec)
 
10.11.0-dbg>LOCK TABLES t WRITE;
Query OK, 0 rows affected (0.002 sec)
 
10.11.0-dbg>CREATE TABLE t (c2 INT) ENGINE=InnoDB;
ERROR 1050 (42S01): Table 't' already exists

And, perhaps, less relevant, but interesting;

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

10.11.0-dbg>CREATE TABLE t (c INT) ENGINE=InnoDB;
Query OK, 0 rows affected (0.023 sec)
 
10.11.0-dbg>LOCK TABLES t WRITE ;
Query OK, 0 rows affected (0.002 sec)
 
10.11.0-dbg>LOCK TABLES t READ;
Query OK, 0 rows affected (0.000 sec)
 
10.11.0-dbg>CREATE TABLE t (c2 INT) ENGINE=InnoDB;
ERROR 1099 (HY000): Table 't' was locked with a READ lock and can't be updated

An ERROR 1050 is returned when attempting to create a table which already exists - should Spider behave in the same way?

2. In the first comment below the description, removing the t1 references should not have made any difference?

3. In the same comment, it is shown that the LOCK TABLES t2 WRITE does not lock 'c' correctly based on the ERROR 1100? Also, why did the server not report that the table existed? Finally, why does it show the 1100 ERROR?

Thank you!

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-07 ]

The point is that there are two connections. One is the connection by the MariaDB client. I call this "the root user connection". The other connection is by Spider. I call this "the Spider user connection" because it uses the user named Spider in the issue description.

  1. I don't surprise that the first InnoDB example behaves differently from the Spider example in the issue description. In the Spider example, the InnoDB table is locked by the Spider user connection but CREATE TABLE t (c INT) ENGINE=Spider is executed by the root user connection. On the other hand, in the above InnoDB example, both LOCK TABLES t WRITE and CREATE TABLE t (c2 INT) ENGINE=InnoDB are executed by the same root user connection.
  2. What do you mean by "t1 references" in this context? Just mean LOCK TABLES t1 READ? If so, I think it should make difference because the lock on t2 should be released by the lock table statement on t1.
  3. CREATE TABLE t (c INT) ENGINE=Spider is executed by the root user connection, and the table t is locked by the Spider user connection. So, it doesn't look odd to me that the create table statement reports ERROR 1100 because the root user connection is locking t2.
Comment by Roel Van de Paar [ 2022-09-07 ]

nayuta-yanagisawa Thank you for the input.

So if I understand you correctly, for items 1+3, you are saying that LOCK TABLES t2 WRITE; - executed by the root at the CLI - will trigger the Spider engine to automatically lock table t - in a background (and non-user) thread, which uses the Spider UserID, and is done by the server/the Spider SE - because t is being referenced/linked in the t2 table definition?

For item 2, "t1 references" means all SQL which refers to t1, i.e. the test as given in this comment. I was unaware that the write lock on t2 was supposed to be released by the read lock on t1.

Comment by Roel Van de Paar [ 2022-09-07 ]

Please see MDEV-29484 Assertion `!trx_free || !trx->locked_connections' failed in spider_free_trx_conn on LOCK TABLES

Comment by Roel Van de Paar [ 2022-09-07 ]

Please note that SET GLOBAL spider_same_server_link=ON; is not used here.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-07 ]

So if I understand you correctly, for items 1+3, you are saying that LOCK TABLES t2 WRITE; - executed by the root at the CLI - will trigger the Spider engine to automatically lock table t - in a background (and non-user) thread, which uses the Spider UserID, and is done by the server/the Spider SE - because t is being referenced/linked in the t2 table definition?

Mostly yes, but there is no background thread. The thread handling the user CLI connection also handles the Spider (background) connection.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-15 ]

I believe that the following test case found in the investigation for MDEV-29484 reproduces the present bug.

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
--connection child2_1
CREATE DATABASE auto_test_remote;
USE auto_test_remote;
eval CREATE TABLE tbl_a (
    a INT
) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
--connection master_1
CREATE DATABASE auto_test_local;
USE auto_test_local;
eval CREATE TABLE tbl_a (
    a INT
) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='table "tbl_a", srv "s_2_1"';
eval CREATE TABLE tbl_b (
    a INT
) $MASTER_1_ENGINE $MASTER_1_CHARSET;
 
LOCK TABLES tbl_a WRITE;
--error 1429
LOCK TABLES tbl_b READ, tbl_a READ;
 
--connection master_1
DROP DATABASE IF EXISTS auto_test_local;
 
--connection child2_1
DROP DATABASE IF EXISTS auto_test_remote;
 
--disable_query_log
--disable_result_log
--source ../t/test_deinit.inc
--enable_query_log
--enable_result_log

Comment by Yuchen Pei [ 2023-05-16 ]

I can reprod the hang at 11.1 (4e5b771e980edfdad5c5414aa62c81d409d585a4) when running the test case in the previous comment.

Comment by Yuchen Pei [ 2023-11-17 ]

fixed by MDEV-30014

Generated at Thu Feb 08 10:08:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.