[MDEV-32451] Hang when locking both spider table and remote table in the same statement Created: 2023-10-12  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.4, 10.5, 10.6, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Critical
Reporter: Yuchen Pei Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: hang, locking

Issue Links:
Issue split
split from MDEV-29854 SIGSEGV in spider_string::length on ... Closed
Relates
relates to MDEV-29854 SIGSEGV in spider_string::length on ... Closed

 Description   

A simplified version of a test mentioned in a comment in MDEV-29854[1], tested at 11.0 0199a5d0f634051f23d4c7f7ce7651c7f93c818f. Probably also existing in other versions.

--echo #
--echo # MDEV-XXXXX Hang when locking both spider table and remote table in the same statement
--echo #
--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
evalp CREATE SERVER srv FOREIGN DATA WRAPPER mysql
OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
 
create table t (c int);
create table t2 (c int) ENGINE=Spider
COMMENT='WRAPPER "mysql", srv "srv",TABLE "t"';
LOCK TABLES t WRITE,t2 WRITE;
drop table t, t2;
 
drop server srv;
--disable_query_log
--disable_result_log
--source ../../t/test_deinit.inc
--enable_result_log
--enable_query_log
--echo #
--echo # end of test mdev_XXXXX
--echo #

The metadata lock of t is acquired when opening tables. Then in lock_external() it iterates over all tables to lock. After locking t, when trying to lock t2, spider sends a query to the data node to lock t again. Somehow the MDL lock of t is not released yet, so the triggered open_table() hangs at waiting for MDL until lock_wait_timeout (default 1h) has passed.

Not sure whether it counts as a spider bug or a sql layer bug.

[1] https://jira.mariadb.org/browse/MDEV-29854?focusedCommentId=238531&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-238531

For comparison, if we simply lock tables with the same name / alias in one statement, the parser would pick out the issue and throw a 1066 ER_NONUNIQ_TABLE



 Comments   
Comment by Yuchen Pei [ 2023-10-12 ]

One possible solution is the same as MDEV-29676: set lock wait timeout to 1 second when sending queries to lock remote tables.

Comment by Roel Van de Paar [ 2023-10-14 ]

I used the original testcase routed via CLI to check for the hangs

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET'../socket.sock',DATABASE'',USER'',PASSWORD '');  # Socket location must be valid. Other values not.
CREATE TABLE t (a INT KEY,b INT UNIQUE) ENGINE=InnoDB;
CREATE TABLE t2 (id INT,id2 INT,UNIQUE (id,id2),CONSTRAINT t_id_fk FOREIGN KEY(id2,id) REFERENCES t (id,id2)) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
LOCK TABLES t WRITE,t2 WRITE;

All versions 10.4-11.3 dbg+opt hung. I confirmed the issue based on this.
I also tried the MTR testcase on 11.3, but got:

11.3.0 905c3d61e18ae6222d0d195c43d335046eec65d9

mysqltest: At line 15: query 'LOCK TABLES t WRITE,t2 WRITE' failed: ER_CONNECT_TO_FOREIGN_DATA_SOURCE (1429): Unable to connect to foreign data source: srv

Same outcome (1429) on 11.0.4 9880006b5ba28d7c7dc1596d958295c67efa5ad0 (Debug)

Generated at Thu Feb 08 10:31:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.