Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
10.5
Description
When creating a self-referencing Spider table requires a connection to send queries to the data node, the statement can hang on an MDL deadlock until the lock wait timeout lapses. Here are two examples:
CREATE TABLE t ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"' AS SELECT 1;
CREATE TABLE t ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
Note that the second example requires table discovery, which results in Spider connecting to the data node to query for the table structure.
In both examples, running set global lock_wait_timeout= 1; beforehand shows that the hang is indeed a wait for the lock wait timeout.
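The reproduction described above can be sketched as a short SQL script. This is only an illustration under stated assumptions: the Spider plugin is already installed, the script runs on a disposable test instance, and the server definition srv points back at the same server. The host, port, database, user and password values are hypothetical placeholders and are not taken from this issue.

```sql
-- Assumption: Spider is already installed and this is a throwaway test server.
-- Remember the current global value so it can be restored afterwards.
SET @old_lock_wait_timeout = @@global.lock_wait_timeout;

-- New connections, including the one Spider opens to the data node,
-- pick up the global value when they are created.
SET GLOBAL lock_wait_timeout = 1;

-- Hypothetical self-referencing server definition: the connection
-- parameters below are placeholders for "this same server".
CREATE SERVER srv FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '127.0.0.1', PORT 3306, DATABASE 'test',
           USER 'root', PASSWORD '');

-- First example from the description. With the short timeout the
-- statement should return with an error after roughly the timeout
-- instead of hanging for the default duration.
CREATE TABLE t ENGINE=Spider
  COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"' AS SELECT 1;

-- Clean up the test objects and restore the previous setting.
DROP TABLE IF EXISTS t;
DROP SERVER srv;
SET GLOBAL lock_wait_timeout = @old_lock_wait_timeout;
```

The second example can be tried the same way by dropping the AS SELECT 1 part, which exercises the table discovery path instead.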
The Spider self-referencing detection is not called because the deadlock happens before any call to a Spider handler method. To see this, place a set global lock_wait_timeout= 1; before either example and run mtr --rr on the testcase. Place breakpoints at my_error and my_message, conditional on the error number being 1205 (ER_LOCK_WAIT_TIMEOUT), and continue until one of them is reached. Then place another breakpoint at mysql_parse, as well as rbreak ha_spider::.. Then do reverse-continue. The mysql_parse breakpoint is reached without any ha_spider breakpoint being hit first, which shows that no Spider handler method ran between parsing the statement and the lock wait timeout error.
One solution is to temporarily set lock_wait_timeout to a small value while running the "pre-query" in the first example and the table discovery query in the second, and to suggest checking for self-referencing in the error message, just like MDEV-29676.
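As a rough illustration of scoping a small timeout to a single query, MariaDB's SET STATEMENT ... FOR syntax changes a variable only for the duration of one statement. The sketch below assumes a hypothetical query against the data node. The actual pre-query and discovery query that Spider sends are not shown in this issue, so the SELECT here is a stand-in.

```sql
-- Hypothetical stand-in for a query Spider might send to the data node.
-- lock_wait_timeout is 1 second only while this one statement runs.
SET STATEMENT lock_wait_timeout = 1 FOR
  SELECT * FROM t LIMIT 0;

-- Afterwards the session value is unchanged.
SELECT @@session.lock_wait_timeout;
```

With this scoping, a self-referencing deadlock would fail fast for that one query, while other statements on the same connection keep the configured timeout.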
Issue Links
- relates to
  - MDEV-35783 create table ... as select ... results in ghost tdc if opening the table fails (Open)
- split from
  - MDEV-29605 SIGSEGV in spider_db_ping, ASAN heap-use-after-free in spider_db_ping and UBSAN dynamic-type-mismatch in spider_db_ping on CREATE TABLE (Closed)
- split to
  - MDEV-35783 create table ... as select ... results in ghost tdc if opening the table fails (Open)
  - MDEV-35794 spider table discovery does not clear errors (Open)
It is not as quick a fix as I expected because of the issues it revealed: MDEV-35783 and MDEV-35794. Using set statement lock_wait_timeout should fix the bug in this issue (see e.g. bb-10.5-mdev-35781 1e88a92e767e0edc9e7b5442eced2b4422e63434), but the testcase will not work cleanly without addressing those two other issues. I will get back to critical bugs and return to these issues later.