[MDEV-29676] Dual thread hang in 'closing tables' and 'Waiting for table metadata lock' on Spider CREATE OR REPLACE TABLE Created: 2022-10-01  Updated: 2023-10-19  Resolved: 2023-04-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: None
Fix Version/s: 11.1.1, 11.0.2

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Yuchen Pei
Resolution: Fixed Votes: 0
Labels: regression, shutdown_hang, thread_hang

Issue Links:
Blocks
blocks MDEV-31138 Enable the test spider/bugfix.mdev_29... Open
Problem/Incident
causes MDEV-31463 SIGSEGV in server_mysql_send_query | ... Closed
is caused by MDEV-25292 Atomic CREATE OR REPLACE TABLE Stalled
Relates
relates to MDEV-29583 Disallow Spider self/dual/multi/cross... Closed
relates to MDEV-30580 Investigate whether self-/multi- refe... Open
relates to MDEV-30994 Earlier detection of spider table sel... Open

 Description   

Please change socket file location as required. Use CLI for replay, or MTR testcase (ref comment below). Use branch bb-10.11-midenok-MDEV-25292 (atm).

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY 'PWD1';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD 'PWD1');
CREATE TABLE t (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
CREATE OR REPLACE TABLE t (c INT) ENGINE=Spider;

Will make 10.11 (opt+dbg) hang with:

10.11.0 49cee4e21a8e3cc0eccff3a6f9e493247344e24f (Optimized)

10.11.0-opt>SHOW FULL PROCESSLIST;
+----+-------------+-----------+------+---------+------+----------------------------------------------------+-------------------------------------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                                              | Info                                            | Progress |
+----+-------------+-----------+------+---------+------+----------------------------------------------------+-------------------------------------------------+----------+
|  4 | root        | localhost | test | Query   |  303 | closing tables                                     | CREATE OR REPLACE TABLE t (c INT) ENGINE=Spider |    0.000 |
|  5 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
|  6 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
|  7 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
|  8 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
|  9 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 11 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 12 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 13 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 14 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 15 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 16 | system user |           | NULL | Sleep   |  303 | Reset for next command                             | NULL                                            |    0.000 |
| 17 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 19 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 20 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 18 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 10 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 22 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 21 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 23 | system user |           | NULL | Daemon  | NULL | Spider table background statistics action handler  | NULL                                            |    0.000 |
| 24 | system user |           | NULL | Daemon  | NULL | Spider table background cardinality action handler | NULL                                            |    0.000 |
| 25 | Spider      | localhost | test | Query   |  303 | Waiting for table metadata lock                    | show table status from `test` like 't'          |    0.000 |
| 26 | root        | localhost | test | Query   |    0 | starting                                           | SHOW FULL PROCESSLIST                           |    0.000 |
+----+-------------+-----------+------+---------+------+----------------------------------------------------+-------------------------------------------------+----------+
23 rows in set (0.000 sec)



 Comments   
Comment by Roel Van de Paar [ 2022-10-01 ]

Commit cf6c5176328 (the last of the 14 commits related to MDEV-25929) is bad, commit 5369df741b3 is good.

Comment by Roel Van de Paar [ 2022-10-01 ]

After 20 minutes we see

10.11.0 6ebdd3013a18b01dbecec76b870810329eb76586 (Optimized)

...
10.11.0-opt>CREATE OR REPLACE TABLE t (c INT) ENGINE=Spider;
Query OK, 0 rows affected, 4 warnings (20 min 0.228 sec)
 
10.11.0-opt>SHOW WARNINGS;
+-------+------+-------------------------------------------+
| Level | Code | Message                                   |
+-------+------+-------------------------------------------+
| Error | 1159 | Got timeout reading communication packets |
| Error | 1159 |                                           |
| Error | 1159 | Got timeout reading communication packets |
| Error | 1159 |                                           |
+-------+------+-------------------------------------------+
4 rows in set (0.000 sec)

Comment by Sergei Golubchik [ 2022-10-01 ]

please, create an MTR test case for that

Comment by Aleksey Midenkov [ 2022-10-03 ]

CREATE OR REPLACE order of execution is like this :

1. MDL_EXCLUSIVE acquired;
2. Table is opened here:

#10 0x000056125f219eb5 in open_table (thd=0x7f29f002b7b8, table_list=0x7f29e80f5e30, ot_ctx=0x7f29e80f5df0) at ../src/sql/sql_base.cc:2179
#11 0x000056125f4c2d4a in create_table_impl (thd=0x7f29f002b7b8, orig_db=..., orig_table_name=..., db=..., table_name=..., path=..., options=..., create_info=0x7f29e80f6db0, alter_info=0x7f29e80f6cc0, create_table_mode=1, is_trans=0x7f29e80f6a67, key_info=0x7f29e80f6898, key_count=0x7f29e80f6894, frm=0x7f29e80f6660) at ../src/sql/sql_table.cc:4767

This table open is required for foreign key check. If there are references, we fail CREATE OR REPLACE. It is so early because there can be lengthy CREATE OR REPLACE .. SELECT .. and we don't need it to finish for that check.

3. The above table open triggers Spider command:

#0  mysql_real_query (mysql=0x7f29f0091b18, query=0x7f29f003ef08 "show table status from `test` like 't'", length=38) at ../src/sql-common/client.c:3553
#1  0x00007f2a035cbde2 in spider_db_mbase::exec_query (this=0x7f29f0053450, query=0x7f29f003ef08 "show table status from `test` like 't'", length=38, quick_mode=-1) at ../src/storage/spider/spd_db_mysql.cc:2078
#2  0x00007f2a034c5261 in spider_db_query (conn=0x7f29f0077538, query=0x7f29f003ef08 "show table status from `test` like 't'", length=38, quick_mode=-1, need_mon=0x7f29f0047b20) at ../src/storage/spider/spd_db_conn.cc:668
#3  0x00007f2a0360fe25 in spider_mbase_handler::show_table_status (this=0x7f29f0075670, link_idx=0, sts_mode=1, flag=88) at ../src/storage/spider/spd_db_mysql.cc:13558
#4  0x00007f2a034dbf5d in spider_db_show_table_status (spider=0x7f29f00438e0, link_idx=0, sts_mode=1, flag=88) at ../src/storage/spider/spd_db_conn.cc:5225
#5  0x00007f2a0353fcef in spider_get_sts (share=0x7f29f00400e8, link_idx=0, tmp_time=1664818372, spider=0x7f29f00438e0, sts_interval=10, sts_mode=1, sts_sync=0, sts_sync_level=1, flag=88) at ../src/storage/spider/spd_table.cc:7152
#6  0x00007f2a0353c5e2 in spider_get_share (table_name=0x7f29f0070e60 "./test/t", table=0x7f29f0047268, thd=0x7f29f002b7b8, spider=0x7f29f00438e0, error_num=0x7f29e80f4f54) at ../src/storage/spider/spd_table.cc:5062
#7  0x00007f2a03581e55 in ha_spider::open (this=0x7f29f00438e0, name=0x7f29f0070e60 "./test/t", mode=2, test_if_locked=18) at ../src/storage/spider/ha_spider.cc:313
#8  0x000056125efe86d2 in handler::ha_open (this=0x7f29f00438e0, table_arg=0x7f29f0047268, name=0x7f29f0070e60 "./test/t", mode=2, test_if_locked=18, mem_root=0x0, partitions_to_open=0x0) at ../src/sql/handler.cc:3332
#9  0x000056125f52dc55 in open_table_from_share (thd=0x7f29f002b7b8, share=0x7f29f00708a0, alias=0x7f29e80f5e50, db_stat=33, prgflag=8, ha_open_flags=18, outparam=0x7f29f0047268, is_create_table=false, partitions_to_open=0x0) at ../src/sql/table.cc:4451
#10 0x000056125f219eb5 in open_table (thd=0x7f29f002b7b8, table_list=0x7f29e80f5e30, ot_ctx=0x7f29e80f5df0) at ../src/sql/sql_base.cc:2179
#11 0x000056125f4c2d4a in create_table_impl (thd=0x7f29f002b7b8, orig_db=..., orig_table_name=..., db=..., table_name=..., path=..., options=..., create_info=0x7f29e80f6db0, alter_info=0x7f29e80f6cc0, create_table_mode=1, is_trans=0x7f29e80f6a67, key_info=0x7f29e80f6898, key_count=0x7f29e80f6894, frm=0x7f29e80f6660) at ../src/sql/sql_table.cc:4767

Since the spider table is on the same server SHOW TABLE STATUS tries to acquire MDL_SHARED_HIGH_PRIO and suspends.

From sql layer point of view there is nothing wrong with open_table() under MDL_EXCLUSIVE. The problem is Spider-local. The similar happens under 5369df741b3:

CREATE TABLE t (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
lock table t write;

LOCK TABLE command halts. The bug is the duplicate for this test case.

Comment by Roel Van de Paar [ 2022-10-04 ]

--let $SOCKET= `SELECT @@global.socket`
INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY 'PWD1';
eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$SOCKET", DATABASE 'test',user 'Spider',PASSWORD 'PWD1');
CREATE TABLE t (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
CREATE OR REPLACE TABLE t (c INT) ENGINE=Spider;

Comment by Roel Van de Paar [ 2022-10-04 ]

Whist the regression is triggered or introduced by MDEV-25292, from the above input by midenok and short discussion with nayuta-yanagisawa this looks to be a Spider issue. Assigning to nayuta-yanagisawa as requested.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-10-04 ]

--echo #
--echo # MDEV-29676 Dual thread hang in 'closing tables' and 'Waiting for table metadata lock' on Spider CREATE OR REPLACE TABLE
--echo #
 
--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
CREATE TABLE t (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t"';
 
SET @old_lock_wait_timeout=@@global.lock_wait_timeout;
SET GLOBAL lock_wait_timeout=1;
CREATE OR REPLACE TABLE t (c INT) ENGINE=Spider;
SET GLOBAL lock_wait_timeout=@old_lock_wait_timeout;
 
DROP TABLE t;
 
--disable_query_log
--disable_result_log
--source ../../t/test_deinit.inc
--enable_result_log
--enable_query_log

Comment by Nayuta Yanagisawa (Inactive) [ 2022-10-04 ]

The first bad commit is 93c8252f02faa8ad8dc5f005e52f1990c29d4a0d, as expected.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-10-04 ]

Roel Just FYI. Without test_deinit.inc, we need to clean up Spider's system tables manually. That is the reason why I prefer test_init.inc and test_deinit.inc.

Comment by Elena Stepanova [ 2022-10-04 ]

Note that the guilty CREATE OR REPLACE doesn't even have to have Spider as a resulting engine, any will do.

Comment by Roel Van de Paar [ 2022-10-04 ]

nayuta-yanagisawa Thank you.
Btw, if I run that testcase against a tarball build, it will result in:

CURRENT_TEST: main.test
mysqltest: At line 6: Could not open '../../t/test_init.inc' for reading, errno: 2

Which I understand is a known issue.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-10-05 ]

elenst Thank you for pointing it out. I didn't aware of it.

Roel I believe that is because you put the test case in storage/spider/mysql-test/spider/t. In this case, the include path need to be changed to

--source test_init.inc

Comment by Yuchen Pei [ 2023-01-16 ]

The mtr test passes on the current 10.11 (bb3a63903e5192d39d93259ce2bb7635fcfca7a0). 10.11 contains the "bad" commit 93c8252f02faa8ad8dc5f005e52f1990c29d4a0d but the latter was reverted in 2bd41fc5bf765bb9b1b2caa872070b271fc7c039.

It also "passes" on the "bad" commit itself, but with timeout errors, so is bb-10.11-midenok-MDEV-25292 (21350534434) mentioned in [1]. If the timeout errors manifest the problem in the current report, then we need a better mtr case that fails at the timeout.

[1] https://jira.mariadb.org/browse/MDEV-29676?focusedCommentId=247200&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-247200

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
#
# MDEV-29676 Dual thread hang in 'closing tables' and 'Waiting for table metadata lock' on Spider CREATE OR REPLACE TABLE
#
for master_1
for child2
for child3
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "/home/ycp/source/mariadb-server/mdev-29676/build/mysql-test/var/tmp/mysqld.1.1.sock", DATABASE 'test',user 'root');
CREATE TABLE t (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t"';
SET @old_lock_wait_timeout=@@global.lock_wait_timeout;
SET GLOBAL lock_wait_timeout=1;
CREATE OR REPLACE TABLE t (c INT) ENGINE=Spider;
Warnings:
Error	1205	Lock wait timeout exceeded; try restarting transaction
Error	1205	Lock wait timeout exceeded; try restarting transaction
SET GLOBAL lock_wait_timeout=@old_lock_wait_timeout;
DROP TABLE t;
for master_1
for child2
for child3
spider/bugfix.mdev_29676                 [ pass ]   2109
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 2.109 of 8 seconds executing testcases
 
Completed: All 1 tests were successful.

Comment by Elena Stepanova [ 2023-01-17 ]

The default lock_wait_timeout is 86400 seconds (1 day). Nobody usually waits this long in interactive tests, and non-interactive tools have shorter timeouts of their own. So yes, in the vast majority of cases, a "hang" is actually a lock wait until the timeout occurs. They should be distinguished from true infinite deadlocks by decreasing the lock_wait_timeout value and checking whether it affects the outcome.

Note that in case of Spider (or Federated), a special care should be taken to make sure that the modification of lock_wait_timeout applies to the right connection.

Comment by Yuchen Pei [ 2023-01-25 ]

Adding the testcase at https://github.com/MariaDB/server/commit/4a5f5f0d38e, and will push it once MDEV-30464 is reviewed. Still working on the analysis.

Comment by Roel Van de Paar [ 2023-01-31 ]

Confirmed that this issue does not seem to block MDEV-25292 because when self-reference is not used the server does not hang. However, the source of the hang remains MDEV-25292.

Comment by Yuchen Pei [ 2023-03-22 ]

The problem seems to be that when spider opens a table, it sends a
show table status query to the sql layer for the underlying data
node table, and the show table status query will try to open the
data node table. When the spider and the data node table are the
same (self-referencing), the lock conflict occurs.

Possible fixes:
1. Ban spider from opening data node tables in its open() method
2. Ban spider from opening data node tables when the original query
is CoR of the spider table
3. Make self-referencing detection happen before spider opens the
data node tables

On a higher level, when spider opens a table, it tries to get table
status from the system sts table or the result of a {{show table
status}} query for the data node table into share->stat. The
system sts table is updated using share->stat only when freeing
the share, which happens in "teardown" parts, for example when
spider closes a table. The same goes for the crd table, which
corresponds to executing show index.

This results in some interesting behaviour. For example, in the
following case, the sts and crd system tables are empty when a
spider table t2 is created referencing a non-spider table t1. And
when we replace t2 with an InnoDB table, there's a row for t2. The
row remains even when both t1 and t2 are dropped.

--source include/have_innodb.inc
--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
 
CREATE TABLE t1 (c INT);
CREATE TABLE t2 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t1"';
 
select * from mysql.spider_table_sts; # empty
select * from mysql.spider_table_crd; # empty
 
CREATE OR REPLACE TABLE t2 (c INT) ENGINE=InnoDB;
 
select * from mysql.spider_table_sts; # row for t2
select * from mysql.spider_table_crd; # row for t2
 
DROP TABLE t1, t2;
 
select * from mysql.spider_table_sts; # row for t2
select * from mysql.spider_table_crd; # row for t2

I am not sure if the staleness is intended, or more generally
whether we need the sts and crd tables. This ticket could be
considered together with with other sts tickets, like MDEV-27233 and
MDEV-27253.

Comment by Yuchen Pei [ 2023-03-24 ]

I have a patch[1] that fixes this problem and passes all existing
spider tests, but I need to justify not getting crd/sts for the spider
share during CoR.

[1] https://github.com/MariaDB/server/commit/64a9504b074

Comment by Yuchen Pei [ 2023-03-28 ]

Do we really need the stats and cardinality in SPIDER_SHARE / SPIDER_WIDE_SHARE? Commenting out the code block calling spider_get_sts() and spider_get_crd()[1] seem to be rather inconsequential, as in 90+% of spider tests still pass, and the failed tests seem to be trivial failures (see below for org mode notes about the 10 failures)

[1] https://github.com/MariaDB/server/commit/aee7ac434b1

**** spider/bugfix.mdev_27239 (missing some warnings):
  #+begin_src diff
--- /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/bugfix/r/mdev_27239.result	2023-01-16 22:43:30.875022163 +1100
+++ /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/bugfix/r/mdev_27239.reject	2023-03-28 13:01:07.202951422 +1100
@@ -8,11 +8,6 @@
 USE auto_test_local;
 CREATE TABLE tbl_a (a INT) ENGINE=SPIDER;
 FLUSH TABLE tbl_a WITH READ LOCK;
-Warnings:
-Error	1429	Unable to connect to foreign data source: localhost
-Error	1429	Unable to connect to foreign data source: localhost
-Error	1429	Unable to connect to foreign data source: localhost
-Error	1429	Unable to connect to foreign data source: localhost
 BEGIN;
 DROP DATABASE auto_test_local;
 for master_1
  #+end_src
**** spider/bugfix.spider_table_sts
  #+begin_example
mysqltest: At line 33: query 'SELECT pkey FROM tbl_a' failed with wrong errno <Unknown> (12720): 'Host:127.0.0.1 and Port:16040 aim self server. Please change spider_same_server_link parameter if this link is required.', instead of <Unknown> (12609)...
 
The result from queries just before the failure was:
< snip >
insert into mysql.spider_table_sts values ('auto_test_local', 'tbl_a', 0, 0, 0, 0, 0, '2019-01-01 00:00:00', '2019-01-01 00:00:00', '2019-01-01 00:00:00');
 
this test is for MDEV-19842
 
drop and create databases
connection master_1;
CREATE DATABASE auto_test_local;
USE auto_test_local;
 
create table
connection master_1;
CREATE TABLE tbl_a (
pkey int NOT NULL,
PRIMARY KEY (pkey)
) MASTER_1_ENGINE MASTER_1_CHARSET MASTER_1_COMMENT_2_1
 
select test 1
connection master_1;
SELECT pkey FROM tbl_a;
  #+end_example
  [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_mysql.cc:1944][spider_db_mbase::connect]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_conn.cc:149][spider_db_connect]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_conn.cc:262][spider_db_conn_queue_action]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_conn.cc:585][spider_db_before_query]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_conn.cc:813][spider_db_set_names_internal]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_conn.cc:856][spider_db_set_names]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_mysql.cc:13494][spider_mbase_handler::show_table_status]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_db_conn.cc:5223][spider_db_show_table_status]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/spd_table.cc:7114][spider_get_sts]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/ha_spider.cc:6700][ha_spider::info]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/table.cc:9715][TABLE_LIST::fetch_number_of_rows]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_select.cc:5321][make_join_statistics]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_select.cc:2535][JOIN::optimize_inner]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_select.cc:1871][JOIN::optimize]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_select.cc:5067][mysql_select]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_select.cc:582][handle_select]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_parse.cc:6268][execute_sqlcom_select]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_parse.cc:3952][mysql_execute_command]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_parse.cc:8005][mysql_parse]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_parse.cc:1894][dispatch_command]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_parse.cc:1407][do_command]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_connect.cc:1416][do_handle_one_connection]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/sql/sql_connect.cc:1318][handle_one_connection]] > [[/home/ycp/source/mariadb-server/mdev-29676/src/storage/perfschema/pfs.cc:2201][pfs_spawn_thread]] > [[/lib/x86_64-linux-gnu/libpthread.so.0][start_thread]] > [[/lib/x86_64-linux-gnu/libc.so.6][clone]]
  what happens is that:
  - before commenting out spider_get_sts() etc. from spider_get_share(), spider_get_sts() gets called there that executed the query, then again in info(), where it manages to get the table stats through spider_sys_get_table_sts(). at some point it threw the wrong version error
    - the wrong version error is caused by the sts table having wrong number of columns
  - after commenting out, spider_get_sts() gets called in info() to execute the query, which somehow throws the same link error
    - and then share->init is set to true in spider_get_share(), so when spider_get_sts() is called again, it skips the attempt to read from table with spider_sys_get_table_sts()
**** spider/bugfix.sql_mode_mariadb, spider/bugfix.sql_mode_mysql
  just missing the timezone and loopcheck queries
  #+begin_src diff
--- /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/bugfix/r/sql_mode_mariadb.result	2023-01-16 22:43:30.875022163 +1100
+++ /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/bugfix/r/sql_mode_mariadb.reject	2023-03-28 13:04:09.149413636 +1100
@@ -50,7 +50,7 @@
 connection child2_1;
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%sql_mode%';
 argument
-set session transaction isolation level repeatable read;set session autocommit = 1;set session sql_log_off = 0;set session wait_timeout = 604800;set session sql_mode = 'real_as_float,ignore_bad_table_options,no_unsigned_subtraction,no_dir_in_create,no_auto_value_on_zero,strict_trans_tables,strict_all_tables,no_zero_in_date,no_zero_date,allow_invalid_dates,error_for_division_by_zero,no_auto_create_user,high_not_precedence,no_engine_substitution,pad_char_to_full_length,empty_string_is_null,simultaneous_assignment,time_round_fractional';set session time_zone = '+00:00';set @`spider_lc_./auto_test_remote/tbl_a` = '-xxxxxxxxxxxx-xxxxx-./auto_test_local/tbl_a-';start transaction
+set session transaction isolation level repeatable read;set session autocommit = 1;set session sql_log_off = 0;set session wait_timeout = 604800;set session sql_mode = 'real_as_float,ignore_bad_table_options,no_unsigned_subtraction,no_dir_in_create,no_auto_value_on_zero,strict_trans_tables,strict_all_tables,no_zero_in_date,no_zero_date,allow_invalid_dates,error_for_division_by_zero,no_auto_create_user,high_not_precedence,no_engine_substitution,pad_char_to_full_length,empty_string_is_null,simultaneous_assignment,time_round_fractional';start transaction
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%sql_mode%'
 SELECT pkey FROM tbl_a ORDER BY pkey;
 pkey
  #+end_src
**** spider/bugfix.slave_trx_isolation, spider.slave_trx_isolation
  misplaced timezone and loopcheck queries
  #+begin_src diff
--- /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/bugfix/r/slave_trx_isolation.result	2023-01-16 22:43:30.875022163 +1100
+++ /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/bugfix/r/slave_trx_isolation.reject	2023-03-28 13:01:11.410596389 +1100
@@ -48,9 +48,8 @@
 connection child2_1;
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%set %';
 argument
-set session time_zone = '+00:00';set @`spider_lc_./auto_test_remote/tbl_a` = '-xxxxxxxxxxxx-xxxxx-./auto_test_local/tbl_a-'
+set session transaction isolation level read committed;set session autocommit = 1;set session wait_timeout = 604800;set session sql_mode = 'strict_trans_tables,error_for_division_by_zero,no_auto_create_user,no_engine_substitution';set session time_zone = '+00:00';set @`spider_lc_./auto_test_remote/tbl_a` = '-xxxxxxxxxxxx-xxxxx-./auto_test_local/tbl_a-';start transaction
 SET NAMES utf8mb3
-set session transaction isolation level read committed;set session autocommit = 1;set session wait_timeout = 604800;set session sql_mode = 'strict_trans_tables,error_for_division_by_zero,no_auto_create_user,no_engine_substitution';start transaction
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%set %'
 SELECT pkey FROM tbl_a ORDER BY pkey;
 pkey
  #+end_src
**** spider/regression/e1121.direct_join_by_pkey_key
  missing show index query (prob from spider_get_crd)
  #+begin_src diff
--- /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/regression/e1121/r/direct_join_by_pkey_key.result	2023-01-16 22:43:30.879022098 +1100
+++ /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/regression/e1121/r/direct_join_by_pkey_key.reject	2023-03-28 13:01:15.438258616 +1100
@@ -46,6 +46,7 @@
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%`tbl_a`%' ;
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%`tbl_b`%';
 argument
+show index from `auto_test_remote`.`tbl_a`
 select t0.`val` `val`,t0.`akey` `akey` from `auto_test_remote`.`tbl_a` t0 where (t0.`akey` = '4')
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%`tbl_a`%' ;
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%`tbl_b`%'
  #+end_src
**** spider.direct_join, spider.direct_join_using
  wrong join order: ((t0.`a` = t1.`a`) and (t2.`a` = t1.`a`)) vs ((t0.`a` = t2.`a`) and (t1.`a` = t2.`a`))
  #+begin_src diff
--- /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/r/direct_join.result	2023-01-17 10:02:25.679773048 +1100
+++ /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/r/direct_join.reject	2023-03-28 13:01:22.397679716 +1100
@@ -76,7 +76,7 @@
 connection child2_1;
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%select %';
 argument
-select t0.`b` `b`,t0.`a` `a`,t2.`b` `b`,t2.`c` `c` from `auto_test_remote`.`ta_r` t0 join `auto_test_remote`.`ta_r_3` t1 join `auto_test_remote`.`ta_r_int` t2 where ((t0.`a` = t1.`a`) and (t2.`a` = t1.`a`)) order by t0.`b` desc limit 1,2
+select t0.`b` `b`,t0.`a` `a`,t2.`b` `b`,t2.`c` `c` from `auto_test_remote`.`ta_r` t0 join `auto_test_remote`.`ta_r_3` t1 join `auto_test_remote`.`ta_r_int` t2 where ((t0.`a` = t2.`a`) and (t1.`a` = t2.`a`)) order by t0.`b` desc limit 1,2
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%select %'
 SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_r ORDER BY a;
 a	b	date_format(c, '%Y-%m-%d %H:%i:%s')
  #+end_src
**** spider.direct_left_join_nullable
  wrong join: an extra =and (t2.`b` is not null))=
  #+begin_src diff
--- /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/r/direct_left_join_nullable.result	2023-03-24 12:49:43.668846710 +1100
+++ /home/ycp/source/mariadb-server/mdev-29676/src/storage/spider/mysql-test/spider/r/direct_left_join_nullable.reject	2023-03-28 13:01:28.213200522 +1100
@@ -87,7 +87,7 @@
 connection child2_1;
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%select %';
 argument
-select t0.`a` `a`,t2.`b` `b`,t2.`c` `c`,t3.`a` `a` from `auto_test_remote`.`ta_r_no_idx` t0 left join `auto_test_remote`.`ta_r_auto_inc` t1 on (t1.`a` = t0.`a`) left join `auto_test_remote`.`ta_r_3` t2 on (t2.`c` = t1.`c`) left join `auto_test_remote`.`ta_r` t3 on (t3.`b` = t2.`b`) where 1 order by t0.`a` desc
+select t0.`a` `a`,t2.`b` `b`,t2.`c` `c`,t3.`a` `a` from `auto_test_remote`.`ta_r_no_idx` t0 left join `auto_test_remote`.`ta_r_auto_inc` t1 on (t1.`a` = t0.`a`) left join `auto_test_remote`.`ta_r_3` t2 on (t2.`c` = t1.`c`) left join `auto_test_remote`.`ta_r` t3 on ((t3.`b` = t2.`b`) and (t2.`b` is not null)) where 1 order by t0.`a` desc
 SELECT argument FROM mysql.general_log WHERE command_type != 'Execute' AND argument LIKE '%select %'
 SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_r ORDER BY a;
 a	b	date_format(c, '%Y-%m-%d %H:%i:%s')
  #+end_src

Comment by Yuchen Pei [ 2023-04-03 ]

I cleaned up spider_get_share() and friends and have a better
understanding of what happens there.

It might make sense to get sts and crd when creating a spider table
after all. Consider the following case:

eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
CREATE TABLE t1 (c INT);
CREATE TABLE t3 (c INT);
INSERT INTO t3 (c) VALUES (42);
SELECT c from t3;
CREATE OR REPLACE TABLE t2 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t1"'
SELECT c from t3;

Because of the SELECT in the CREATE statement, we do need to
open the table (in select_create::create_table_from_items()),
and then invoke ha_spider::write_row() to write to the spider
table the results of the SELECT. It may not be safe to avoid
getting sts during open, because ha_spider::write() invokes
ha_spider::info() which uses the results of sts in
share->stats. Even though in my experiment, lines using
share->stats seemed to be skipped, it would require further
analysis to find out whether they will always be skipped.

On the other hand, a much simpler and cleaner fix might be to fix
the check for self-reference. The check is already the first step of
spider_get_share(), but somehow it could not pick out the
0-length self-reference in the test. So we just need to figure out
why and fix it.

Comment by Yuchen Pei [ 2023-04-04 ]

Here's a testcase where a length-2 self-reference causes the same
problem.

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$MASTER_1_MYSOCK", DATABASE 'test',user 'root');
create table t2 (c int);
create table t1 (c int) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t2"';
create table t0 (c int) ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t1"';
alter table t2 ENGINE=Spider COMMENT='WRAPPER "mysql", srv "srv",TABLE "t0"';
SET GLOBAL lock_wait_timeout=1;
create or replace table t0 (c int);
SET GLOBAL lock_wait_timeout=default;
drop table t0, t1, t2;
 
--disable_query_log
--disable_result_log
--source ../../t/test_deinit.inc
--enable_result_log
--enable_query_log

An analysis of the self-referencing check causing opening the same
table twice, and a proposal of preemptive self-reference detection
that can fix this issue is filed as MDEV-30994.

Alternatively, a quicker and dirtier fix would be to set the
lock_wait_timeout to a small value (like 1 as in the testcase) on
the remote server before issuing the queries to get remote table
status and index, and restoring it afterwards. If the failure is a
lock wait timeout, print a hint asking the user to check for
self-reference using a select statement, and abort the
opening. Given there are other bugs to attend to, maybe this is not
a bad idea at this moment.

Comment by Yuchen Pei [ 2023-04-11 ]

The lock_wait_timeout idea seems to be working. I have a wip patch at https://github.com/MariaDB/server/commit/3f852814983 and shall post a patch for review soon.

Comment by Yuchen Pei [ 2023-04-13 ]

Hi holyfoot, can you take a look at my patches for this ticket? In order, they are:

> 9309d0c9cd2 MDEV-30265 Document some sts related functions in spider.

The only reason this commit is not squashed into the next one is
because it is purely a documentation change.

> 425722e8f08 MDEV-29676 refactored and documented spider_get_share() and friends

Refactoring and cleaning up spider_get_share() before fixing the
bug. The refactoring is rather faithful and straightforward, with
logical changes made in a separate commit below.

> eb03f927824 MDEV-29676 Some changes in behaviour w.r.t. spider sts crd

A bit of logical change on top of the refactoring for ease of
review. Will probably squash with the refactoring after the review. It
would be good to backport the squashed refactoring to older versions
too.

> ede6198cf71 MDEV-29676 adding lock wait timeout to get_sts and crd

This is the commit that fixes the bug in the present ticket.

The other commits in the same branch like f3353ca474d, 2544cace482
are irrelevant as they are either offhand fixes or have already been
pushed for other tickets.

Comment by Alexey Botchkov [ 2023-04-23 ]

you removed roop_count (not root)
otherwise ok to push.

Comment by Yuchen Pei [ 2023-04-27 ]

The development of this issue has been based on bb-11.0-midenok-MDEV-25292 which is in turn based on 11.0, and the fix can definitely benefit 11.0. Therefore I add 11.0 as a fixversion for free. I will run tests on both 11.0 and 11.1 before pushing.

I also add 11.2 because it is the latest development version.

Comment by Yuchen Pei [ 2023-04-27 ]

Thanks for the review holyfoot.

I have rebased the patches on 11.0 (instead of bb-11.0-midenok-MDEV-25292) with the test disabled (see MDEV-31138). I will push to 11.0 once I have an answer on whether / how to push the post-merge fixup for MDEV-29644 https://github.com/MariaDB/server/commit/e724017debb

Comment by Yuchen Pei [ 2023-04-28 ]

Pushed to 11.0. Also tested on 11.1 (but without rebasing on an MDEV-25292 commit because there is no such commit based on 11.1)

Generated at Thu Feb 08 10:10:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.