[MDEV-29484] Assertion `!trx_free || !trx->locked_connections' failed in spider_free_trx_conn on LOCK TABLES Created: 2022-09-07  Updated: 2023-10-06  Resolved: 2022-10-04

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - Spider
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1

Type: Bug Priority: Blocker
Reporter: Roel Van de Paar Assignee: Nayuta Yanagisawa (Inactive)
Resolution: Fixed Votes: 0
Labels: locking, not-10.4, regression-10.5

Issue Links:
Relates
relates to MDEV-29667 Server hangs on DROP DATABASE after f... Closed
relates to MDEV-29456 Spider hangs in 'Waiting for table me... Closed
relates to MDEV-30014 heap-use-after-free in ha_spider::lo... Closed

 Description   

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY 'PWD123';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD 'PWD123');
CREATE TABLE t (c INT) ENGINE=InnoDB;
CREATE TABLE t1 (c1 INT) ENGINE=Spider;
CREATE TABLE t2 (f1 INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
LOCK TABLES t2 WRITE;
LOCK TABLES t1 READ,t2 READ;

Leads to:

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

mysqld: /test/10.11_dbg/storage/spider/spd_trx.cc:101: int spider_free_trx_conn(SPIDER_TRX*, bool): Assertion `!trx_free || !trx->locked_connections' failed.

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

Core was generated by `/test/MD190822-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x1470500a0700 (LWP 184539))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000147069c23859 in __GI_abort () at abort.c:79
#2  0x0000147069c23729 in __assert_fail_base (fmt=0x147069db9588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x1470275bec08 "!trx_free || !trx->locked_connections", file=0x1470275beba8 "/test/10.11_dbg/storage/spider/spd_trx.cc", line=101, function=<optimized out>) at assert.c:92
#3  0x0000147069c34fd6 in __GI___assert_fail (assertion=assertion@entry=0x1470275bec08 "!trx_free || !trx->locked_connections", file=file@entry=0x1470275beba8 "/test/10.11_dbg/storage/spider/spd_trx.cc", line=line@entry=101, function=function@entry=0x1470275bebd8 "int spider_free_trx_conn(SPIDER_TRX*, bool)") at assert.c:101
#4  0x00001470274e8d6c in spider_free_trx_conn (trx=trx@entry=0x146fd0061948, trx_free=trx_free@entry=true) at /test/10.11_dbg/storage/spider/spd_trx.cc:101
#5  0x00001470274efc57 in spider_free_trx_alloc (trx=trx@entry=0x146fd0061948) at /test/10.11_dbg/storage/spider/spd_trx.cc:1138
#6  0x00001470274efe6b in spider_free_trx (trx=trx@entry=0x146fd0061948, need_lock=need_lock@entry=true, reset_ha_data=reset_ha_data@entry=false) at /test/10.11_dbg/storage/spider/spd_trx.cc:1455
#7  0x000014702751c2ff in spider_close_connection (hton=<optimized out>, thd=0x146fd0000db8) at /test/10.11_dbg/storage/spider/spd_table.cc:6313
#8  0x00005654cdcf410e in ha_close_connection (thd=thd@entry=0x146fd0000db8) at /test/10.11_dbg/sql/handler.cc:906
#9  0x00005654cd99e0a5 in THD::free_connection (this=this@entry=0x146fd0000db8) at /test/10.11_dbg/sql/sql_class.cc:1598
#10 0x00005654cd8d31bc in unlink_thd (thd=0x146fd0000db8) at /test/10.11_dbg/sql/mysqld.cc:2765
#11 0x00005654cdb6c3b4 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5654d18394d8, put_in_cache=put_in_cache@entry=true) at /test/10.11_dbg/sql/sql_connect.cc:1429
#12 0x00005654cdb6c6e3 in handle_one_connection (arg=0x5654d18394d8) at /test/10.11_dbg/sql/sql_connect.cc:1312
#13 0x000014706a134609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x0000147069d20133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.5.18 (dbg), 10.6.10 (dbg), 10.7.6 (dbg), 10.8.5 (dbg), 10.9.2 (dbg), 10.10.2 (dbg), 10.11.0 (dbg)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.4.27 (dbg), 10.4.27 (opt), 10.5.18 (opt), 10.6.10 (opt), 10.7.6 (opt), 10.8.5 (opt), 10.9.2 (opt), 10.10.2 (opt), 10.11.0 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.38 (dbg), 5.7.38 (opt), 8.0.29 (dbg), 8.0.29 (opt)



 Comments   
Comment by Roel Van de Paar [ 2022-09-07 ]

Please note that SET GLOBAL spider_same_server_link=ON; is not used here.

Comment by Roel Van de Paar [ 2022-09-07 ]

Bug was introduced between 12/06/22 and 12/07/22 AEST

Comment by Roel Van de Paar [ 2022-09-07 ]

Execution at the CLI may not work, try:

./bin/mysql -A -uroot -S./socket.sock --force --binary-mode test < ./in.sql > ./mysql.out 2>&1

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-07 ]

The failing assertion is newly added by the following commit: https://github.com/MariaDB/server/commit/a26700cca579926cddf9a48c45f13b32785746bb

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-14 ]

I can reproduce the bug with ./bin/mysql -A -uroot -S./socket.sock --force --binary-mode test < ./in.sql > ./mysql.out 2>&1 but not with CLI or MTR.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-14 ]

trx->locked_connections is incremented (0 -> 1) by the first LOCK TABLE statement. Then, at the second failing LOCK TABLE statement, the variable is not decremented and the corresponding lock is not released.

(rr) watch -l trx->locked_connections
Hardware watchpoint 1: -location trx->locked_connections
(rr) rc
Continuing.
....
 
Thread 2 hit Hardware watchpoint 1: -location trx->locked_connections
 
Old value = 1
New value = 0
0x00007f131526e2f0 in spider_mbase_handler::lock_tables (this=0x7f12f40d5f10, link_idx=0) at /home/nayuta_mariadb/repo/mariadb-server/10.5/storage/spider/spd_db_mysql.cc:15454
b15454	    spider->wide_handler->trx->locked_connections++;
(rr) bt
#0  0x00007f131526e2f0 in spider_mbase_handler::lock_tables (this=0x7f12f40d5f10, link_idx=0)
    at /home/nayuta_mariadb/repo/mariadb-server/10.5/storage/spider/spd_db_mysql.cc:15454
#1  0x00007f1315142b3f in spider_db_lock_tables (spider=0x7f12f40c81c0, link_idx=0) at /home/nayuta_mariadb/repo/mariadb-server/10.5/storage/spider/spd_db_conn.cc:1406
#2  0x00007f131522162f in ha_spider::lock_tables (this=0x7f12f40c81c0) at /home/nayuta_mariadb/repo/mariadb-server/10.5/storage/spider/ha_spider.cc:16295
#3  0x00007f13151ece34 in ha_spider::external_lock (this=0x7f12f40c81c0, thd=0x7f12f40012e8, lock_type=1)
    at /home/nayuta_mariadb/repo/mariadb-server/10.5/storage/spider/ha_spider.cc:1267
#4  0x000055ff7669c2ef in handler::ha_external_lock (this=0x7f12f40c81c0, thd=0x7f12f40012e8, lock_type=1)
    at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/handler.cc:6737
#5  0x000055ff767eb8cd in lock_external (thd=0x7f12f40012e8, tables=0x7f12f411b4f8, count=1) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/lock.cc:396
#6  0x000055ff767eb532 in mysql_lock_tables (thd=0x7f12f40012e8, sql_lock=0x7f12f411b4c8, flags=0) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/lock.cc:341
#7  0x000055ff767eb3de in mysql_lock_tables (thd=0x7f12f40012e8, tables=0x7f12f4019e50, count=1, flags=0)
    at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/lock.cc:304
#8  0x000055ff7625305f in lock_tables (thd=0x7f12f40012e8, tables=0x7f12f40195b0, count=1, flags=0)
    at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_base.cc:5484
#9  0x000055ff762fbf5a in lock_tables_open_and_lock_tables (thd=0x7f12f40012e8, tables=0x7f12f40195b0)
    at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_parse.cc:3024
#10 0x000055ff763035d6 in mysql_execute_command (thd=0x7f12f40012e8) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_parse.cc:5186
#11 0x000055ff7630d217 in mysql_parse (thd=0x7f12f40012e8, rawbuf=0x7f12f40194f0 "LOCK TABLES t2 WRITE", length=20, parser_state=0x7f13168f22b0, is_com_multi=false, 
    is_next_command=false) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_parse.cc:8101
#12 0x000055ff762f89d8 in dispatch_command (command=COM_QUERY, thd=0x7f12f40012e8, packet=0x7f12f400d179 "LOCK TABLES t2 WRITE", packet_length=20, is_com_multi=false, 
    is_next_command=false) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_parse.cc:1891
#13 0x000055ff762f717a in do_command (thd=0x7f12f40012e8) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_parse.cc:1375
#14 0x000055ff764b29dc in do_handle_one_connection (connect=0x55ff79a07a58, put_in_cache=true) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_connect.cc:1416
#15 0x000055ff764b274c in handle_one_connection (arg=0x55ff79a07a58) at /home/nayuta_mariadb/repo/mariadb-server/10.5/sql/sql_connect.cc:1318
#16 0x000055ff769fc874 in pfs_spawn_thread (arg=0x55ff792ab868) at /home/nayuta_mariadb/repo/mariadb-server/10.5/storage/perfschema/pfs.cc:2201
#17 0x00007f13238e4b43 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x00007f1323975bb4 in clone () from /lib/x86_64-linux-gnu/libc.so.6

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-14 ]

If I understand correctly, Spider is designed so that it doesn't release table locks on a backend server by the second LOCK TABLE statement. This seems to be reasonable in some sense because, once the second LOCK TABLE is transmitted to the backend server, the locks by the first LOCK TABLE is automatically released.

So, the assertion introduced by a26700c doesn't hold in some cases. I expect a much simpler test case, which just locks a Spider table and then shutdowns the server or something, can result in the same assertion error (while I've not yet found such test case).

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-15 ]

The following test case hangs at dropping auto_test_remote when it is run by MTR. This happens on 10.5.16 and 10.4 HEAD which do not have a26700c. So, it is likely to be an old problem.

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
--connection child2_1
CREATE DATABASE auto_test_remote;
USE auto_test_remote;
eval CREATE TABLE tbl_a (
    a INT
) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
--connection master_1
CREATE DATABASE auto_test_local;
USE auto_test_local;
eval CREATE TABLE tbl_a (
    a INT
) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='table "tbl_a", srv "s_2_1"';
eval CREATE TABLE tbl_b (
    a INT
) $MASTER_1_ENGINE $MASTER_1_CHARSET;
 
LOCK TABLES tbl_a WRITE;
--error 1429
LOCK TABLES tbl_b READ, tbl_a READ;
 
--connection master_1
DROP DATABASE IF EXISTS auto_test_local;
 
--connection child2_1
DROP DATABASE IF EXISTS auto_test_remote;
 
--disable_query_log
--disable_result_log
--source ../t/test_deinit.inc
--enable_query_log
--enable_result_log

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-15 ]

OK. Now, I can reproduce the bug by MTR

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
--connection child2_1
CREATE DATABASE auto_test_remote;
USE auto_test_remote;
eval CREATE TABLE tbl_a (
    a INT
) $CHILD2_1_ENGINE $CHILD2_1_CHARSET;
 
--connection master_1
CREATE DATABASE auto_test_local;
USE auto_test_local;
 
eval CREATE TABLE tbl_a (
    a INT
) $MASTER_1_ENGINE $MASTER_1_CHARSET COMMENT='table "tbl_a", srv "s_2_1"';
 
eval CREATE TABLE tbl_b (
    a INT
) $MASTER_1_ENGINE $MASTER_1_CHARSET;
 
LOCK TABLES tbl_a WRITE;
--error 1429
LOCK TABLES tbl_b READ, tbl_a READ;
 
--disconnect master_1

Comment by Roel Van de Paar [ 2022-09-28 ]

Additional similar testcase

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY 'PWD123';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD 'PWD123');
CREATE TABLE t (c INT);
CREATE TABLE t4 (c1 CHAR(1)) DEFAULT CHARSET=sjis COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"' COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
SET default_storage_engine=Spider;
CREATE TABLE t3 (COUNT INT) COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
CREATE TABLE t1 (c1 CHAR(1) KEY,c2 INT,c3 CHAR(1));
CREATE TABLE t2 (f1 INT,f2 INT) COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
LOCK TABLES t2 WRITE;
LOCK TABLES t1 READ,t2 READ,t3 READ,t4 READ;

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ]

The following assertion in spider_free_trx_conn() has been added by my recent commit. It holds in many cases, but it doesn't hold for the case when the connection to a remote data node is disconnected.

 DBUG_ASSERT(!trx_free || !trx->locked_connections);

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ]

However, even though I remove the assertion, the test case result in a hang. According to the debug trace, the hang seems to occur at Query_cache::lock_and_suspend().

...
T@1    : | | | | | | | | | | | | | <ma_commit
T@1    : | | | | | | | | | | | | | >reset_thd_trn
T@1    : | | | | | | | | | | | | | | mutex: LOCK_thd_data (0x55fae5b020f8) locking
T@1    : | | | | | | | | | | | | | | mutex: LOCK_thd_data (0x55fae5b020f8) locked
T@1    : | | | | | | | | | | | | | | mutex: LOCK_thd_data (0x55fae5b020f8) unlocking
T@1    : | | | | | | | | | | | | | | >plugin_unlock
T@1    : | | | | | | | | | | | | | | | mutex: LOCK_plugin (0x55fae3d23060) locking
T@1    : | | | | | | | | | | | | | | | mutex: LOCK_plugin (0x55fae3d23060) locked
T@1    : | | | | | | | | | | | | | | | >intern_plugin_unlock
T@1    : | | | | | | | | | | | | | | | | >my_free
T@1    : | | | | | | | | | | | | | | | | | my: ptr: 0x7f844c0e9ae8
T@1    : | | | | | | | | | | | | | | | | | info: global thd memory_used: -73984  size: -32
T@1    : | | | | | | | | | | | | | | | | <my_free
T@1    : | | | | | | | | | | | | | | | | lock: thd: 0x55fae5b00398  plugin: "Aria" UNLOCK ref_count: 3
T@1    : | | | | | | | | | | | | | | | <intern_plugin_unlock
T@1    : | | | | | | | | | | | | | | | mutex: LOCK_plugin (0x55fae3d23060) unlocking
T@1    : | | | | | | | | | | | | | | <plugin_unlock
T@1    : | | | | | | | | | | | | | | info: table: 0x55fae5cb7d88  trn: 0x7f843c01c128 -> NULL
T@1    : | | | | | | | | | | | | | | >Query_cache::invalidate_by_MyISAM_filename
T@1    : | | | | | | | | | | | | | | <Query_cache::invalidate_by_MyISAM_filename
T@1    : | | | | | | | | | | | | | | >check_integrity
T@1    : | | | | | | | | | | | | | | | THD::enter_stage: Waiting for query cache lock at /home/nayuta_mariadb/repo/mariadb-server/bb-10.5-MDEV-29484/sql/sql_cache.cc:692
T@1    : | | | | | | | | | | | | | | | >Query_cache::lock_and_suspend
T@1    : | | | | | | | | | | | | | | | | mutex: structure_guard_mutex (0x55fae3d1cd80) locking

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ]

OK. The hang has been introduced by https://github.com/MariaDB/server/commit/a26700cca579926cddf9a48c45f13b32785746bb.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-09-29 ]

Please review: https://github.com/MariaDB/server/commit/1ae6d188c161cf07d0c389d5705b913b9504fc44

Comment by Alexey Botchkov [ 2022-10-04 ]

ok to push.

Generated at Thu Feb 08 10:08:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.