[MDEV-29962] SIGSEGV in ha_spider::lock_tables on BEGIN after table lock Created: 2022-11-07  Updated: 2024-01-29

Status: Stalled
Project: MariaDB Server
Component/s: Locking, Storage Engine - Spider
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.5, 10.6

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: locking

Issue Links:
Blocks
blocks MDEV-28683 Spider: SIGSEGV in spider_db_direct_d... Closed
PartOf
includes MDEV-28683 Spider: SIGSEGV in spider_db_direct_d... Closed
includes MDEV-31996 Segfault when setting spider_delete_a... Closed
Relates
relates to MDEV-28775 Implement connection manager in Spider Stalled
relates to MDEV-27239 Spider: Assertion `thd->transaction->... Closed
relates to MDEV-27526 Spider: SIGSEGV in ha_spider::lock_ta... Closed
relates to MDEV-32492 SIGSEGV in spider_conn_first_link_idx... Confirmed

 Description   

Not a new regression. The crash is present in optimized builds only, debug builds seem to work fine.

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY '';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD '');
CREATE TABLE t (c INT) ENGINE=InnoDB;
CREATE TABLE t1 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
SELECT * FROM t1;
LOCK TABLES t1 WRITE CONCURRENT,t1 AS t2 READ;
BEGIN;

Leads to:

10.11.1 50c5743adc87e1cdec1431a02558f6540fe5a6d5 (Optimized)

Core was generated by `/test/MD221022-mariadb-10.11.1-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000152d3d82cd4e in ha_spider::lock_tables (
    this=this@entry=0x152ccc04bb50)
    at /test/10.11_opt/storage/spider/ha_spider.cc:12184
12184	        if (
[Current thread is 1 (Thread 0x152d3d8d6700 (LWP 354245))]
(gdb) bt
#0  0x0000152d3d82cd4e in ha_spider::lock_tables (this=this@entry=0x152ccc04bb50) at /test/10.11_opt/storage/spider/ha_spider.cc:12184
#1  0x0000152d3d82d200 in ha_spider::external_lock (this=0x152ccc04bb50, thd=<optimized out>, lock_type=2) at /test/10.11_opt/storage/spider/ha_spider.cc:921
#2  0x000055aa825b5d78 in handler::ha_external_lock (this=0x152ccc04bb50, thd=thd@entry=0x152ccc000c58, lock_type=lock_type@entry=2) at /test/10.11_opt/sql/handler.cc:7095
#3  0x000055aa826cb55c in handler::ha_external_unlock (thd=0x152ccc000c58, this=<optimized out>) at /test/10.11_opt/sql/handler.h:3538
#4  unlock_external (count=<optimized out>, table=0x152ccc0813f8, thd=0x152ccc000c58) at /test/10.11_opt/sql/lock.cc:744
#5  mysql_unlock_tables (thd=0x152ccc000c58, sql_lock=0x152ccc0813b8, free_lock=<optimized out>) at /test/10.11_opt/sql/lock.cc:435
#6  0x000055aa826cbad8 in mysql_unlock_tables (thd=thd@entry=0x152ccc000c58, sql_lock=<optimized out>) at /test/10.11_opt/sql/lock.cc:418
#7  0x000055aa822f2209 in close_thread_tables (thd=thd@entry=0x152ccc000c58) at /test/10.11_opt/sql/sql_base.cc:953
#8  0x000055aa822f307b in Locked_tables_list::unlock_locked_tables (thd=0x152ccc000c58, this=0x152ccc004af8) at /test/10.11_opt/sql/sql_base.cc:2571
#9  Locked_tables_list::unlock_locked_tables (this=this@entry=0x152ccc004af8, thd=thd@entry=0x152ccc000c58) at /test/10.11_opt/sql/sql_base.cc:2542
#10 0x000055aa8248f28d in trans_begin (thd=thd@entry=0x152ccc000c58, flags=0) at /test/10.11_opt/sql/transaction.cc:115
#11 0x000055aa8236a1ae in mysql_execute_command (thd=0x152ccc000c58, is_called_from_prepared_stmt=<optimized out>) at /test/10.11_opt/sql/sql_parse.cc:5605
#12 0x000055aa82358335 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x152ccc000c58) at /test/10.11_opt/sql/sql_parse.cc:8023
#13 mysql_parse (thd=0x152ccc000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/10.11_opt/sql/sql_parse.cc:7945
#14 0x000055aa823640ea in dispatch_command (command=COM_QUERY, thd=0x152ccc000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/10.11_opt/sql/sql_class.h:1346
#15 0x000055aa82365ee2 in do_command (thd=0x152ccc000c58, blocking=blocking@entry=true) at /test/10.11_opt/sql/sql_parse.cc:1407
#16 0x000055aa8247ffbf in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55aa8599ac18, put_in_cache=put_in_cache@entry=true) at /test/10.11_opt/sql/sql_connect.cc:1416
#17 0x000055aa8248029d in handle_one_connection (arg=0x55aa8599ac18) at /test/10.11_opt/sql/sql_connect.cc:1318
#18 0x0000152d63b56609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x0000152d63742133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.5.18 (opt), 10.6.10 (opt), 10.7.6 (opt), 10.8.5 (opt), 10.9.3 (opt), 10.10.2 (opt), 10.11.1 (opt)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.3.37 (dbg), 10.3.37 (opt), 10.4.27 (dbg), 10.4.27 (opt), 10.5.18 (dbg), 10.6.10 (dbg), 10.7.6 (dbg), 10.8.5 (dbg), 10.9.3 (dbg), 10.10.2 (dbg), 10.11.1 (dbg)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.38 (dbg), 5.7.38 (opt), 8.0.29 (dbg), 8.0.29 (opt)



 Comments   
Comment by Roel Van de Paar [ 2022-11-07 ]

I made a matching MTR test as well:

--source include/have_innodb.inc
--let $SOCKET= `SELECT @@global.socket`
INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY '';
eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$SOCKET",DATABASE 'test',user 'Spider',PASSWORD '');
CREATE TABLE t (c INT) ENGINE=InnoDB;
CREATE TABLE t1 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
SELECT * FROM t1;
LOCK TABLES t1 WRITE CONCURRENT,t1 AS t2 READ;
BEGIN;

Which crashes in the same way. When ENGINE=InnoDB clause is removed (i.e. MyISAM default for MTR), it produces another outcome:

10.11.1 50c5743adc87e1cdec1431a02558f6540fe5a6d5 (Optimized)

CURRENT_TEST: main.test
mysqltest: At line 10: query 'BEGIN' failed: ER_GET_ERRMSG (1296): Got error 3 'unknown' from SPIDER

Comment by Roel Van de Paar [ 2022-11-07 ]

When trying the alternative MTR-like (i.e. using MYISAM) testcase at the CLI:

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE USER Spider@localhost IDENTIFIED BY '';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'Spider',PASSWORD '');
CREATE TABLE t (c INT) ENGINE=MyISAM;
CREATE TABLE t1 (c INT) ENGINE=Spider COMMENT='WRAPPER "mysql",srv "srv",TABLE "t"';
SELECT * FROM t1;
LOCK TABLES t1 WRITE CONCURRENT,t1 AS t2 READ;
BEGIN;

We get the same crash, so it is interesting that MTR fails in a different way when using MyISAM.

Comment by Yuchen Pei [ 2023-10-11 ]

Hi holyfoot, ptal thanks

commit a5e1be90a1b22569cf1ec33f0cee57d481385e3a (bb-10.10-mdev-29962)
Author: Yuchen Pei <ycp@mariadb.com>
Date:   Wed Oct 11 18:04:40 2023 +1100
 
    MDEV-29962 Spider: check trx and get conn before lock_tables()
    
    Same cause as MDEV-31996. This prevents reuse of conns freed
    previously, e.g. commit of previous statement.

The above is based on 11.0. Also tested on

  • 10.5 867ff4dc29ac19f0691780f674315ff7098ec87a
  • 10.10 a5e1be90a1b22569cf1ec33f0cee57d481385e3a
  • enterprise-23.08 1d01b86ee157ec1e0b11c5f11aa63bb39321a227
Comment by Yuchen Pei [ 2023-10-11 ]

Sorry, I need to re-work this, because the problem in MDEV-27902
also appears in 10.4, and a consideration of choosing between
dml_init() and check_trx_and_get_conn() needs to be made. Will need
to check MDEV-31996 and MDEV-28683 for this too.

Here's the case for 10.4, it is the same as the original case and
two other cases in MDEV-27902, but it fails at
ha_spider::external_lock() for the same reason, and a fix
(cd86deb2e98e133ce5d287e92ca048e04b4726b9) like the one in the
previous comment which adds check_trx_and_get_conn() in
ha_spider::external_lock() fixes it.

# original case
CREATE TABLE t (c INT) ENGINE=Spider;
HANDLER t OPEN;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
HANDLER t READ next;
--error ER_PARSE_ERROR
dummy;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
HANDLER t READ next;
 
drop table t;
 
# case by nayuta
CREATE TABLE t (c INT) ENGINE=Spider;
HANDLER t OPEN;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
HANDLER t READ FIRST;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
HANDLER t READ NEXT;
 
drop table t;
 
# Another case by Roel
CREATE TABLE t (c INT) ENGINE=Spider;
HANDLER t OPEN;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
HANDLER t READ NEXT;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
SELECT * FROM t;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
HANDLER t READ NEXT;
 
drop table t;

We also need to check why the heap-use-after-free happens at
different time between 10.4 and higher versions.

We will use this ticket to fix all heap-use-after-free of
connections, including MDEV-27902 and possibly others.

Comment by Yuchen Pei [ 2023-12-07 ]

I think we should just take the simple approach, like in MDEV-28683,
and call spider_check_trx_and_get_conn() when needed.

I'll take a look at the problem similar to MDEV-26540 when running the
MDEV-27902 case in 10.4 (see also comment[1] there), to rule out any
issues that can be relevant to this one.

[1] https://jira.mariadb.org/browse/MDEV-26540?focusedCommentId=271610&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-271610

Comment by Yuchen Pei [ 2023-12-07 ]

Hi holyfoot, ptal thanks

upstream/bb-11.0-mdev-29962 617abe5303ffae3270f85bd1dee83bdfc5f63ee1
MDEV-29962 Spider: creates connections if needed before lock_tables()
 
Same cause as MDEV-31996. This prevents reuse of conns freed
previously, e.g. during the commit of previous statement.
 
It does not occur in 10.4 because of the commit for MDEV-19002 removed
the call to spider_check_trx_and_get_conn().

for a 10.5 patch, see

d1110a81484 upstream/bb-10.5-mdev-29962 MDEV-29962 Spider: creates connections if needed before lock_tables()

Comment by Alexey Botchkov [ 2023-12-20 ]

please check if the spider_check_trx_and_get_conn() can only be called
when (lock_type != F_UNLCK).

Comment by Yuchen Pei [ 2023-12-21 ]

holyfoot, if I do that (i.e. apply the following diff on top, see commit 1b8db551a7d5dba054e80e74682c47899ac1b2af) I get heap-use-after-free when testing spider/bugfix.mdev_29962, so I think the function should be called regardless.

modified   storage/spider/ha_spider.cc
@@ -897,8 +897,9 @@ int ha_spider::external_lock(
     }
   }
 
-  if ((error_num= spider_check_trx_and_get_conn(thd, this)))
-    DBUG_RETURN(error_num);
+  if (lock_type != F_UNLCK)
+    if ((error_num= spider_check_trx_and_get_conn(thd, this)))
+      DBUG_RETURN(error_num);
   if (!partition_handler || !partition_handler->handlers)
   {
     DBUG_RETURN(lock_tables()); /* Non-partitioned table */

Generated at Thu Feb 08 10:12:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.