[MDEV-31475] UBSAN: runtime error: applying non-zero offset in spider_free_mem and SIGSEGV in spider_free_mem on SELECT
Created: 2023-06-14
Updated: 2023-11-28

Status: Stalled
Project: MariaDB Server
Component/s: Partitioning, Storage Engine - Spider
Affects Version/s: 10.5, 10.6, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug
Priority: Major
Reporter: Roel Van de Paar
Assignee: Yuchen Pei
Resolution: Unresolved
Votes: 0
Labels: UBSAN, affects-tests, init, not-10.4, race

Issue Links:
Relates
relates to MDEV-27369 Renaming Spider table fails due to ER... Stalled

 Description   

SET sql_mode='';
INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE TABLE t (c INT) ENGINE=Spider PARTITION BY HASH (c) PARTITIONS 2;
INSERT INTO t VALUES (0);
INSERT INTO t VALUES (0);
SELECT * FROM t WHERE c=1;
ANALYZE TABLE t;
SELECT * FROM t;

Leads to:

11.1.0 4e5b771e980edfdad5c5414aa62c81d409d585a4 (Debug)

Core was generated by `/test/MD120523-mariadb-11.1.0-linux-x86_64-dbg/bin/mariadbd --no-defaults --cor'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  spider_free_mem (trx=0x14e9800545b8, ptr=ptr@entry=0x0, 
    my_flags=my_flags@entry=0)
    at /test/11.1_dbg/storage/spider/spd_malloc.cc:180
[Current thread is 1 (Thread 0x14e9e48dd640 (LWP 3317775))]
(gdb) bt
#0  spider_free_mem (trx=0x14e9800545b8, ptr=ptr@entry=0x0, my_flags=my_flags@entry=0) at /test/11.1_dbg/storage/spider/spd_malloc.cc:180
#1  0x000014e9e48132ac in ha_spider::~ha_spider (this=0x14e98008f6d0, __in_chrg=<optimized out>) at /test/11.1_dbg/storage/spider/ha_spider.cc:182
#2  0x000014e9e481338b in ha_spider::~ha_spider (this=0x14e98008f6d0, __in_chrg=<optimized out>) at /test/11.1_dbg/storage/spider/ha_spider.cc:187
#3  0x0000558f35f884ec in ha_partition::~ha_partition (this=0x14e98008e560, __in_chrg=<optimized out>) at /test/11.1_dbg/sql/ha_partition.cc:487
#4  0x0000558f35f88613 in ha_partition::~ha_partition (this=0x14e98008e560, __in_chrg=<optimized out>) at /test/11.1_dbg/sql/ha_partition.cc:501
#5  0x0000558f35b4189e in open_table_from_share (thd=thd@entry=0x14e980000d58, share=share@entry=0x14e980030f20, alias=alias@entry=0x14e980013860, db_stat=db_stat@entry=33, prgflag=prgflag@entry=8, ha_open_flags=<optimized out>, outparam=0x14e980032768, is_create_table=false, partitions_to_open=0x0) at /test/11.1_dbg/sql/table.cc:4643
#6  0x0000558f359a6e0d in open_table (thd=thd@entry=0x14e980000d58, table_list=table_list@entry=0x14e980013818, ot_ctx=ot_ctx@entry=0x14e9e48db7b0) at /test/11.1_dbg/sql/sql_base.cc:2210
#7  0x0000558f359aa86d in open_and_process_table (ot_ctx=0x14e9e48db7b0, has_prelocking_list=false, prelocking_strategy=0x14e9e48db8a0, flags=0, counter=0x14e9e48db84c, tables=0x14e980013818, thd=0x14e980000d58) at /test/11.1_dbg/sql/sql_base.cc:4140
#8  open_tables (thd=thd@entry=0x14e980000d58, options=@0x14e980006520: {m_options = DDL_options_st::OPT_NONE}, start=start@entry=0x14e9e48db838, counter=counter@entry=0x14e9e48db84c, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x14e9e48db8a0) at /test/11.1_dbg/sql/sql_base.cc:4627
#9  0x0000558f359ab70e in open_and_lock_tables (thd=thd@entry=0x14e980000d58, options=<optimized out>, tables=<optimized out>, tables@entry=0x14e980013818, derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x14e9e48db8a0) at /test/11.1_dbg/sql/sql_base.cc:5601
#10 0x0000558f35a19f7d in open_and_lock_tables (flags=0, derived=true, tables=0x14e980013818, thd=0x14e980000d58) at /test/11.1_dbg/sql/sql_base.h:525
#11 execute_sqlcom_select (thd=thd@entry=0x14e980000d58, all_tables=0x14e980013818) at /test/11.1_dbg/sql/sql_parse.cc:5944
#12 0x0000558f35a25a1c in mysql_execute_command (thd=thd@entry=0x14e980000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/11.1_dbg/sql/sql_parse.cc:3944
#13 0x0000558f35a2bfad in mysql_parse (thd=thd@entry=0x14e980000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14e9e48dc230) at /test/11.1_dbg/sql/sql_parse.cc:7760
#14 0x0000558f35a2e141 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x14e980000d58, packet=packet@entry=0x14e98000ae49 "SELECT * FROM t", packet_length=packet_length@entry=15, blocking=blocking@entry=true) at /test/11.1_dbg/sql/sql_class.h:242
#15 0x0000558f35a2ff9d in do_command (thd=0x14e980000d58, blocking=blocking@entry=true) at /test/11.1_dbg/sql/sql_parse.cc:1405
#16 0x0000558f35b81e5a in do_handle_one_connection (connect=<optimized out>, connect@entry=0x558f397cd4c8, put_in_cache=put_in_cache@entry=true) at /test/11.1_dbg/sql/sql_connect.cc:1416
#17 0x0000558f35b820b9 in handle_one_connection (arg=0x558f397cd4c8) at /test/11.1_dbg/sql/sql_connect.cc:1318
#18 0x000014ea03094b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#19 0x000014ea03126a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

And:

11.0.2 368dd22a816f3b437bccd0b9ff28b9de9b1abf0a (Debug, UBASAN)

/test/11.0_dbg_san/storage/spider/spd_malloc.cc:179:11: runtime error: applying non-zero offset 18446744073709551608 to null pointer
    #0 0x153db9652176 in spider_free_mem(st_spider_transaction*, void*, unsigned long) /test/11.0_dbg_san/storage/spider/spd_malloc.cc:179
    #1 0x153db967a713 in ha_spider::~ha_spider() /test/11.0_dbg_san/storage/spider/ha_spider.cc:182
    #2 0x153db967aae6 in ha_spider::~ha_spider() /test/11.0_dbg_san/storage/spider/ha_spider.cc:187
    #3 0x561000228724 in ha_partition::~ha_partition() /test/11.0_dbg_san/sql/ha_partition.cc:487
    #4 0x5610002290aa in ha_partition::~ha_partition() /test/11.0_dbg_san/sql/ha_partition.cc:501
    #5 0x560ffde58d68 in open_table_from_share(THD*, TABLE_SHARE*, st_mysql_const_lex_string const*, unsigned int, unsigned int, unsigned int, TABLE*, bool, List<String>*) /test/11.0_dbg_san/sql/table.cc:4613
    #6 0x560ffd1a43e3 in open_table(THD*, TABLE_LIST*, Open_table_context*) /test/11.0_dbg_san/sql/sql_base.cc:2178
    #7 0x560ffd1bbf14 in open_and_process_table /test/11.0_dbg_san/sql/sql_base.cc:4108
    #8 0x560ffd1bbf14 in open_tables(THD*, DDL_options_st const&, TABLE_LIST**, unsigned int*, unsigned int, Prelocking_strategy*) /test/11.0_dbg_san/sql/sql_base.cc:4595
    #9 0x560ffd1c2dbe in open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*) /test/11.0_dbg_san/sql/sql_base.cc:5570
    #10 0x560ffd587f9c in open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int) /test/11.0_dbg_san/sql/sql_base.h:510
    #11 0x560ffd587f9c in execute_sqlcom_select /test/11.0_dbg_san/sql/sql_parse.cc:6199
    #12 0x560ffd5ebef5 in mysql_execute_command(THD*, bool) /test/11.0_dbg_san/sql/sql_parse.cc:3949
    #13 0x560ffd61b973 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /test/11.0_dbg_san/sql/sql_parse.cc:8014
    #14 0x560ffd62b707 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /test/11.0_dbg_san/sql/sql_parse.cc:1894
    #15 0x560ffd639542 in do_command(THD*, bool) /test/11.0_dbg_san/sql/sql_parse.cc:1407
    #16 0x560ffe00e8b5 in do_handle_one_connection(CONNECT*, bool) /test/11.0_dbg_san/sql/sql_connect.cc:1416
    #17 0x560ffe00fdd0 in handle_one_connection /test/11.0_dbg_san/sql/sql_connect.cc:1318
    #18 0x153ddca94b42 in start_thread nptl/pthread_create.c:442
    #19 0x153ddcb269ff  (/lib/x86_64-linux-gnu/libc.so.6+0x1269ff)
 
230614 15:43:30 [ERROR] mysqld got signal 11 ;
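
Note on the offset: the value in the UBSAN message is easier to read as a signed number. The minimal sketch below (the helper name `as_signed_64` is made up for illustration) reinterprets it as a two's-complement 64-bit integer. The result is -8, so the expression at spd_malloc.cc:179 steps back 8 bytes from the pointer it was given. Together with `ptr=0x0` in the gdb frame, this would be consistent with spider_free_mem reading a small header stored just before the allocation without first checking for NULL, but that reading is an assumption and is not confirmed anywhere in this report.

```python
# Reinterpret the unsigned offset printed by UBSAN as a signed 64-bit value.
UBSAN_OFFSET = 18446744073709551608  # copied from the runtime error above


def as_signed_64(value: int) -> int:
    """Return the two's-complement signed reading of an unsigned 64-bit value."""
    if not 0 <= value < 2**64:
        raise ValueError("expected an unsigned 64-bit value")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


signed_offset = as_signed_64(UBSAN_OFFSET)
print(signed_offset)  # -8
```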

Bug confirmed present in:
MariaDB: 10.5.21 (dbg), 10.5.21 (opt), 10.6.14 (dbg), 10.6.14 (opt), 10.9.7 (dbg), 10.9.7 (opt), 10.10.5 (dbg), 10.10.5 (opt), 10.11.4 (dbg), 10.11.4 (opt), 11.0.2 (dbg), 11.0.2 (opt), 11.1.0 (dbg), 11.1.0 (opt)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.4.30 (dbg), 10.4.30 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.40 (dbg), 8.0.33 (dbg), 8.0.33 (opt)



 Comments   
Comment by Yuchen Pei [ 2023-06-16 ]

Strange, I cannot reproduce this at 368dd22a816f3b437bccd0b9ff28b9de9b1abf0a with both UBSAN and ASAN enabled. The following testcase passes:

SET sql_mode='';
INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE TABLE t (c INT) ENGINE=Spider PARTITION BY HASH (c) PARTITIONS 2;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
INSERT INTO t VALUES (0);
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
INSERT INTO t VALUES (0);
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
SELECT * FROM t WHERE c=1;
ANALYZE TABLE t;
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE
SELECT * FROM t;

Comment by Roel Van de Paar [ 2023-06-22 ]

Just tested on the latest 11.1 (as of today) with a debug (non-UB/ASAN) build, and it still crashes with SIGSEGV when run through the CLI:

11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Debug)

11.1.2-dbg>SET sql_mode='';
Query OK, 0 rows affected (0.000 sec)
 
11.1.2-dbg>INSTALL PLUGIN Spider SONAME 'ha_spider.so';
Query OK, 0 rows affected, 1 warning (0.017 sec)
 
11.1.2-dbg>CREATE TABLE t (c INT) ENGINE=Spider PARTITION BY HASH (c) PARTITIONS 2;
Query OK, 0 rows affected (0.420 sec)
 
11.1.2-dbg>INSERT INTO t VALUES (0);
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
11.1.2-dbg>INSERT INTO t VALUES (0);
 
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
11.1.2-dbg>SELECT * FROM t WHERE c=1;
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
11.1.2-dbg>ANALYZE TABLE t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.001 sec)
 
11.1.2-dbg>SELECT * FROM t;
ERROR 2013 (HY000): Lost connection to server during query

I tried to reproduce this in MTR; however, for some reason the SELECT does not crash there and instead returns ER_CONNECT_TO_FOREIGN_DATA_SOURCE. (Note that --source include/have_partition.inc is required.)

Comment by Roel Van de Paar [ 2023-06-23 ]

Also verified that the latest 11.1 (as of today) still produces the same UBSAN issue.

11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Debug, UBASAN)

11.1.2-dbg>SET sql_mode='';
Query OK, 0 rows affected (0.001 sec)
 
11.1.2-dbg>INSTALL PLUGIN Spider SONAME 'ha_spider.so';
Query OK, 0 rows affected, 1 warning (0.076 sec)
 
11.1.2-dbg>CREATE TABLE t (c INT) ENGINE=Spider PARTITION BY HASH (c) PARTITIONS 2;
Query OK, 0 rows affected (0.481 sec)
 
11.1.2-dbg>INSERT INTO t VALUES (0);
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
11.1.2-dbg>INSERT INTO t VALUES (0);
 
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
11.1.2-dbg>SELECT * FROM t WHERE c=1;
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost
11.1.2-dbg>ANALYZE TABLE t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.002 sec)
 
11.1.2-dbg>SELECT * FROM t;
ERROR 2013 (HY000): Lost connection to server during query

Produces

UBSAN|applying non-zero offset X to null pointer|storage/spider/spd_malloc.cc|spider_free_mem|ha_spider::~ha_spider|ha_spider::~ha_spider|ha_partition::~ha_partition

Comment by Roel Van de Paar [ 2023-06-23 ]

Discussed further with ycp.
We found that the same SQL behaves differently in MTR than in a full server/client setup. For example in MTR we see:

11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Debug)

MariaDB [test]> ANALYZE TABLE t;
+--------+---------+----------+-----------------------------------------------------+
| Table  | Op      | Msg_type | Msg_text                                            |
+--------+---------+----------+-----------------------------------------------------+
| test.t | analyze | Error    | Unable to connect to foreign data source: localhost |
| test.t | analyze | status   | Operation failed                                    |
+--------+---------+----------+-----------------------------------------------------+
2 rows in set (1.121 sec)

Whilst in a full mariadbd/mariadb setup we see:

11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Debug)

11.1.2-dbg>ANALYZE TABLE t;
+--------+---------+----------+----------+
| Table  | Op      | Msg_type | Msg_text |
+--------+---------+----------+----------+
| test.t | analyze | status   | OK       |
+--------+---------+----------+----------+
1 row in set (0.001 sec)

Additionally, when copying commands one-by-one from the original testcase above, we get:

11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Debug)

11.1.2-dbg>SELECT * FROM t;
ERROR 1429 (HY000): Unable to connect to foreign data source: localhost

By contrast, when the whole testcase is pasted in at once, the crash happens consistently. The issue is thus confirmed to be timing-related.
Thanks to ycp, we also tried running the Spider init (and sql_mode) statements first, waiting, and then pasting in the rest at once; the server does not crash in that case either.
It could thus be a Spider init race condition.
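
One way to approximate "pasting the whole testcase at once" without a terminal is to feed every statement to a single client invocation on stdin. The sketch below is only an illustration: the helper names (`build_batch`, `client_command`, `run_batch`) and the socket path, user and database are assumptions about a local test setup, and it has not been verified that this hits the same timing window as an interactive paste. `--force` is passed so the client continues after the expected ERROR 1429 responses.

```python
import subprocess

# Statements copied from the original testcase in the description.
TESTCASE = [
    "SET sql_mode=''",
    "INSTALL PLUGIN Spider SONAME 'ha_spider.so'",
    "CREATE TABLE t (c INT) ENGINE=Spider PARTITION BY HASH (c) PARTITIONS 2",
    "INSERT INTO t VALUES (0)",
    "INSERT INTO t VALUES (0)",
    "SELECT * FROM t WHERE c=1",
    "ANALYZE TABLE t",
    "SELECT * FROM t",
]


def build_batch(statements):
    """Join the statements into one block, one terminated statement per line."""
    return "".join(stmt + ";\n" for stmt in statements)


def client_command(socket_path, user="root", database="test"):
    """Build the client command line (socket, user and database are assumed)."""
    return [
        "mariadb",
        "--force",  # keep going after SQL errors such as ERROR 1429
        "--socket=" + socket_path,
        "--user=" + user,
        database,
    ]


def run_batch(sql, socket_path):
    """Send the whole batch to the server through a single client process."""
    return subprocess.run(
        client_command(socket_path), input=sql, capture_output=True, text=True
    )
```

Calling `run_batch(build_batch(TESTCASE), socket_path)` against a disposable server and then checking the error log for signal 11 would show whether the crash reproduces; passing only the first two statements, waiting, and then sending the rest corresponds to the non-crashing variant described above.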

Comment by Roel Van de Paar [ 2023-06-23 ]

Additional testing shows that 10.9 is affected at both 3e639612451 and d8997f875e2d78300999876e25d348cf6ad3f73e. It is thus likely a different Spider init bug than the issues that were addressed in commit 3e639612451.

Comment by Roel Van de Paar [ 2023-10-24 ]

Additional testcase:

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE SERVER srv FOREIGN DATA WRAPPER mysql OPTIONS (SOCKET '../socket.sock', DATABASE 'test', USER 'Spider', PASSWORD '');
CREATE TABLE t1 (a INT, b VARCHAR(255), PRIMARY KEY(a)) ENGINE=Spider PARTITION BY RANGE (a) (PARTITION p1 VALUES LESS THAN (3), PARTITION p2 VALUES LESS THAN MAXVALUE REMOTE_SERVER="srv");
INSERT INTO t VALUES (0,NULL,'a'), (1,'B','b'), (2,0,'c');
DROP SERVER srv;
SELECT * FROM t1;

Comment by Roel Van de Paar [ 2023-10-26 ]

Additional testcase using SPIDER_IGNORE_COMMENTS:

INSTALL PLUGIN Spider SONAME 'ha_spider.so';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock', DATABASE 'test', USER 'Spider', PASSWORD '');
SET SESSION SPIDER_IGNORE_COMMENTS=1;
CREATE TABLE t1 (a INT, b VARCHAR(255), PRIMARY KEY(a)) ENGINE=Spider COMMENT='tbl "t1"' PARTITION BY RANGE (a) (PARTITION p1 VALUES LESS THAN (3) COMMENT='srv "s_2_1"', PARTITION p2 VALUES LESS THAN MAXVALUE REMOTE_SERVER="srv");
DROP SERVER srv;
SELECT * FROM t1;

SIGSEGV|spider_free_mem|ha_spider::~ha_spider|ha_spider::~ha_spider|ha_partition::~ha_partition
