[MDEV-30083] SIGSEGV's in federatedx_txn::sp_acquire, federatedx_txn::acquire and federatedx_txn::txn_rollback on SAVEPOINT Created: 2022-11-24  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Federated
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.5, 10.6

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Oleksandr Byelkin
Resolution: Unresolved Votes: 0
Labels: crash, regression-10.5, thread_hang

Issue Links:
Relates
relates to MDEV-29178 Assertion `sp > last_savepoint()' fai... Confirmed
relates to MDEV-29214 ASAN errors in federatedx_txn::txn_co... Confirmed
relates to MDEV-30410 SIGSEGV in ___pthread_mutex_lock from... Open

 Description   

This bug has some similarities with both MDEV-29178 and MDEV-29214.
It is similar to MDEV-29178 in it's connection with savepoints/stack frames, and in MDEV-29214 in various ways;

  1. The testcase is similar, but not the same, especially the crashing statement differs.
  2. Debug versions crash on the same assert server->io_count == 0 with the testcase given here. However, this is likely the result of a seperate bug (i.e. MDEV-29214) being hit first, as the crash is on the second last statement (which matches the crashing query of MDEV-29214)
  3. On optimized builds, a new SIGSEGV is seen in federatedx_txn::sp_acquire

Given the new SIGSEGV seen, the fact that this bug is 10.5+ only and that the crashing query is different, this looks to be a completely separate issue.
The issue is lightly sporadic.

SET GLOBAL table_open_cache=1;
INSTALL SONAME 'ha_federatedx.so';
CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET '../socket.sock',DATABASE 'test',user 'federatedx',PASSWORD'');
CREATE TABLE t2 (c INT) ENGINE=InnoDB;
CREATE TABLE t CONNECTION='srv/t2' ENGINE=FEDERATED;
RENAME TABLE t TO t3;  # Likely point where the wrong situation is created
CREATE TABLE t (s INT) ENGINE=InnoDB;
XA START 0x1;
INSERT INTO t VALUES (9);
SELECT * FROM mysql.roles_mapping;
INSERT INTO t3 VALUES();
UPDATE t2 SET d=0;
SELECT * FROM mysql.roles_mapping;
SELECT fn1 (1e);
HANDLER t OPEN;
INSERT INTO t SELECT * FROM t;
DELETE FROM mysql.tables_priv;
SAVEPOINT s;

Leads to:

10.10.2 5deccac4aaf1be948a0ae10f40bb5f668ac37a4d (Optimized)

Core was generated by `/test/MD190922-mariadb-10.10.2-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000014ea70029d71 in federatedx_txn::sp_acquire (
    this=this@entry=0x14ea10051ed0, sp=sp@entry=0x14ea100147d0)
    at /test/10.10_opt/storage/federatedx/federatedx_txn.cc:301
[Current thread is 1 (Thread 0x14ea480a8700 (LWP 2613257))]
(gdb) bt
#0  0x000014ea70029d71 in federatedx_txn::sp_acquire (this=this@entry=0x14ea10051ed0, sp=sp@entry=0x14ea100147d0) at /test/10.10_opt/storage/federatedx/federatedx_txn.cc:301
#1  0x000014ea700221c9 in ha_federatedx::savepoint_set (hton=0x55bf6a5b51a8, thd=0x14ea10000c58, sv=0x14ea100147d0) at /test/10.10_opt/storage/federatedx/ha_federatedx.cc:3546
#2  0x000055bf6746cb0c in ha_savepoint (thd=thd@entry=0x14ea10000c58, sv=sv@entry=0x14ea10014740) at /test/10.10_opt/sql/handler.cc:2935
#3  0x000055bf6734d84b in trans_savepoint (thd=thd@entry=0x14ea10000c58, name={str = 0x14ea10010798 "s", length = 1}) at /test/10.10_opt/sql/transaction.cc:617
#4  0x000055bf67229a5a in mysql_execute_command (thd=0x14ea10000c58, is_called_from_prepared_stmt=<optimized out>) at /test/10.10_opt/sql/sql_parse.cc:5705
#5  0x000055bf67217bb5 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x14ea10000c58) at /test/10.10_opt/sql/sql_parse.cc:8035
#6  mysql_parse (thd=0x14ea10000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/10.10_opt/sql/sql_parse.cc:7957
#7  0x000055bf6722370a in dispatch_command (command=COM_QUERY, thd=0x14ea10000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/10.10_opt/sql/sql_class.h:1345
#8  0x000055bf67225652 in do_command (thd=0x14ea10000c58, blocking=blocking@entry=true) at /test/10.10_opt/sql/sql_parse.cc:1407
#9  0x000055bf6733d97f in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55bf6a5bfd48, put_in_cache=put_in_cache@entry=true) at /test/10.10_opt/sql/sql_connect.cc:1416
#10 0x000055bf6733dc5d in handle_one_connection (arg=0x55bf6a5bfd48) at /test/10.10_opt/sql/sql_connect.cc:1318
#11 0x000014ea74565609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x000014ea74151133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

There were also a number of different crashes observed in a 10.6.10 optimized build when executing the testcase one to two times. First there was this crash (seen on a single execution of testcase):

10.6.10 5e270ca28d05acb72c6aec9f1d37f9610fc11a0e (Optimized)

Core was generated by `/test/MD190922-mariadb-10.6.10-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00001489e4035db6 in federatedx_txn::txn_rollback (this=0x148978035500)
    at /test/10.6_opt/storage/federatedx/federatedx_txn.cc:270
[Current thread is 1 (Thread 0x1489c0129700 (LWP 2613280))]
(gdb) bt
#0  0x00001489e4035db6 in federatedx_txn::txn_rollback (this=0x148978035500) at /test/10.6_opt/storage/federatedx/federatedx_txn.cc:270
#1  federatedx_txn::txn_rollback (this=0x148978035500) at /test/10.6_opt/storage/federatedx/federatedx_txn.cc:258
#2  0x0000555ce9510f20 in ha_rollback_trans (thd=thd@entry=0x148978000c58, all=all@entry=true) at /test/10.6_opt/sql/handler.cc:2182
#3  0x0000555ce949766c in xa_trans_force_rollback (thd=thd@entry=0x148978000c58) at /test/10.6_opt/sql/xa.cc:393
#4  0x0000555ce94989ef in trans_xa_detach (thd=thd@entry=0x148978000c58) at /test/10.6_opt/sql/xa.cc:830
#5  0x0000555ce9294014 in THD::cleanup (this=this@entry=0x148978000c58) at /test/10.6_opt/sql/sql_class.cc:1548
#6  0x0000555ce920efc9 in unlink_thd (thd=0x148978000c58) at /test/10.6_opt/sql/mysqld.cc:2726
#7  0x0000555ce93ef621 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x555cec860e58, put_in_cache=put_in_cache@entry=true) at /test/10.6_opt/sql/sql_connect.cc:1427
#8  0x0000555ce93efacd in handle_one_connection (arg=0x555cec860e58) at /test/10.6_opt/sql/sql_connect.cc:1318
#9  0x00001489eda2d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00001489ed619133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Another attempt produced this odd error on the first execution of the testcase:

10.6.10 5e270ca28d05acb72c6aec9f1d37f9610fc11a0e (Optimized)

ERROR 1178 (42000) at line 18 in file: 'in.sql': The storage engine for the table doesn't support SAVEPOINT

At that point (i.e. when this situation is seen), executing the testcase again produces a different stack:

10.6.10 5e270ca28d05acb72c6aec9f1d37f9610fc11a0e (Optimized)

Core was generated by `/test/MD190922-mariadb-10.6.10-linux-x86_64-opt/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  federatedx_txn::acquire (this=0x151628035580, share=0x151628019510, 
    thd=thd@entry=0x151628000c58, readonly=readonly@entry=true, 
    ioptr=ioptr@entry=0x1516280206d8)
    at /test/10.6_opt/storage/federatedx/federatedx_txn.cc:108
[Current thread is 1 (Thread 0x15164b5b0700 (LWP 598564))]
(gdb) bt
#0  federatedx_txn::acquire (this=0x151628035580, share=0x151628019510, thd=thd@entry=0x151628000c58, readonly=readonly@entry=true, ioptr=ioptr@entry=0x1516280206d8) at /test/10.6_opt/storage/federatedx/federatedx_txn.cc:108
#1  0x000015164b5527ed in ha_federatedx::open (this=0x151628020190, name=<optimized out>, mode=<optimized out>, test_if_locked=<optimized out>) at /test/10.6_opt/storage/federatedx/ha_federatedx.cc:1804
#2  0x000055c00134b2ce in handler::ha_open (this=0x151628020190, table_arg=table_arg@entry=0x151630009c68, name=0x15162807a3a0 "./test/t3", mode=mode@entry=2, test_if_locked=test_if_locked@entry=18, mem_root=mem_root@entry=0x0, partitions_to_open=0x0) at /test/10.6_opt/sql/handler.cc:3334
#3  0x000055c0011fa9dc in open_table_from_share (thd=thd@entry=0x151628000c58, share=share@entry=0x151628079f00, alias=alias@entry=0x151628010768, db_stat=db_stat@entry=33, prgflag=prgflag@entry=8, ha_open_flags=18, outparam=0x151630009c68, is_create_table=false, partitions_to_open=0x0) at /test/10.6_opt/sql/table.cc:4413
#4  0x000055c0010b5c5a in open_table (thd=0x151628000c58, table_list=0x151628010720, ot_ctx=0x15164b5aead0) at /test/10.6_opt/sql/sql_base.cc:2035
#5  0x000055c0010b897a in open_and_process_table (ot_ctx=0x15164b5aead0, has_prelocking_list=false, prelocking_strategy=0x15164b5aec90, flags=0, counter=0x15164b5aeb6c, tables=0x151628010720, thd=0x151628000c58) at /test/10.6_opt/sql/sql_base.cc:3845
#6  open_tables (thd=thd@entry=0x151628000c58, options=@0x1516280060f0: {m_options = DDL_options_st::OPT_NONE}, start=start@entry=0x15164b5aeb58, counter=counter@entry=0x15164b5aeb6c, flags=flags@entry=0, prelocking_strategy=0x15164b5aec90) at /test/10.6_opt/sql/sql_base.cc:4328
#7  0x000055c0010b912a in open_and_lock_tables (thd=thd@entry=0x151628000c58, options=<optimized out>, tables=<optimized out>, tables@entry=0x151628010720, derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=<optimized out>) at /test/10.6_opt/sql/sql_base.cc:5301
#8  0x000055c0010e8cce in open_and_lock_tables (flags=<optimized out>, derived=<optimized out>, tables=<optimized out>, thd=<optimized out>) at /test/10.6_opt/sql/sql_base.h:509
#9  open_and_lock_for_insert_delayed (table_list=<optimized out>, thd=<optimized out>) at /test/10.6_opt/sql/sql_insert.cc:626
#10 mysql_insert (thd=thd@entry=0x151628000c58, table_list=0x151628010720, fields=@0x151628005aa8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x55c002185610 <end_of_list>, last = 0x151628005aa8, elements = 0}, <No data fields>}, values_list=@0x151628005af0: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x151628011298, last = 0x151628011298, elements = 1}, <No data fields>}, update_fields=@0x151628005ad8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x55c002185610 <end_of_list>, last = 0x151628005ad8, elements = 0}, <No data fields>}, update_values=@0x151628005ac0: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x55c002185610 <end_of_list>, last = 0x151628005ac0, elements = 0}, <No data fields>}, duplic=<optimized out>, ignore=<optimized out>, result=<optimized out>) at /test/10.6_opt/sql/sql_insert.cc:752
#11 0x000055c0011238b4 in mysql_execute_command (thd=0x151628000c58, is_called_from_prepared_stmt=<optimized out>) at /test/10.6_opt/sql/sql_parse.cc:4565
#12 0x000055c001113df5 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x151628000c58) at /test/10.6_opt/sql/sql_parse.cc:8030
#13 mysql_parse (thd=0x151628000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /test/10.6_opt/sql/sql_parse.cc:7952
#14 0x000055c00111fa3a in dispatch_command (command=COM_QUERY, thd=0x151628000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /test/10.6_opt/sql/sql_class.h:1364
#15 0x000055c001121982 in do_command (thd=0x151628000c58, blocking=blocking@entry=true) at /test/10.6_opt/sql/sql_parse.cc:1409
#16 0x000055c0012277ef in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55c0030bbe68, put_in_cache=put_in_cache@entry=true) at /test/10.6_opt/sql/sql_connect.cc:1416
#17 0x000055c001227acd in handle_one_connection (arg=0x55c0030bbe68) at /test/10.6_opt/sql/sql_connect.cc:1318
#18 0x0000151677e44609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x0000151677a30133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.5.18 (opt), 10.6.10 (opt), 10.7.6 (opt), 10.8.5 (opt), 10.9.3 (opt), 10.10.2 (opt), 10.11.2 (opt)

Due to MDEV-29214 (i.e. early crash in debug builds on the assert seen there), it cannot be determined if this bug is present in:
MariaDB: 10.5.18 (dbg), 10.6.10 (dbg), 10.7.6 (dbg), 10.8.5 (dbg), 10.9.3 (dbg), 10.10.2 (dbg), 10.11.2 (dbg)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.3.37 (dbg), 10.3.37 (opt), 10.4.27 (dbg), 10.4.27 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.38 (dbg), 5.7.38 (opt), 8.0.29 (dbg), 8.0.29 (opt)



 Comments   
Comment by Roel Van de Paar [ 2022-11-24 ]

MTR Testcase

--let $SOCKET= `SELECT @@global.socket`
--source include/have_innodb.inc
SET GLOBAL table_open_cache=1;
INSTALL SONAME 'ha_federatedx.so';
CREATE USER federatedx@localhost IDENTIFIED BY 'a';
GRANT ALL ON test.* TO federatedx@localhost;
eval CREATE SERVER srv FOREIGN DATA WRAPPER MYSQL OPTIONS (SOCKET "$SOCKET",DATABASE 'test',user 'federatedx',PASSWORD 'a');
CREATE TABLE t2 (c INT) ENGINE=InnoDB;
CREATE TABLE t CONNECTION='srv/t2' ENGINE=FEDERATED;
RENAME TABLE t TO t3;  # Likely point where the wrong situation is created
CREATE TABLE t (s INT) ENGINE=InnoDB;
XA START 0x1;
INSERT INTO t VALUES (9);
SELECT * FROM mysql.roles_mapping;
INSERT INTO t3 VALUES();
--error 1054
UPDATE t2 SET d=0;
SELECT * FROM mysql.roles_mapping;
--error 1305
SELECT fn1 (1e);
HANDLER t OPEN;
INSERT INTO t SELECT * FROM t;
DELETE FROM mysql.tables_priv;
SAVEPOINT s;

Comment by Roel Van de Paar [ 2022-11-24 ]

Changing the UPDATE to refer to an existing column:

UPDATE t2 SET d=0;

Leads to a thread hang similar to the one seen in MDEV-29214, but is likely otherwise unrelated to the SAVEPOINT crashes described in this bug.

10.11.2 8283948846740a22f96bbe7bccf250708406d5d9 (Optimized)

10.11.2-opt>SHOW PROCESSLIST;
+----+------------+-----------+------+---------+------+----------+-------------------+----------+
| Id | User       | Host      | db   | Command | Time | State    | Info              | Progress |
+----+------------+-----------+------+---------+------+----------+-------------------+----------+
|  4 | root       | localhost | test | Query   |   27 | Updating | UPDATE t2 SET c=0 |    0.000 |
|  6 | federatedx | localhost | test | Sleep   |   28 |          | NULL              |    0.000 |
|  7 | root       | localhost | test | Query   |    0 | starting | SHOW PROCESSLIST  |    0.000 |
+----+------------+-----------+------+---------+------+----------+-------------------+----------+
3 rows in set (0.000 sec)

Generated at Thu Feb 08 10:13:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.