[MDEV-29053] safe_mutex: Trying to destroy a mutex cache->lock that was locked at innobase/fts/fts0fts.cc on shutdown Created: 2022-07-07  Updated: 2022-09-14  Resolved: 2022-09-14

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Locking
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: locking, not-10.5, regression-10.6, sporadic, stack-smashing

Issue Links:
Duplicate
duplicates MDEV-29342 Assertion failure in file que0que.cc ... Closed
Relates
relates to MDEV-29277 On error, fts_sync_table() fails to r... Closed

 Description   

Debug discussion with Thiru. Seen in bb-10.10-MDEV-28883, but likely unrelated as per discussions with Thiru. Core+mysqld+libs+error log sent via drive/email.

10.10.0 2db18fdb3d68d906fbd188ec570a64502ba55849 (Debug)

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000153a4b571859 in __GI_abort () at abort.c:79
#2  0x000055eb6d644864 in safe_mutex_destroy (mp=mp@entry=0x1539c84caf50, 
    file=file@entry=0x55eb6da857d8 "/test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc", line=line@entry=274)
    at /test/bb-10.10-MDEV-28883_dbg/mysys/thr_mutex.c:602
#3  0x000055eb6d57deef in inline_mysql_mutex_destroy (src_line=274, 
    src_file=0x55eb6da857d8 "/test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc", that=0x1539c84caf50)
    at /test/bb-10.10-MDEV-28883_dbg/include/mysql/psi/mysql_thread.h:724
#4  fts_cache_destroy (cache=0x1539c84caf50) at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc:274
#5  0x000055eb6d5828f0 in fts_t::~fts_t (this=0x1539c8201250, __in_chrg=<optimized out>)
    at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc:5080
#6  0x000055eb6d58295c in fts_free (table=table@entry=0x1539c84c5790)
    at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc:5117
#7  0x000055eb6d5147f5 in dict_sys_t::remove (this=this@entry=0x55eb6e05c940 <dict_sys>, table=0x1539c84c5790, 
    lru=lru@entry=false, keep=keep@entry=false) at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/dict/dict0dict.cc:1957
#8  0x000055eb6d514cfb in dict_sys_t::close (this=0x55eb6e05c940 <dict_sys>)
    at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/dict/dict0dict.cc:4592
#9  0x000055eb6d4025f3 in innodb_shutdown () at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/srv/srv0start.cc:1765
#10 0x000055eb6d1fd64c in innobase_end () at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/handler/ha_innodb.cc:4310
#11 0x000055eb6cefdff9 in ha_finalize_handlerton (plugin=0x55eb6fecc900) at /test/bb-10.10-MDEV-28883_dbg/sql/handler.cc:596
#12 0x000055eb6cc3354a in plugin_deinitialize (plugin=0x55eb6fecc900, ref_check=ref_check@entry=true)
    at /test/bb-10.10-MDEV-28883_dbg/sql/sql_plugin.cc:1269
#13 0x000055eb6cc3742b in reap_plugins () at /test/bb-10.10-MDEV-28883_dbg/sql/sql_plugin.cc:1343
#14 0x000055eb6cc37f0e in plugin_shutdown () at /test/bb-10.10-MDEV-28883_dbg/sql/sql_plugin.cc:2051
#15 0x000055eb6cadd2b0 in clean_up (print_message=print_message@entry=true) at /test/bb-10.10-MDEV-28883_dbg/sql/mysqld.cc:1987
#16 0x000055eb6cae966c in mysqld_main (argc=<optimized out>, argv=<optimized out>)
    at /test/bb-10.10-MDEV-28883_dbg/sql/mysqld.cc:5937
#17 0x000055eb6cadcb66 in main (argc=<optimized out>, argv=<optimized out>) at /test/bb-10.10-MDEV-28883_dbg/sql/main.cc:34

10.10.0 2db18fdb3d68d906fbd188ec570a64502ba55849 (Debug)

2022-07-02  5:06:35 0 [ERROR] InnoDB: (Too many concurrent transactions) while getting next doc id for table `test`.`articles`
2022-07-02  5:06:35 0 [ERROR] InnoDB: (Too many concurrent transactions) during SYNC of table `test`.`articles`
2022-07-02  5:06:35 0 [Note] InnoDB: FTS optimize thread exiting.
2022-07-02  5:06:35 0 [Note] InnoDB: Starting shutdown...
safe_mutex: Found wrong usage of mutex 'cache->lock' and 'LOCK_global_system_variables'
Mutex currently locked (in reverse order):
LOCK_global_system_variables      /test/bb-10.10-MDEV-28883_dbg/storage/innobase/buf/buf0dump.cc  line 182
cache->lock                       /test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc  line 4077
2022-07-02  5:06:35 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/130567/9332/data/ib_buffer_pool
2022-07-02  5:06:35 0 [Note] InnoDB: Buffer pool(s) dump completed at 220702  5:06:35
safe_mutex: Trying to destroy a mutex cache->lock that was locked at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc, line 4077 at /test/bb-10.10-MDEV-28883_dbg/storage/innobase/fts/fts0fts.cc, line 274



 Comments   
Comment by Marko Mäkelä [ 2022-07-26 ]

Roel, can you please try to reproduce this on plain 10.10 or any older main branch? I don’t think that there have been significant changes to this code since 10.6.

Comment by Roel Van de Paar [ 2022-07-26 ]

marko Agreed it is unlikely bb-10.10-MDEV-28883 (and same confirmed unlikely by thiru).
No testcase, only core etc. which was passed to thiru.

Comment by Thirunarayanan Balathandayuthapani [ 2022-08-02 ]

I read the code and don't see any relation between cache->lock and LOCK_global_system_variables. From the error message, shutdown is happening when insert does fts_sync(). I failed to repeat the scenario locally by writing the mtr test case. It would be great if you repeat the issue in plain 10.10 or any branch.

Comment by Roel Van de Paar [ 2022-08-09 ]

Thank you thiru for the great efforts.

Comment by Roel Van de Paar [ 2022-09-13 ]

I searched through all trials, and found a new set of occurrences (~5) of 'Found wrong usage of mutex', of which only a single one one reproduced and reduced. Here is the reduced testcase, which looks very similar to the first one added in MDEV-29277 (which is equally hard to reproduce). It is likely the two bugs are connected (ref below), however reducing towards 'Found wrong usage of mutex' seems to result in a slightly different testcase than the one added in MDEV-29277, and only the single trial I had with the (original run) UniqueID/stack listed below would actually reduce towards 'Found wrong usage of mutex' unlike the others which rather saw the UniqueID/stack in MDEV-29277, even though that the same string/text was present in their error logs. Only a single client thread running the SQL is required. Additionally, when I replayed this SQL against all versions, 10.6+ showed affected, but at that point the only stack returned was the one from MDEV-29277 (with this testcase), specifically UniqueID table->get_ref_count() == 0|SIGABRT|Backtrace stopped: Cannot access memory at address as described in MDEV-29277.

# mysqld options required for replay: --log-bin
CREATE TABLE t (s CHAR(255),FULLTEXT (s)) DEFAULT CHARSET=utf8;
INSERT INTO t VALUES (10009);
SET GLOBAL binlog_checksum=NONE,innodb_trx_rseg_n_slots_debug=1,@@SESSION.pseudo_slave_mode=ON;
SHUTDOWN;
SET GLOBAL spider_buffer_pool_filename=NULL;

This leads to:

10.11.0 8f9df08f02294f4828d40ef0a298dc0e72b01f60 (Debug)

2022-09-13 21:25:06 0 [Note] /test/MD130922-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.11.0-MariaDB-debug-log'  socket: '/test/MD130922-mariadb-10.11.0-linux-x86_64-dbg/socket.sock'  port: 10525  MariaDB Server
2022-09-13 21:25:33 0 [Note] /test/MD130922-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
2022-09-13 21:25:33 0 [Warning] InnoDB: Cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
2022-09-13 21:25:33 0 [Warning] InnoDB: Cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
2022-09-13 21:25:33 0 [ERROR] InnoDB: (Too many concurrent transactions) while getting next doc id for table `test`.`t`
2022-09-13 21:25:33 0 [ERROR] InnoDB: (Too many concurrent transactions) during SYNC of table `test`.`t`
2022-09-13 21:25:33 0 [Note] InnoDB: FTS optimize thread exiting.
2022-09-13 21:25:33 0 [Note] InnoDB: Starting shutdown...
safe_mutex: Found wrong usage of mutex 'cache->lock' and 'LOCK_global_system_variables'
Mutex currently locked (in reverse order):
LOCK_global_system_variables      /test/10.11_dbg/storage/innobase/buf/buf0dump.cc  line 182
cache->lock                       /test/10.11_dbg/storage/innobase/fts/fts0fts.cc  line 4077
2022-09-13 21:25:33 0 [Note] InnoDB: Dumping buffer pool(s) to /test/MD130922-mariadb-10.11.0-linux-x86_64-dbg/data/ib_buffer_pool
2022-09-13 21:25:33 0 [Note] InnoDB: Buffer pool(s) dump completed at 220913 21:25:33
safe_mutex: Trying to destroy a mutex cache->lock that was locked at /test/10.11_dbg/storage/innobase/fts/fts0fts.cc, line 4077 at /test/10.11_dbg/storage/innobase/fts/fts0fts.cc, line 274
# This is the actual end of the log (< this line was added by me)

SIGABRT|safe_mutex_destroy|inline_mysql_mutex_destroy|fts_cache_destroy|fts_t::~fts_t

Comment by Roel Van de Paar [ 2022-09-13 ]

The way in which the testcase is executed may matter significantly. For the above, direct paste into the CLI was used. SOURCE in.sql also seems to work.

Comment by Roel Van de Paar [ 2022-09-13 ]

I manually rechecked versions affected and can say with a reasonable level of certitude that only 10.6+ is affected.

Comment by Marko Mäkelä [ 2022-09-14 ]

This test case will only reproduce MDEV-29277 for me:

--source include/have_innodb.inc
--source include/have_log_bin.inc
CREATE TABLE t (s CHAR(255),FULLTEXT (s)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO t VALUES (10009);
SET GLOBAL binlog_checksum=NONE,innodb_trx_rseg_n_slots_debug=1,@@SESSION.pseudo_slave_mode=ON;
--source include/restart_mysqld.inc
DROP TABLE t;

10.6 fd0bdd3180a7d5f4b9804d372d6a63b6a202818c

#4  0x000056247ca511c6 in ut_dbg_assertion_failed (expr=expr@entry=0x56247baa3d52 "table->get_ref_count() == 0", file=file@entry=0x56247bb614c0 "/mariadb/10.6/storage/innobase/dict/dict0dict.cc", 
    line=line@entry=1860) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
#5  0x000056247c87d733 in dict_sys_t::remove (this=this@entry=0x56247cce0980 <dict_sys>, table=0x7fa99c116178, lru=lru@entry=false, keep=keep@entry=false)
    at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1860
#6  0x000056247c87dc0c in dict_sys_t::close (this=0x56247cce0980 <dict_sys>) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:4574
#7  0x000056247ca24470 in innodb_shutdown () at /mariadb/10.6/storage/innobase/srv/srv0start.cc:2045
#8  0x000056247c7c4e57 in innobase_end () at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:4394

If I revert the 10.6 part of the MDEV-29342 fix or check out the parent revision of that fix, then it will fail as described:

10.6 1872a142b5f827758a68e7ee745b3451f5dbea2e

2022-09-14 16:35:57 0 [Warning] InnoDB: Cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
2022-09-14 16:35:57 0 [Warning] InnoDB: Cannot find a free slot for an undo log. Do you have too many active transactions running concurrently?
2022-09-14 16:35:57 0 [ERROR] InnoDB: (Too many concurrent transactions) while getting next doc id for table `test`.`t`
2022-09-14 16:35:57 0 [ERROR] InnoDB: (Too many concurrent transactions) during SYNC of table `test`.`t`
^ Found warnings in /dev/shm/10.6/mysql-test/var/log/mysqld.1.err

Generated at Thu Feb 08 10:05:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.