[MDEV-26491] SIGABRT when setting innodb_disallow_writes and --innodb-fatal-semaphore-wait-threshold=2 and attempting DROP TABLE Created: 2021-08-28  Updated: 2022-04-06  Resolved: 2022-04-06

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.3.35, 10.4.25, 10.5.16, 10.6.8, 10.7.4, 10.8.3, 10.9.1

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: affects-tests

Issue Links:
Duplicate
duplicates MDEV-24845 Oddities around innodb_fatal_semaphor... Closed
Relates
relates to MDEV-24911 Missing warning before [ERROR] [FATAL... Open
relates to MDEV-25975 Turning on innodb_disallow_writes wil... Closed

 Description   

CREATE TABLE t (c INT) ENGINE=InnoDB;
SET GLOBAL innodb_disallow_writes=ON;
DROP TABLE t;

Will hang at when executed from the CLI. Attaching another CLI we see closing tables for DROP TABLE t in the process list. Exiting the client (CTRL+c) and attempting mysqladmin shutdown will also hang. Error log notes Thread 4 (user : 'root') did not exit warning. Nothing too special so far. However, when using the following testcase instead (same testcase, one startup option added):

# mysqld options required for replay:  --innodb-fatal-semaphore-wait-threshold=2
CREATE TABLE t (c INT) ENGINE=InnoDB;
SET GLOBAL innodb_disallow_writes=ON;
DROP TABLE t;

We get:

10.7.0 05e29e177df243b700392b797e26cae43fd3181e (Debug)

2021-08-28 17:03:59 0 [Warning] InnoDB: Long wait (1 seconds) for dict_sys.mutex
2021-08-28 17:04:14 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.mutex. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/

10.7.0 05e29e177df243b700392b797e26cae43fd3181e (Debug)

Core was generated by `/test/MD280821-mariadb-10.7.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x14c3a57fb700 (LWP 961936))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000014c3c9a1d859 in __GI_abort () at abort.c:79
#2  0x0000563a258c924b in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /test/10.7_dbg/storage/innobase/ut/ut0ut.cc:519
#3  0x0000563a2587650a in srv_monitor_task () at /test/10.7_dbg/storage/innobase/include/ut0ut.h:393
#4  0x0000563a25a6abb0 in tpool::thread_pool_generic::timer_generic::run (this=0x563a28dcdd10) at /test/10.7_dbg/tpool/tpool_generic.cc:313
#5  tpool::thread_pool_generic::timer_generic::execute (arg=0x563a28dcdd10) at /test/10.7_dbg/tpool/tpool_generic.cc:333
#6  0x0000563a25a6bb1b in tpool::task::execute (this=0x563a28dcdd50) at /test/10.7_dbg/tpool/task.cc:52
#7  0x0000563a25a6a6cb in tpool::thread_pool_generic::worker_main (this=0x563a28a25e70, thread_var=0x563a28a35890) at /test/10.7_dbg/tpool/tpool_generic.cc:550
#8  0x0000563a25a6aa02 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:89
#9  std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#10 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#11 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#12 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#13 0x000014c3c9e0fde4 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x000014c3c9f2c609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x000014c3c9b1a293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

And it may also produce this stack (which is likely a sporadic timing based side issue/side path, and the stack looks familar/generic, and was also seen in MDEV-26457 for example):

10.7.0 05e29e177df243b700392b797e26cae43fd3181e (Debug)

Core was generated by `/test/MD280821-mariadb-10.7.0-linux-x86_64-dbg/bin/mariadbd --no-defaults --lc-'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055771db6bbf8 in ha_maria::drop_table (this=0x55771faab1e0, 
    name=0x55771faaa340 "/tmp/#sql-temptable-edebe-1-1")
    at /test/10.7_dbg/storage/maria/ha_maria.cc:2791
[Current thread is 1 (Thread 0x14bc47e14800 (LWP 974526))]
(gdb) bt
#0  0x000055771db6bbf8 in ha_maria::drop_table (this=0x55771faab1e0, name=0x55771faaa340 "/tmp/#sql-temptable-edebe-1-1") at /test/10.7_dbg/storage/maria/ha_maria.cc:2791
#1  0x000055771d952321 in handler::ha_drop_table (this=0x55771faab1e0, name=0x55771faaa340 "/tmp/#sql-temptable-edebe-1-1") at /test/10.7_dbg/sql/handler.cc:5319
#2  0x000055771d6b4fc4 in free_tmp_table (thd=thd@entry=0x55771f67bb28, entry=0x55771faa9200) at /test/10.7_dbg/sql/sql_select.cc:20348
#3  0x000055771d5af188 in close_thread_tables (thd=thd@entry=0x55771f67bb28) at /test/10.7_dbg/sql/sql_base.cc:840
#4  0x000055771d652937 in mysql_execute_command (thd=thd@entry=0x55771f67bb28, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.7_dbg/sql/sql_parse.cc:6063
#5  0x000055771d638b83 in mysql_parse (thd=0x55771f67bb28, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7ffe8e0482a0) at /test/10.7_dbg/sql/sql_parse.cc:8030
#6  0x000055771d645806 in bootstrap (file=0x55771f4ed670 <instrumented_stdin>) at /test/10.7_dbg/sql/sql_class.h:1355
#7  0x000055771d5061ee in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.7_dbg/sql/mysqld.cc:5751
#8  0x000055771d4f7b46 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.7_dbg/sql/main.cc:34

Hang looks present in:
MariaDB: 10.2.41 (dbg), 10.2.41 (opt), 10.3.32 (dbg), 10.3.32 (opt), 10.4.22 (dbg), 10.4.22 (opt), 10.5.13 (dbg), 10.5.13 (opt), 10.6.5 (dbg), 10.6.5 (opt), 10.7.0 (dbg), 10.7.0 (opt)

Crash confirmed present in:
10.6.5 (dbg), 10.6.5 (opt), 10.7.0 (dbg), 10.7.0 (opt)

Bug (or feature/syntax) confirmed not present in:
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.35 (dbg), 5.7.35 (opt), 8.0.26 (dbg), 8.0.26 (opt)

I set the versions to the crash affected versions for the moment. The main stack for this issue (the first one) was also seen in MDEV-24911 previously, though it is likely unrelated and just a consequence/resulting outcome.



 Comments   
Comment by Vladislav Vaintroub [ 2021-08-28 ]

There is no SIGABRT from tpool::thread_pool_generic::timer_generic::run . There is
[FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.mutex, thrown from ib_fatal from srv_monitor_task. The culprit is of course the long wait, and not not the "threadpool"

Comment by Marko Mäkelä [ 2021-08-28 ]

The real culprit is the Galera-specific parameter innodb_disallow_writes. A correct implementation of it would block all writes at the high level, not at the low level.

Comment by Marko Mäkelä [ 2021-08-28 ]

This strongly feels like a duplicate of MDEV-24845.

Note that earlier versions that 10.6 would report something like "Long semaphore wait". In 10.6, because the "sync array" was removed in MDEV-21452, there is a special watchdog only for dict_sys.mutex (soon, after MDEV-24258, for dict_sys.latch).

Comment by Marko Mäkelä [ 2021-08-28 ]

MDEV-25975 demonstrates an even more fundamental problem with the parameter innodb_disallow_writes.

Comment by Marko Mäkelä [ 2022-03-30 ]

I submitted a fix (removal of innodb_disallow_writes) in MDEV-25975.

Comment by Marko Mäkelä [ 2022-04-06 ]

The parameter innodb_disallow_writes was removed in MDEV-25975.

Generated at Thu Feb 08 09:45:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.