
MDEV-24225: Long semaphore wait in dict0dict.cc line 943 and semaphore wait >600 sec kill

Details

    Description

      Run the following SQL statements in a continually randomized order, using 150 or 200 threads, perpetually (a minimal driver sketch follows the statement list):

      DROP DATABASE test;
      CREATE DATABASE test;
      USE test;
      CREATE TABLE t1 (a INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (NULL);
      XA BEGIN 'x';
      UPDATE t1 SET a = 1;
      DELETE FROM t1;
      ALTER TABLE t1 ADD COLUMN extra INT;
      INSERT INTO t1 (a) VALUES (2);
      ROLLBACK;
      DELETE FROM t1;
      INSERT INTO t1 (a) VALUES (2);
      XA END 'x';
      XA ROLLBACK 'x';
      DROP TABLE t1;
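
      A minimal driver sketch is shown below; it only illustrates the kind of harness meant by "continually randomized order". The host, user, 150-thread count, and the mysql-connector-python dependency are assumptions and not part of the original test setup (the port is taken from the log below).

      # Hypothetical stress driver: runs the statement list above in a freshly
      # shuffled order per round, from many threads, forever. Assumes
      # mysql-connector-python is installed and a server is listening on
      # 127.0.0.1:12191 (the port from the log below); host/user are guesses.
      import random
      import threading
      import mysql.connector

      STATEMENTS = [
          "DROP DATABASE test",
          "CREATE DATABASE test",
          "USE test",
          "CREATE TABLE t1 (a INT) ENGINE=InnoDB",
          "INSERT INTO t1 VALUES (NULL)",
          "XA BEGIN 'x'",
          "UPDATE t1 SET a = 1",
          "DELETE FROM t1",
          "ALTER TABLE t1 ADD COLUMN extra INT",
          "INSERT INTO t1 (a) VALUES (2)",
          "ROLLBACK",
          "DELETE FROM t1",
          "INSERT INTO t1 (a) VALUES (2)",
          "XA END 'x'",
          "XA ROLLBACK 'x'",
          "DROP TABLE t1",
      ]

      def worker():
          conn = mysql.connector.connect(host="127.0.0.1", port=12191, user="root")
          cur = conn.cursor()
          while True:
              # Most statements are expected to fail intermittently because the
              # threads race against each other; errors are simply ignored.
              for stmt in random.sample(STATEMENTS, len(STATEMENTS)):
                  try:
                      cur.execute(stmt)
                  except mysql.connector.Error:
                      pass

      threads = [threading.Thread(target=worker, daemon=True) for _ in range(150)]
      for t in threads:
          t.start()
      for t in threads:
          t.join()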
      

      After about 50k-500k executions, and enough elapsed time to hit the 600-second semaphore-wait timeout, this leads to:

      10.6.0 5d4599f9750140f92cfdbbe4d292ae1b8dd456f8 (Debug)

      2020-11-17 10:53:27 0 [Note] /test/MD211020-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld: ready for connections.
      Version: '10.6.0-MariaDB-debug' socket: '/test/MD211020-mariadb-10.6.0-linux-x86_64-dbg/socket.sock' port: 12191 MariaDB Server
      2020-11-17 10:53:35 499 [Note] InnoDB: Deferring DROP TABLE `test`.`t1`; renaming to test/#sql-ib33
      2020-11-17 10:57:37 0 [Warning] InnoDB: A long semaphore wait:
      --Thread 22831188305664 has waited at dict0dict.cc line 943 for 241.00 seconds the semaphore:
      Mutex at 0x55e77c0b2a00, Mutex DICT_SYS created /test/10.6_dbg/storage/innobase/dict/dict0dict.cc:1032, lock var 2
      Last time reserved in file /test/10.6_dbg/storage/innobase/row/row0mysql.cc line 3921
      2020-11-17 10:57:37 0 [Note] InnoDB: A semaphore wait:
      --Thread 22831188305664 has waited at dict0dict.cc line 943 for 241.00 seconds the semaphore:
      Mutex at 0x55e77c0b2a00, Mutex DICT_SYS created /test/10.6_dbg/storage/innobase/dict/dict0dict.cc:1032, lock var 2
      Last time reserved in file /test/10.6_dbg/storage/innobase/row/row0mysql.cc line 3921
      InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
      InnoDB: Pending reads 0, writes 0
      InnoDB: ###### Diagnostic info printed to the standard error stream
      ... many more (all on the same dict0dict.cc:943, dict0dict.cc:1032, and row0mysql.cc line 3921)...
      2020-11-17 11:03:47 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
      201117 11:03:47 [ERROR] mysqld got signal 6 ;
      

      10.6.0 5d4599f9750140f92cfdbbe4d292ae1b8dd456f8 (Debug)

      Core was generated by `/test/MD211020-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGABRT, Aborted.
      #0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x14c3ce1f0700 (LWP 827875))]
      (gdb) bt
      #0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1 0x000055e77b698021 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
      #2 0x000055e77addf321 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:330
      #3 <signal handler called>
      #4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #5 0x000014c48b161859 in __GI_abort () at abort.c:79
      #6 0x000055e77b4574b9 in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /test/10.6_dbg/storage/innobase/ut/ut0ut.cc:580
      #7 0x000055e77b3e5b04 in srv_error_monitor_task () at /test/10.6_dbg/storage/innobase/include/ut0ut.h:346
      #8 0x000055e77b623746 in tpool::thread_pool_generic::timer_generic::run (this=0x55e77edb1af0) at /test/10.6_dbg/tpool/tpool_generic.cc:299
      #9 tpool::thread_pool_generic::timer_generic::execute (arg=0x55e77edb1af0) at /test/10.6_dbg/tpool/tpool_generic.cc:317
      #10 0x000055e77b62467f in tpool::task::execute (this=0x55e77edb1b30) at /test/10.6_dbg/tpool/task.cc:52
      #11 0x000055e77b62325f in tpool::thread_pool_generic::worker_main (this=0x55e77e3f1b20, thread_var=0x55e77e42c4d0) at /test/10.6_dbg/tpool/tpool_generic.cc:518
      #12 0x000055e77b623598 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:89
      #13 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
      #14 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
      #15 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
      #16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
      #17 0x000014c48b553d84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #18 0x000014c48b66f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #19 0x000014c48b25e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

      10.6.0 5d4599f9750140f92cfdbbe4d292ae1b8dd456f8 (Optimized)

      Core was generated by `/test/MD201020-mariadb-10.6.0-linux-x86_64-opt/bin/mysqld --no-defaults --core-'.
      Program terminated with signal SIGABRT, Aborted.
      #0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
          at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      [Current thread is 1 (Thread 0x148ea2df6700 (LWP 1128501))]
      (gdb) bt
      #0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
      #1 0x000055b083224a4f in my_write_core (sig=sig@entry=6) at /test/10.6_opt/mysys/stacktrace.c:424
      #2 0x000055b082c48130 in handle_fatal_signal (sig=6) at /test/10.6_opt/sql/signal_handler.cc:330
      #3 <signal handler called>
      #4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #5 0x0000148fb2181859 in __GI_abort () at abort.c:79
      #6 0x000055b08290d1f7 in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /test/10.6_opt/storage/innobase/ut/ut0ut.cc:580
      #7 0x000055b082902fde in srv_error_monitor_task () at /test/10.6_opt/storage/innobase/include/ut0ut.h:346
      #8 0x000055b0831b7159 in tpool::thread_pool_generic::timer_generic::run (this=0x55b08599f5a0) at /test/10.6_opt/tpool/tpool_generic.cc:299
      #9 tpool::thread_pool_generic::timer_generic::execute (arg=0x55b08599f5a0) at /test/10.6_opt/tpool/tpool_generic.cc:317
      #10 0x000055b0831b815a in tpool::task::execute (this=0x55b08599f5e0) at /test/10.6_opt/tpool/task.cc:52
      #11 tpool::task::execute (this=0x55b08599f5e0) at /test/10.6_opt/tpool/task.cc:42
      #12 0x000055b0831b6c47 in tpool::thread_pool_generic::worker_main (this=0x55b0853a52e0, thread_var=0x55b0853dec00) at /test/10.6_opt/tpool/tpool_generic.cc:518
      #13 0x0000148fb2573d84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
      #14 0x0000148fb268f609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #15 0x0000148fb227e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      

    Activity

            marko Marko Mäkelä added a comment:

            In both MDEV-24225.txt and MDEV-24225_10.5.9_7f30a5c423249c7fcfc9f9ac1578d06263aa37ac_dbg.txt, the progress of most InnoDB threads is blocked on dict_sys.mutex and dict_sys.latch, which are held by a thread that does not itself appear to be blocked. In MDEV-24225_10.5.9_7f30a5c423249c7fcfc9f9ac1578d06263aa37ac_dbg.txt, that thread is:

            10.5 7f30a5c423249c7fcfc9f9ac1578d06263aa37ac

            Thread 2 (Thread 0x1507ac1ad700 (LWP 4193329)):
            #0  0x000055579e9f0e76 in MutexDebug<TTASEventMutex<GenericPolicy> >::is_owned (this=0x55579f897270 <dict_sys+16>) at /test/10.5_dbg/storage/innobase/include/sync0policy.h:158
            #1  0x000055579ed2e505 in PolicyMutex<TTASEventMutex<GenericPolicy> >::is_owned (this=<optimized out>) at /test/10.5_dbg/storage/innobase/include/ib0mutex.h:635
            #2  dict_get_first_table_name_in_db (name=name@entry=0x150774025478 "test/") at /test/10.5_dbg/storage/innobase/dict/dict0load.cc:223
            #3  0x000055579eb6d82b in row_drop_database_for_mysql (name=<optimized out>, name@entry=0x150774025478 "test/", trx=trx@entry=0x1507a7c03690, found=found@entry=0x1507ac1ab368) at /test/10.5_dbg/storage/innobase/row/row0mysql.cc:3921
            #4  0x000055579e9c68e9 in innobase_drop_database (hton=<optimized out>, path=<optimized out>) at /test/10.5_dbg/storage/innobase/handler/ha_innodb.cc:13427
            

            This is blocking (among other threads) the master task that is executing row_drop_tables_for_mysql_in_background().

            Likewise, in MDEV-24225.txt, row_drop_tables_for_mysql_in_background() and many others are waiting for the completion of row_drop_database_for_mysql() in Thread 4, which does not seem to be blocked.

            It is hard to say what trips the InnoDB watchdog. There seemingly is some progress, and not everything is stuck.

            But, in any case, the entire "background drop table queue" is an InnoDB workaround for bugs at the storage engine layer. I think that this should be fixed by MDEV-15532, MDEV-21175, MDEV-21602, or similar. The entire workaround should be made redundant and removed.

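            As a highly simplified illustration (plain Python, not InnoDB code), the watchdog pattern referred to above looks roughly like this: one thread holds a shared lock during a long scan, other threads queue up behind it, and a monitor task aborts the whole process with SIGABRT ("signal 6") once any single wait exceeds a threshold.

            # Hypothetical sketch of the semaphore-wait watchdog pattern; plain
            # Python, not InnoDB code. The 600-second limit is shortened so the
            # sketch finishes quickly.
            import os
            import threading
            import time

            dict_lock = threading.Lock()   # stand-in for dict_sys.mutex
            wait_started = {}              # worker name -> time it began waiting
            WATCHDOG_LIMIT = 3             # seconds; InnoDB uses 600

            def long_scan():
                with dict_lock:
                    time.sleep(60)         # long "dictionary scan" while holding the lock

            def blocked_worker(name):
                wait_started[name] = time.time()
                with dict_lock:            # queues up behind long_scan()
                    wait_started.pop(name, None)

            def watchdog():
                while True:
                    time.sleep(1)
                    waits = [time.time() - t for t in list(wait_started.values())]
                    if waits and max(waits) > WATCHDOG_LIMIT:
                        # corresponds to "Semaphore wait has lasted > 600 seconds.
                        # We intentionally crash the server because it appears to be hung."
                        os.abort()

            threading.Thread(target=long_scan, daemon=True).start()
            for i in range(4):
                threading.Thread(target=blocked_worker, args=(f"w{i}",), daemon=True).start()
            watchdog()                     # runs in the main thread; aborts after a few seconds

            In the hang reported here, by contrast, the lock holder did not itself appear to be stuck, which is why it is unclear what exactly tripped the watchdog.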

            marko Marko Mäkelä added a comment:

            Roel, is this repeatable with the latest 10.6? The broken-by-design "background drop table queue" was removed in 10.6.3.


            Roel Roel Van de Paar added a comment:

            The issue is no longer reproducible in 10.6.3 891a927e804c5a3a582f6137c2f316ef7abb25ca (Debug + Optimized). Also tested 10.7.0 71ed8c136fa203b9b3a678a6d5cc72235ef73ef7 (Debug + Optimized), with the same result. Closing. Thank you, marko.


            Roel Roel Van de Paar added a comment:

            ccalender, I tested 10.5.13 4eb7217ec33fef8d23f2dda0c97b442508c81b1d (Debug) and the issue is not reproducible there.

            marko, any idea how/why it could be fixed in 10.5? Thanks


            marko Marko Mäkelä added a comment:

            If this is not easily repeatable with 10.5.12 or later releases, the explanation could be a number of fixes to page flushing. MDEV-25801 a.k.a. MDEV-25773 could be a good candidate, but there were many other fixes in 10.5.12 that were motivated by MDEV-25451.


    People

      marko Marko Mäkelä
      Roel Roel Van de Paar
      Votes: 1
      Watchers: 7
