Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.3.28
-
None
Description
Hi,
we've had a bunch of deadlocks (+sigabrt) now that resulted in these logs:
2021-03-18 1:06:37 0 [Warning] InnoDB: A long semaphore wait:
|
--Thread 140349926676224 has waited at lock0lock.cc line 3882 for 241.00 seconds the semaphore:
|
Mutex at 0x5587b08404c0, Mutex LOCK_SYS created lock0lock.cc:461, lock var 2
|
...
|
 |
2021-03-18 1:18:29 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
|
210318 1:18:29 [ERROR] mysqld got signal 6 ;
|
Relevant versions:
- mariadb 1:10.3.28+maria~bionic
- galera-3 25.3.32-bionic
I've compared two core dumps:
- dump1: threads: 432
- dump2 : threads: 418
- dump1: 0 locks at LOCK_show_status
- dump2: 3 locks at LOCK_show_status
- dump1: 1 lock in DeadlockChecker::search waiting for thread 68
- dump2: 1 lock in trx_commit waiting for thread 97
- dump1: thread 68 has lock, but is waiting for condition in bg_wsrep_kill_trx->TTASEventMutex->sync_array_wait_event
- dump2: thread 97 has lock, but is waiting for condition in bg_wsrep_kill_trx->TTASEventMutex->sync_array_wait_event
See the attached dump1.txt and dump2.txt for closer inspection.
The thread that appears to unjustly be holding the lock (68 and 97 respectively) has this BT:
(gdb) bt
|
#0 0x00007fc58cdd3ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x55881a038ec4) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
|
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x55881a038e70, cond=0x55881a038e98) at pthread_cond_wait.c:502
|
#2 __pthread_cond_wait (cond=cond@entry=0x55881a038e98, mutex=mutex@entry=0x55881a038e70) at pthread_cond_wait.c:655
|
#3 0x00005587afb3f230 in os_event::wait (this=0x55881a038e60) at ./storage/innobase/os/os0event.cc:158
|
#4 os_event::wait_low (reset_sig_count=8, this=0x55881a038e60) at ./storage/innobase/os/os0event.cc:325
|
#5 os_event_wait_low (event=0x55881a038e60, reset_sig_count=<optimized out>) at ./storage/innobase/os/os0event.cc:502
|
#6 0x00005587afbdb82c in sync_array_wait_event (arr=0x5587b1ad5430, cell=@0x7fa5ea7fbcd8: 0x5587b1ad56b0) at ./storage/innobase/sync/sync0arr.cc:471
|
#7 0x00005587afadccb7 in TTASEventMutex<GenericPolicy>::enter (line=18772,
|
filename=0x5587b0044130 "/home/buildbot/buildbot/build/mariadb-10.3.28/storage/innobase/handler/ha_innodb.cc", max_delay=4, max_spins=<optimized out>,
|
this=0x5587b08404c0 <lock_sys+64>) at ./storage/innobase/include/ib0mutex.h:471
|
#8 PolicyMutex<TTASEventMutex<GenericPolicy> >::enter (this=0x5587b08404c0 <lock_sys+64>, n_spins=30, n_delay=4,
|
name=name@entry=0x5587b0044130 "/home/buildbot/buildbot/build/mariadb-10.3.28/storage/innobase/handler/ha_innodb.cc", line=line@entry=18772)
|
at ./storage/innobase/include/ib0mutex.h:592
|
#9 0x00005587afad0798 in bg_wsrep_kill_trx (void_arg=0x7fa530046ea0) at ./storage/innobase/handler/ha_innodb.cc:18772
|
#10 0x00005587af7565d3 in handle_manager (arg=arg@entry=0x0) at ./sql/sql_manager.cc:112
|
#11 0x00005587afe5612a in pfs_spawn_thread (arg=0x55881a187138) at ./storage/perfschema/pfs.cc:1869
|
#12 0x00007fc58cdcd6db in start_thread (arg=0x7fa5ea7fc700) at pthread_create.c:463
|
#13 0x00007fc58c3cf71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
Is this a known issue? Is there any additional info I can provide?
(I have the complete core dumps, but I cannot share them in their entirety obviously.)
Cheers,
Walter Doekes
OSSO B.V.
Attachments
Issue Links
- duplicates
-
MDEV-25111 Long semaphore wait (> 800 secs), server stops responding
- Closed
- is part of
-
MDEV-24872 galera.galera_insert_multi MTR failed: crash with SIGABRT
- Closed
- relates to
-
MDEV-24704 Galera test failure on galera.galera_nopk_unicode
- Closed