[MDEV-26275] Mariadb 10.4.17 crashes with Semaphore > 600 seconds Created: 2021-07-29  Updated: 2021-09-12  Resolved: 2021-09-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.4.17
Fix Version/s: 10.4.18, 10.5.9

Type: Bug Priority: Major
Reporter: Jaya Karthik Karri Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: need_feedback
Environment:

Production slave server



 Description   

Hi Team,

We have a MariaDB 10.4.17 production slave instance crashing couple of times every week with semaphore wait has lasted > 600 seconds. The master of this instance is running on MySQL 5.6.39-1 version, which doesn't have any issues and is up from the past 154 days. Below are the details
Below MariaDB version running on Debian 10
10.4.17-MariaDB-1:10.4.17+maria~stretch-log

Error message from error log

2021-07-24 19:29:53 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
210724 19:29:53 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.4.17-MariaDB-1:10.4.17+maria~stretch-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=5
max_threads=402
thread_count=15
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2496187 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5606b742bb5e]
/usr/sbin/mysqld(handle_fatal_signal+0x3af)[0x5606b6eb9c4f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fe9b000a730]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fe9af4c87bb]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x121)[0x7fe9af4b3535]
/usr/sbin/mysqld(+0xb7e021)[0x5606b71b1021]
/usr/sbin/mysqld(+0xb331f2)[0x5606b71661f2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7fe9afffffa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fe9af58a4cf]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Fatal signal 11 while backtracing

Innodb Monitor output related to semaphores

=====================================
2021-07-24 19:29:46 0x7fe8a9ad0700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 7 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 161013 srv_active, 0 srv_shutdown, 9848 srv_idle
srv_master_thread log flush and writes: 170835
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 3670910
--Thread 140637230651136 has waited at srv0srv.cc line 2017 for 944.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x5606b7de81b0 created in file dict0dict.cc line 833
a writer (thread id 140637230651136) has reserved it in mode wait exclusive
number of readers 4, waiters flag 0, lock_word: fffffffc
Last time write locked in file srv0srv.cc line 2017
--Thread 140636944971520 has waited at btr0cur.cc line 1492 for 946.00 seconds the semaphore:
SX-lock on RW-latch at 0x5606be9a10c8 created in file dict0dict.cc line 1954
a writer (thread id 140637247436544) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
--Thread 140637213558528 has waited at btr0cur.cc line 1492 for 946.00 seconds the semaphore:
SX-lock on RW-latch at 0x5606be9a10c8 created in file dict0dict.cc line 1954
a writer (thread id 140637247436544) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
--Thread 140637039556352 has waited at btr0cur.cc line 1492 for 946.00 seconds the semaphore:
SX-lock on RW-latch at 0x5606be9a10c8 created in file dict0dict.cc line 1954
a writer (thread id 140637247436544) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
--Thread 140637222258432 has waited at btr0cur.cc line 1492 for 946.00 seconds the semaphore:
SX-lock on RW-latch at 0x5606be9a10c8 created in file dict0dict.cc line 1954
a writer (thread id 140637247436544) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
--Thread 140637047949056 has waited at btr0cur.cc line 1492 for 946.00 seconds the semaphore:
SX-lock on RW-latch at 0x5606be9a10c8 created in file dict0dict.cc line 1954
a writer (thread id 140637247436544) has reserved it in mode SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
OS WAIT ARRAY INFO: signal count 3473703
RW-shared spins 2504671, rounds 42293906, OS waits 922559
RW-excl spins 1664250, rounds 12730384, OS waits 239318
RW-sx spins 490905, rounds 5036270, OS waits 124407
Spin rounds per wait: 16.89 RW-shared, 7.65 RW-excl, 10.26 RW-sx



 Comments   
Comment by Marko Mäkelä [ 2021-07-30 ]

The 10.4.16 and 10.4.17 releases were affected by a regression due to MDEV-23991, which was fixed in 10.4.18 in MDEV-24275.

Do you see these hangs with a newer version of the server?

To diagnose the hang, we would need a fully resolved stack trace of all threads.

Comment by Jaya Karthik Karri [ 2021-07-30 ]

Thanks Marko for a quick reply. As this is a production slave server we have upgraded the server to latest 10.4 release i.e 10.4.20. Hope this will fix our DB crash issue. Will monitor for few days and keep you posted.

---------------------------------------------

version()

---------------------------------------------

10.4.20-MariaDB-1:10.4.20+maria~stretch-log

---------------------------------------------

Comment by Jaya Karthik Karri [ 2021-08-13 ]

It has been 14 days after upgrading the mariadb Instance to 10.4.20. So far we have not seen any db crash. It seems the issue is fixed with this minor upgrade.

Once again thanks a lot Marko for your support.

Please close the ticket.

Generated at Thu Feb 08 09:44:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.