[MDEV-20547] mysqld crashes: Semaphore wait has lasted > 60 seconds Created: 2019-09-10  Updated: 2020-02-14  Resolved: 2020-02-14

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.17
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Federico Razzoli
Assignee: Marko Mäkelä
Resolution: Incomplete
Votes: 1
Labels: need_feedback
Environment: CloudLinux 7.6
Attachments: dump.txt, error.log
Issue Links:
Relates
relates to MDEV-13983 Mariadb becomes unresponsive Closed
relates to MDEV-15135 'show global status' can cause lock c... Closed

 Description   

mysqld crashed multiple times. I'm attaching the error log.
Please let me know if you require more info.



 Comments   
Comment by Elena Stepanova [ 2019-09-10 ]

2019-09-10  8:31:33 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 60 seconds. We intentionally crash the server because it appears to be hung.
190910  8:31:33 [ERROR] mysqld got signal 6 ;

Comment by Marko Mäkelä [ 2019-09-10 ]

f_razzoli, to better analyze this bug, we would need stack traces of all InnoDB threads during the hang:

gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pgrep mysqld)

Alternatively, you should enable core dumps and provide the output of the following command:

gdb -ex "set pagination 0" -ex "thread apply all bt" --batch /usr/sbin/mysqld core

Starting with MariaDB Server 10.3, core dumps should be much smaller, because by default they no longer include the InnoDB buffer pool. (We might actually need the buffer pool later to analyze the hang in more detail, in case some buf_block_t::lock latches are involved.)

We have similar reports in MDEV-13983 and MDEV-15135.
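
A minimal sketch of a wrapper for the attach command above, assuming a POSIX shell. Note that pgrep mysqld matches every process whose name contains "mysqld" (for example a wrapper script or a monitoring agent), so $(pgrep mysqld) can expand to several PIDs or to the wrong process. The helper name dump_all_stacks is hypothetical; it only refuses anything that is not a single numeric PID before calling gdb.

```shell
# Hypothetical helper (not part of MariaDB or gdb): attach gdb to exactly one
# process and print the stack traces of all its threads.
dump_all_stacks() {
    pid="$1"
    case "$pid" in
        # Reject an empty argument or anything containing a non-digit
        # (this also catches several PIDs separated by spaces or newlines).
        ''|*[!0-9]*)
            echo "usage: dump_all_stacks PID (exactly one numeric PID)" >&2
            return 2
            ;;
    esac
    gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p "$pid"
}
```

It could be invoked as dump_all_stacks "$(pgrep -x mysqld)" > stacks.txt, where -x restricts pgrep to an exact match on the process name and stacks.txt is a placeholder file name.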

Comment by Federico Razzoli [ 2019-09-12 ]

I ran the first command shortly after a crash:

$ gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pgrep mysqld)
Thread 1 (process 3286):
#0  0x0000000000456a43 in runtime.futex ()
#1  0x00000000004274ab in runtime.futexsleep ()
#2  0x0000000000b7b7c8 in runtime.m0 ()
#3  0x0000000000000000 in ?? ()

Unfortunately, I don't have a core dump to submit at the moment. Does this make it impossible to investigate the bug?

Comment by Peanuts [ 2019-09-14 ]

Dump attached. Please let us know if this is what you are looking for.

Comment by Federico Razzoli [ 2019-09-16 ]

Please see the core dump uploaded by Peanuts. It comes from the same server.

Comment by Marko Mäkelä [ 2019-09-30 ]

Peanuts, thank you. The stack traces in dump.txt seem to suggest that some thread is holding dict_sys->mutex, which several other threads are waiting for. If you still have that core dump (or can produce another), please show the output of the following gdb commands:

print dict_sys->mutex
print dict_operation_lock

and then try to find the thread that is holding those (or has submitted an exclusive latch request on dict_operation_lock) by using the gdb command

thread find 0x…

It expects the thread identifier in hexadecimal, so you might want to use the print/x command.
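
As a small sketch of the same conversion done outside gdb, assuming a POSIX shell: if a thread identifier was already copied from gdb output in decimal, printf can turn it into the hexadecimal form that thread find is given above. The helper name to_hex is hypothetical.

```shell
# Hypothetical helper: print a decimal thread identifier in hexadecimal,
# prefixed with 0x, so that it can be pasted after "thread find".
to_hex() {
    printf '0x%x\n' "$1"
}
```

For example, to_hex 4096 prints 0x1000. Inside gdb itself, print/x on the same expression gives the hexadecimal value directly.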

That is only the first step of analyzing the hang. Often hangs involve multiple mutexes or rw-latches.

If you provide a core dump, then I’d also need download links to the executables and shared libraries (ldd /usr/sbin/mysqld). Otherwise the stack traces will typically be garbage.

Comment by Federico Razzoli [ 2019-10-07 ]

Unfortunately mysqld is stripped.

I have a dump from the most recent crash to upload, but it is 2.2G (104M compressed), and it seems I can't upload files larger than 10M here. How can I share it?

In the meantime, these are the libraries:

# ldd /usr/sbin/mysqld
	linux-vdso.so.1 =>  (0x00007fff807bf000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5d6d03d000)
	libaio.so.1 => /lib64/libaio.so.1 (0x00007f5d6ce3b000)
	libz.so.1 => /lib64/libz.so.1 (0x00007f5d6cc25000)
	libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f5d6c9ee000)
	libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007f5d6c7bd000)
	libssl.so.10 => /lib64/libssl.so.10 (0x00007f5d6c54b000)
	libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f5d6c0e9000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f5d6bee5000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f5d6bbde000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f5d6b8dc000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f5d6b50f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f5d6f164000)
	libfreebl3.so => /lib64/libfreebl3.so (0x00007f5d6b30c000)
	libcap.so.2 => /lib64/libcap.so.2 (0x00007f5d6b107000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f5d6aeff000)
	libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f5d6acd8000)
	liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f5d6aab2000)
	liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f5d6a89d000)
	libgcrypt.so.11 => /lib64/libgcrypt.so.11 (0x00007f5d6a61c000)
	libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f5d6a417000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f5d6a1fe000)
	libdw.so.1 => /lib64/libdw.so.1 (0x00007f5d69faf000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5d69d99000)
	libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f5d69b4c000)
	libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f5d69863000)
	libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f5d6965f000)
	libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f5d6942c000)
	libattr.so.1 => /lib64/libattr.so.1 (0x00007f5d69227000)
	libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f5d68fc5000)
	libelf.so.1 => /lib64/libelf.so.1 (0x00007f5d68dad000)
	libbz2.so.1 => /lib64/libbz2.so.1 (0x00007f5d68b9d000)
	libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f5d6898d000)
	libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f5d68789000)

Comment by Marko Mäkelä [ 2019-10-07 ]

f_razzoli, is there a separate package available for your system that includes the debugging symbols? Where did you get the mysqld executable from? The core dump will be rather useless without having access to the debugging symbols and to the executable and the dynamic libraries.

You can upload a .tar.bz2 file of the core dump (as well as all the listed .so files) to ftp://ftp.mariadb.com/uploads using anonymous FTP. Let us know the file name. It will only be accessible to employees of MariaDB Corporation.
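
A minimal sketch of how the listed .so files could be collected for such an archive, assuming a POSIX shell with awk. The helper name list_lib_paths is hypothetical; it reads ldd output on standard input and prints only the library paths, skipping entries that have no file on disk (such as linux-vdso.so.1).

```shell
# Hypothetical helper: read `ldd` output on stdin and print one library
# path per line.
list_lib_paths() {
    awk '
        # "name => /path (address)": the path is the third field.
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        # "/path (address)": the path is the first field (the dynamic loader).
        $1 ~ /^\//               { print $1 }
    '
}
```

With GNU tar, the bundle could then be built along the lines of tar -cjhf bundle.tar.bz2 core /usr/sbin/mysqld $(ldd /usr/sbin/mysqld | list_lib_paths), where -h stores the files that symbolic links point to, and bundle.tar.bz2 and core are placeholder file names.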
