[MDEV-24924] Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung. Created: 2021-02-19  Updated: 2021-03-28

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.7
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Milad Elhaei Sahar Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: mariadb
Environment:

Linux, cpu_cores : 4, memory total : 34G swap_total : 2G


Attachments: JPEG File semaphoresError.JPG    

 Description   

We have started seeing MariaDB crash with the long semaphore wait shown in the log below. This MariaDB instance has one database with a couple of big tables with 250+ million rows. We read other Jira tickets about semaphore wait errors, but ours does not seem to be exactly the same. We are now thinking of increasing the max open files limit from 16384 to around 32000, but we are not sure whether that will fix the issue, or whether it is related to this crash at all. During the long semaphore wait we see higher CPU I/O wait and an increase in read requests from disk.

2021-02-15 21:18:02 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
210215 21:18:02 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.7-MariaDB-log
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=25
max_threads=1002
thread_count=21
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2222136 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
Can't start addr2line
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55d8bcf1a12e]
/usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x55d8bc928fd7]
/lib64/libpthread.so.0(+0xf630)[0x7f5fe0964630]
/lib64/libc.so.6(gsignal+0x37)[0x7f5fde9be387]
/lib64/libc.so.6(abort+0x148)[0x7f5fde9bfa78]
/usr/sbin/mariadbd(+0xdb8020)[0x55d8bcd71020]
/usr/sbin/mariadbd(+0xd64019)[0x55d8bcd1d019]
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x35)[0x55d8bcea2c45]
/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x2b)[0x55d8bcea3d7b]
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x61)[0x55d8bcea21f1]
/lib64/libstdc++.so.6(+0xb5070)[0x7f5fdf10d070]
/lib64/libpthread.so.0(+0x7ea5)[0x7f5fe095cea5]
/lib64/libc.so.6(clone+0x6d)[0x7f5fdea868dd]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             127923               127923               processes 
Max open files            16384                16384                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       127923               127923               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core
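
Note on the resource limits: the table above has the same layout as /proc/<pid>/limits on Linux, so it can be checked programmatically before and after changing the open files limit. The following is a minimal Python sketch, not part of the original report. The function names (parse_limits, can_write_core, read_limits) are hypothetical, and the sample rows are copied from the log above. It also flags that the soft "Max core file size" limit is 0, which most likely means the "Writing a core file..." step did not produce a core.

```python
import re

# Sample rows copied from the "Resource Limits" table in the crash log.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max open files            16384                16384                files
Max nice priority         0                    0
"""


def to_number(value):
    """Convert a limit column to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_limits(text):
    """Return {limit name: (soft, hard)} from /proc/<pid>/limits style text."""
    limits = {}
    for line in text.splitlines():
        # Columns are separated by runs of two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3 or fields[0] == "Limit":
            continue  # skip the header row and blank or malformed lines
        limits[fields[0]] = (to_number(fields[1]), to_number(fields[2]))
    return limits


def can_write_core(limits):
    """A soft core size limit of 0 prevents a core file from being written."""
    soft, _hard = limits["Max core file size"]
    return soft != 0


def read_limits(pid):
    """Read the limits of a running process (Linux only; needs permission)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_limits(handle.read())


if __name__ == "__main__":
    parsed = parse_limits(SAMPLE_LIMITS)
    print("open files (soft, hard):", parsed["Max open files"])
    print("core file can be written:", can_write_core(parsed))
```

On a live server, read_limits(pid) with the mariadbd process ID would show whether a changed open files limit has actually taken effect for the running process.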



 Comments   
Comment by Milad Elhaei Sahar [ 2021-02-19 ]

Do you think this bug will be fixed in the coming release? I see that some Jira tickets also report the same bug with 10.5.8.

Comment by Marko Mäkelä [ 2021-02-19 ]

Please post the stack traces of all threads during the hang. This could be a duplicate of MDEV-24188 or MDEV-24275.

Comment by Milad Elhaei Sahar [ 2021-02-19 ]

Hi Marko,
Isn't the backtrace in the log I posted above?

Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
Can't start addr2line
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55d8bcf1a12e]
/usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x55d8bc928fd7]
/lib64/libpthread.so.0(+0xf630)[0x7f5fe0964630]
/lib64/libc.so.6(gsignal+0x37)[0x7f5fde9be387]
/lib64/libc.so.6(abort+0x148)[0x7f5fde9bfa78]
/usr/sbin/mariadbd(+0xdb8020)[0x55d8bcd71020]
/usr/sbin/mariadbd(+0xd64019)[0x55d8bcd1d019]
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x35)[0x55d8bcea2c45]
/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x2b)[0x55d8bcea3d7b]
/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x61)[0x55d8bcea21f1]
/lib64/libstdc++.so.6(+0xb5070)[0x7f5fdf10d070]
/lib64/libpthread.so.0(+0x7ea5)[0x7f5fe095cea5]
/lib64/libc.so.6(clone+0x6d)[0x7f5fdea868dd]
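
Note on the request above: the backtrace in the log appears to cover only the thread that raised the abort (a tpool timer task), not the threads that were waiting during the hang. One common way to capture the stacks of all threads is to attach gdb to the running server while it is hung. The following is a minimal Python sketch under these assumptions: gdb is installed, the caller is allowed to attach to the mariadbd process, and the helper names (build_gdb_command, dump_all_thread_stacks) are hypothetical. Without debug symbols installed, the frames may be less readable.

```python
import shutil
import subprocess


def build_gdb_command(pid, gdb_path="gdb"):
    """Build a gdb invocation that prints the backtrace of every thread."""
    return [
        gdb_path,
        "--batch",                      # run the commands, then exit
        "-ex", "set pagination off",    # do not pause on long output
        "-ex", "thread apply all backtrace",
        "-p", str(pid),                 # attach to the running process
    ]


def dump_all_thread_stacks(pid, output_path, gdb_path="gdb"):
    """Attach gdb to pid and write all thread backtraces to output_path.

    Returns the gdb exit code. The process is paused while gdb is attached.
    """
    if shutil.which(gdb_path) is None:
        raise RuntimeError("gdb was not found on PATH")
    result = subprocess.run(
        build_gdb_command(pid, gdb_path),
        capture_output=True,
        text=True,
        check=False,
    )
    with open(output_path, "w") as handle:
        handle.write(result.stdout)
    return result.returncode
```

For example, dump_all_thread_stacks(12345, "mariadbd_stacks.txt") would write the traces for a server running as process 12345 (a placeholder PID) to a file that can be attached to the ticket.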

Generated at Thu Feb 08 09:33:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.