[MDEV-31293] Threads stuck on semaphore wait causing MariaDB to crash Created: 2023-05-16 Updated: 2023-09-18 Resolved: 2023-09-18 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER, Storage Engine - InnoDB |
| Affects Version/s: | 10.5.17, 10.5.18, 10.5.19 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Joris de Leeuw | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | crash, deadlock, semaphore | ||
| Environment: |
OS: CloudLinux 8 like RHEL - Kernel: 4.18.0-348.20.1.lve.1.el8.x86_64 |
||
| Description |
|
We run rather large production servers with hundreds of databases each, varying in size from a few MB to many GB. For over a year we have had issues where all MariaDB threads suddenly get stuck on a semaphore wait. The only way to resolve this is to kill MariaDB. In all crashes the threads get stuck on the same lock:
Previously we presumed together with the help of Sergei Golubchik ( After switching to tcmalloc this behaviour became less visible. However, it is still happening; less often, fortunately, but the behaviour is the same.
It is not reproducible, but it happens mostly on the busiest production servers, which run hundreds or even thousands of databases. The chance also seems higher if a server has bigger InnoDB databases (1 GB or bigger) and when there is more memory pressure on the system (e.g. still 20 GB RAM free of 128 GB in total).
We use ZFS, which requires a lot of 128K memory segments. This can cause memory pressure and might influence MariaDB's behaviour. However, we ensure that servers have enough CPU and RAM available and try to prevent performance degradation and swapping. So when this behaviour happens, the load isn't higher than normal and is well below what the system and MariaDB should be able to handle.
Attached are a redacted backtrace for all threads from a core file and the MariaDB log from a crash. |
| Comments |
| Comment by Daniel Black [ 2023-05-16 ] |
|
Thanks for the detailed backtrace. Most threads are spinning in base::internal::SpinLockDelay from tcmalloc, and at a quick look it appears that the others are waiting on locks held by the threads stuck in base::internal::SpinLockDelay while attempting to free memory. I'd suggest starting by looking at the tcmalloc version and checking whether a) the current version has bugs, and b) any tuning is required to handle as many threads as MariaDB is using in your environment. |
| Comment by Daniel Black [ 2023-08-14 ] |
|
Could it be like tcmalloc issue 111, where vm.max_map_count was too low? Check wc -l /proc/$(pidof mariadbd)/maps and see how it compares to sysctl vm.max_map_count. Is THP configured? |
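The check suggested above could be scripted roughly as follows. This is only a sketch: the helper names (count_mappings, active_thp_mode, usage_ratio) are hypothetical, and it assumes a Linux host with /proc and /sys mounted and a user allowed to read the maps file of the mariadbd process.

```python
import re
import sys
from pathlib import Path


def count_mappings(maps_text: str) -> int:
    """Count memory mappings: one non-empty line per mapping in /proc/<pid>/maps."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def active_thp_mode(enabled_text: str) -> str:
    """Return the bracketed mode from .../transparent_hugepage/enabled."""
    match = re.search(r"\[(\w+)\]", enabled_text)
    return match.group(1) if match else "unknown"


def usage_ratio(mappings: int, limit: int) -> float:
    """Fraction of vm.max_map_count currently used by the process."""
    return mappings / limit


def main(pid: int) -> None:
    # Same data as `wc -l /proc/<pid>/maps` and `sysctl vm.max_map_count`.
    mappings = count_mappings(Path(f"/proc/{pid}/maps").read_text())
    limit = int(Path("/proc/sys/vm/max_map_count").read_text())
    thp_path = Path("/sys/kernel/mm/transparent_hugepage/enabled")
    thp = active_thp_mode(thp_path.read_text()) if thp_path.exists() else "unknown"
    print(f"mappings={mappings} limit={limit} "
          f"used={usage_ratio(mappings, limit):.1%} thp={thp}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(int(sys.argv[1]))
```

It would be run with the server's process ID as the only argument, for example the output of pidof mariadbd. A ratio close to 100% would point in the direction of the tcmalloc issue mentioned above; a low ratio would make that explanation less likely.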
| Comment by Marko Mäkelä [ 2023-08-14 ] |
|
In mariadbd_full_bt_all_threads.txt, the holder of dict_sys.mutex ought to be Thread 140, executing the following:
I double checked that anything below srv_master_evict_from_table_cache() should be continuously holding dict_sys.mutex. Hypothetically speaking, we could suspect a bug in MariaDB Server if:
Based on the tcmalloc function names, this looks like a hang in tcmalloc itself. Any synchronization primitives inside a memory allocation library are conceptually at a lower level than anything that MariaDB Server would use. I am not aware of any intentional or designed ‘callbacks’ from a memory allocator to MariaDB Server.
Thread 196 seems to trigger SIGFPE somewhere in tcmalloc:
Could it simply be that the invocation of my_print_stacktrace() in a signal handler is violating man 7 signal-safety? At the top of this stack, the signal handler is invoking tcmalloc again:
A section from man 7 signal-safety on my system discourages invoking fork(2) in a signal handler:
For what it is worth, I do not see any reference to pthread_atfork in MariaDB Server either. It would seem plausible to me that the hang was caused by the SIGFPE handler invoking my_print_stacktrace(). Do these hangs go away if the server is invoked with --skip-stack-trace or the option skip_stack_trace is present in the configuration files? |