[MDEV-35139] Semaphore wait has lasted > 600 seconds

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.5.26
Fix Version/s: 10.6.1
Component/s: Locking, Server
Labels: None
Environment: Debian 11

Description

Hello,

We are experiencing intermittent crashes/reboots (about once a week) on one of our systems. The system in question is quite busy and also sees a lot of deadlocks. The only notable change between the earlier, stable state and the current one is an increase in the volume of processed data.

mariadb.cfg

port = 3307

tmp_disk_table_size = 10737418240  #10G

max_allowed_packet = 134217728

wait_timeout = 300

interactive_timeout = 300

innodb_buffer_pool_size = 300G

max_connections = 8000

thread_cache_size = 8000

innodb_print_all_deadlocks = 1

innodb_log_file_size = 32G

innodb_io_capacity = 1000

innodb_max_dirty_pages_pct = 50

lock_wait_timeout = 300

transaction-isolation = 'READ-COMMITTED'

open_files_limit = 65535

expire_logs_days = 4

max_binlog_size = 1000M

binlog_format = ROW

sync_binlog = 0

binlog_annotate_row_events=0

slave_parallel_mode = CONSERVATIVE

log_bin_trust_function_creators = ON

log_slave_updates = ON

sync_master_info=0

sync_relay_log=0

sync_relay_log_info=0

replicate_annotate_row_events=0

innodb_flush_log_at_trx_commit = 2

gtid_strict_mode = 1

innodb_stats_persistent = OFF

innodb_stats_auto_recalc = OFF

innodb_stats_traditional = OFF

core_file = 1

error log

...

2024-09-30 11:36:01 0 [Note] InnoDB: A semaphore wait:

--Thread 139958540924672 has waited at dict0dict.cc line 1094 for 119.00 seconds the semaphore:

Mutex at 0x55d3ccb631c0, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1038, lock var 2

2024-09-30 11:36:01 0 [Note] InnoDB: A semaphore wait:

--Thread 139969000175360 has waited at row0undo.cc line 412 for 119.00 seconds the semaphore:

S-lock on RW-latch at 0x55d3ccb631f8 created in file dict0dict.cc line 1047

a writer (thread id 139958803928832) has reserved it in mode  exclusive

number of readers 0, waiters flag 1, lock_word: 0

Last time write locked in file handler0alter.cc line 11265

2024-09-30 11:36:01 0 [Note] InnoDB: A semaphore wait:

--Thread 139958496712448 has waited at dict0dict.cc line 1094 for 119.00 seconds the semaphore:

Mutex at 0x55d3ccb631c0, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1038, lock var 2

...

2024-09-30 12:39:17 0 [Note] InnoDB: A semaphore wait:

--Thread 139971173091072 has waited at dict0dict.cc line 1094 for 31.00 seconds the semaphore:

Mutex at 0x55d3ccb631c0, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1038, lock var 2

...

2024-09-30 12:39:17 0 [Note] InnoDB: A semaphore wait:

--Thread 139971126847232 has waited at trx0trx.cc line 883 for 546.00 seconds the semaphore:

Mutex at 0x55d552b97ae0, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:417, lock var 2

...

2024-09-30 12:39:17 0 [Note] InnoDB: A semaphore wait:

--Thread 139971147429632 has waited at trx0trx.cc line 883 for 575.00 seconds the semaphore:

Mutex at 0x55d552b97ae0, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:417, lock var 2

2024-09-30 12:39:17 0 [Note] InnoDB: A semaphore wait:

--Thread 139971330160384 has waited at trx0trx.cc line 883 for 256.00 seconds the semaphore:

Mutex at 0x55d552b97ae0, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:417, lock var 2

InnoDB: Pending reads 1, writes 0

2024-09-30 12:39:17 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

240930 12:39:17 [ERROR] mysqld got signal 6 ;

Sorry, we probably made a mistake, and this is a bug.

Your assistance in bug reporting will enable us to fix this for the next release.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.26-MariaDB-0+deb11u2-log source revision: 7a5b8bf0f5470a13094101f0a4bdfa9e1b9ded02

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=4001

max_threads=4002

thread_count=4001

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8941918 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

Printing to addr2line failed

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55d3cc34abfe]

/usr/sbin/mariadbd(handle_fatal_signal+0x475)[0x55d3cbe36645]

/lib/x86_64-linux-gnu/libpthread.so.0

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f99f854fd51]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f99f8539537]

/usr/sbin/mariadbd(+0x65a4ad)[0x55d3cbad84ad]

/usr/sbin/mariadbd(+0x650d20)[0x55d3cbaced20]

/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic13timer_generic7executeEPv+0x38)[0x55d3cc2ec0c8]

/usr/sbin/mariadbd(_ZN5tpool4task7executeEv+0x32)[0x55d3cc2ed292]

/usr/sbin/mariadbd(_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x4f)[0x55d3cc2eaf9f]

/lib/x86_64-linux-gnu/libstdc++.so.6(+0xceed0)[0x7f99f88ffed0]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f99f8a0bea7]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f99f8612acf]

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains

information that should help you find out what is causing the crash.

Writing a core file...

Working directory at /var/lib/mysql-data

Resource Limits:

Limit                     Soft Limit           Hard Limit           Units

Max cpu time              unlimited            unlimited            seconds

Max file size             unlimited            unlimited            bytes

Max data size             unlimited            unlimited            bytes

Max stack size            8388608              unlimited            bytes

Max core file size        unlimited            unlimited            bytes

Max resident set          unlimited            unlimited            bytes

Max processes             2061719              2061719              processes

Max open files            131070               131070               files

Max locked memory         65536                65536                bytes

Max address space         unlimited            unlimited            bytes

Max file locks            unlimited            unlimited            locks

Max pending signals       2061719              2061719              signals

Max msgqueue size         819200               819200               bytes

Max nice priority         0                    0

Max realtime priority     0                    0

Max realtime timeout      unlimited            unlimited            us

Core pattern: core

Kernel version: Linux version 5.10.0-27-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.205-2 (2023-12-31)

2024-09-30 12:40:28 0 [Note] Starting MariaDB 10.5.26-MariaDB-0+deb11u2-log source revision 7a5b8bf0f5470a13094101f0a4bdfa9e1b9ded02 server_uid Bf2sgYgSusKDXowmzo3Sx8bpGj4= as process 3945280

There is also a 25G core dump, though I am hesitant to upload it because of its size and the potentially sensitive information it contains.
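
To make the pattern in the error log easier to see, the waits can be grouped by the semaphore they are blocked on. The following is only a minimal sketch, assuming the log keeps the two-line format shown above (a "--Thread ... has waited" line followed by a "Mutex at ..." or "... on RW-latch at ..." line). The function name `summarize_waits` and the embedded `LOG` excerpt are illustrative and not part of MariaDB; the excerpt lines are copied from the log above.

```python
import re
from collections import defaultdict

# Illustrative excerpt: lines copied from the error log in this report.
LOG = """\
--Thread 139958540924672 has waited at dict0dict.cc line 1094 for 119.00 seconds the semaphore:
Mutex at 0x55d3ccb631c0, Mutex DICT_SYS created ./storage/innobase/dict/dict0dict.cc:1038, lock var 2
--Thread 139969000175360 has waited at row0undo.cc line 412 for 119.00 seconds the semaphore:
S-lock on RW-latch at 0x55d3ccb631f8 created in file dict0dict.cc line 1047
--Thread 139971126847232 has waited at trx0trx.cc line 883 for 546.00 seconds the semaphore:
Mutex at 0x55d552b97ae0, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:417, lock var 2
--Thread 139971147429632 has waited at trx0trx.cc line 883 for 575.00 seconds the semaphore:
Mutex at 0x55d552b97ae0, Mutex REDO_RSEG created ./storage/innobase/trx/trx0rseg.cc:417, lock var 2
"""

# Patterns assume the exact wording printed by this server version.
WAIT_RE = re.compile(r"--Thread \d+ has waited at \S+ line \d+ for ([\d.]+) seconds")
MUTEX_RE = re.compile(r"Mutex at (0x[0-9a-f]+), Mutex (\w+) created")
RWLATCH_RE = re.compile(r"(\S+) on RW-latch at (0x[0-9a-f]+)")


def summarize_waits(text):
    """Return {semaphore label: (number of waiters, longest wait in seconds)}."""
    counts = defaultdict(int)
    longest = defaultdict(float)
    pending = None  # wait time from the most recent "--Thread" line
    for line in text.splitlines():
        wait = WAIT_RE.search(line)
        if wait:
            pending = float(wait.group(1))
            continue
        if pending is None:
            continue
        mutex = MUTEX_RE.search(line)
        latch = RWLATCH_RE.search(line)
        if mutex:
            label = f"Mutex {mutex.group(2)} @ {mutex.group(1)}"
        elif latch:
            label = f"RW-latch @ {latch.group(2)} ({latch.group(1)})"
        else:
            continue  # blank or unrelated line between the two parts
        counts[label] += 1
        longest[label] = max(longest[label], pending)
        pending = None
    return {label: (counts[label], longest[label]) for label in counts}


# Print the semaphores with the longest waits first.
for label, (waiters, seconds) in sorted(
    summarize_waits(LOG).items(), key=lambda item: item[1][1], reverse=True
):
    print(f"{label}: {waiters} waiter(s), longest wait {seconds:.0f}s")
```

Run against the full error log (for example by reading the file into a string and passing it to `summarize_waits`), this shows at a glance which semaphore has the most waiters and the longest waits.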

Attachments

Issue Links

relates to MDEV-26733 assert on shutdown lock->lock_word == X_LOCK_DECR in test (Closed)
