Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.6, 10.11, 11.1(EOL), 11.2, 10.11.5, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 11.0(EOL), 11.3(EOL)
- Debian 11 Bullseye; MariaDB from the Maria repositories.
Description
I've had two different servers this week hit an issue with a semaphore wait lasting forever. This bug report isn't about what caused those wait-forever situations: it's about how the server handled them.
If I'm reading the code in storage/innobase/srv/srv0srv.cc correctly, there is supposed to be a warning at 1/4 of the innodb_fatal_semaphore_wait_threshold value, at 1/2 of that value, and at 3/4 of that value before killing the server once the threshold value is reached.
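The schedule described above can be sketched as a small calculation. This only illustrates the reading of the code given in this report (warnings at 1/4, 1/2, and 3/4 of the threshold); it is not the logic in srv0srv.cc, and the function name is hypothetical.

```python
def expected_warning_times(threshold_seconds):
    """Wait durations (in seconds) at which a warning would be expected.

    Assumption: warnings fire at 1/4, 1/2 and 3/4 of the threshold,
    as the report reads the code. This is not taken from the server source.
    """
    return [threshold_seconds * k // 4 for k in (1, 2, 3)]


# With the 600-second threshold used on the affected servers:
print(expected_warning_times(600))  # [150, 300, 450]
```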
Examining the syslog, however, shows this doesn't seem to be working. One of my servers gave no warnings at all. It waited the full 600 seconds and then gave the fatal error as it crashed itself. The "Long wait" message never appeared at all.
The other server did give the "Long wait" warning, starting at 150 seconds, which is the correct time to start (1/4 of 600). I would expect the next warnings at 300 seconds and then at 450 seconds. However, it instead warned again at 159 seconds, and then every 10 seconds after that until 289 seconds, at which point I killed the server myself rather than waiting the full 600.
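To make the mismatch concrete, the observed timings can be compared against the expected ones. The observed list below is reconstructed from the description above (150, 159, then every 10 seconds up to 289), so the individual values are approximate; the variable names are illustrative only.

```python
# Expected warning times for a 600-second threshold (1/4, 1/2, 3/4).
expected = [150, 300, 450]

# Observed warning times, reconstructed from the description in this report:
# 150, then 159, then every 10 seconds until 289.
observed = [150] + list(range(159, 290, 10))

# Warnings that fired at a time not in the expected schedule.
unexpected = [t for t in observed if t not in expected]

# Seconds between consecutive warnings.
gaps = [later - earlier for earlier, later in zip(observed, observed[1:])]

print(len(unexpected))  # 14
print(gaps)             # [9, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Only the first warning matches the expected schedule; the rest repeat at roughly 10-second intervals.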
I would like to change my innodb_fatal_semaphore_wait_threshold setting to a dramatically lower number. I'll be able to tell whether it's safe to do so by making incremental changes and observing the presence or absence of "Long wait" warnings. However, if these warnings aren't behaving the way they're supposed to, that won't work.
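The check described above, looking for "Long wait" warnings after each change, could be automated with a few lines of Python. This is a minimal sketch: it matches only the "Long wait" substring quoted in this report and assumes nothing else about the message format. The sample lines are placeholders, not real server output.

```python
def count_long_wait_warnings(log_lines, marker="Long wait"):
    """Count log lines that contain the marker substring."""
    return sum(1 for line in log_lines if marker in line)


# Placeholder lines for illustration only; real syslog entries will differ.
sample_lines = [
    "placeholder entry: Long wait (first example)",
    "placeholder entry: unrelated message",
    "placeholder entry: Long wait (second example)",
]

print(count_long_wait_warnings(sample_lines))  # 2
```

An open file object can be passed in place of the list, for example the syslog file on the affected host (its path depends on the system configuration).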
Apologies if I'm misunderstanding what the behavior is supposed to be. It doesn't seem like zero warnings in some cases and warnings every 10 seconds in other cases would be the intended behavior.
Attachments
Issue Links
- is duplicated by
  - MDEV-32909 Hang not caught by semaphore wait threshold (Closed)
- relates to
  - MDEV-24911 Missing warning before [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.mutex (Open)