[MDEV-32371] Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.11.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL)
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Storage Engine - InnoDB
Labels:
- hang
Environment:
Debian 11 Bullseye; MariaDB from the Maria repositories.

Description

I've had two different servers this week hit an issue with a semaphore wait lasting forever. This bug report isn't about what caused those wait-forever situations: it's about how the server handled them.

If I'm reading the code in storage/innobase/srv/srv0srv.cc correctly, there is supposed to be a warning at 1/4 of the innodb_fatal_semaphore_wait_threshold value, at 1/2 of that value, and at 3/4 of that value before killing the server once the threshold value is reached.

Examining the syslog, however, shows this doesn't seem to be working. One of my servers gave no warnings at all. It waited the full 600 seconds and then gave the fatal error as it crashed itself. The "Long wait" message never appeared at all.

The other server did give the "Long wait" warning, starting at 150 seconds, which is the correct time to start (1/4 of 600). I would expect a warning at 300 seconds, then 450 seconds. However it instead warned again at 159 seconds, and then every 10 seconds after that until 289 at which time I killed the server myself rather than waiting the full 600.

I would like to change my innodb_fatal_semaphore_wait_threshold setting to a dramatically lower number. I'll be able to tell whether it's safe to do so by making incremental changes and observing the presence or absence of "Long wait" warnings. However, if these warnings aren't behaving the way they're supposed to, that won't work.

Apologies if I'm misunderstanding what the behavior is supposed to be. It doesn't seem like zero warnings in some cases and warnings every 10 seconds in other cases would be the intended behavior.

Attachments

Issue Links

is duplicated by

MDEV-32909 Hang not caught by semaphore wait threshold

Closed

relates to

MDEV-24911 Missing warning before [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.mutex

Open

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Xan Charbonnet

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2023-10-07 02:21

Updated:: 2024-03-11 04:04

Resolved:: 2023-11-30 10:01

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server