[MDEV-37728] Shutdown hang due to deadlock between timer_handler and srv_thread_pool_end - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.6, 10.11, 11.4, 11.8
Fix Version/s: 10.11.15, 11.4.9, 11.8.4, 12.1.2
Component/s: Storage Engine - InnoDB
Labels:

Bug Category:
Can result in hang or crash
Release Note Summary:
In rare cases, shutdown might hang

Description

saahil reported a shutdown hang in https://github.com/MariaDB/server/pull/4264#discussion_r2372053903 while testing ~~MDEV-37482~~.

As far as I can tell (please see the details in the link), two threads are waiting for each other. The dedicated timer_handler() thread holds LOCK_timer and then waits for a mutex that was acquired in tpool::thread_pool_generic::timer_generic::disarm(), which is invoked deep inside srv_thread_pool_end(). Meanwhile, tpool::thread_pool_generic::timer_generic::disarm(), still holding that mutex, invokes thr_timer_end(), which waits for LOCK_timer, which is held by the timer_handler() thread.

This is an obvious lock order inversion: one thread waits for LOCK_timer while holding the other mutex, and timer_handler() waits for that mutex while holding LOCK_timer.

A lock order inversion does not always cause a deadlock, but it is a prerequisite for one. In this case, we got evidence of an actual hang due to this deadlock.
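
The inversion described above can be reproduced in miniature. The following is only a sketch and is not the MariaDB code: lock_timer and timer_mutex are hypothetical stand-ins for LOCK_timer and the mutex held in disarm(), and probe_inversion() is an illustrative helper. Each thread takes its first lock, waits until the other thread holds its own first lock, and then uses try_lock() instead of a blocking lock, so the circular wait is detected rather than entered.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for the two locks named in the report.
static std::mutex lock_timer;   // plays the role of LOCK_timer
static std::mutex timer_mutex;  // plays the role of the mutex held in disarm()

// Spin until n threads have reached this point.
static void arrive_and_wait(std::atomic<int>& counter, int n) {
  counter.fetch_add(1);
  while (counter.load() < n) std::this_thread::yield();
}

struct inversion_result {
  bool handler_would_block;  // "timer_handler" side could not get timer_mutex
  bool disarm_would_block;   // "disarm" side could not get lock_timer
};

// Runs both acquisition orders concurrently and reports, for each side,
// whether a blocking lock() on the second mutex would have had to wait.
inversion_result probe_inversion() {
  std::atomic<int> both_hold_first{0};
  std::atomic<int> both_probed{0};
  inversion_result r{false, false};

  // Order 1: lock_timer first, then timer_mutex.
  std::thread handler([&] {
    std::lock_guard<std::mutex> g(lock_timer);
    arrive_and_wait(both_hold_first, 2);
    if (timer_mutex.try_lock())
      timer_mutex.unlock();
    else
      r.handler_would_block = true;
    // Keep holding lock_timer until the other thread has probed too.
    arrive_and_wait(both_probed, 2);
  });

  // Order 2 (inverted): timer_mutex first, then lock_timer.
  std::thread disarm([&] {
    std::lock_guard<std::mutex> g(timer_mutex);
    arrive_and_wait(both_hold_first, 2);
    if (lock_timer.try_lock())
      lock_timer.unlock();
    else
      r.disarm_would_block = true;
    arrive_and_wait(both_probed, 2);
  });

  handler.join();
  disarm.join();
  return r;
}
```

Because each thread still holds its first lock while the other one probes, both flags in the result come back true; with blocking lock() calls in place of try_lock(), neither thread could make progress. A common remedy, not necessarily the one applied for this issue, is to take the two locks in one agreed order on every path, or to release the first lock before calling into code that acquires the second.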

Issue Links

relates to

MDEV-16264 Implement a common work queue for InnoDB background tasks (Closed)

MDEV-37482 Contention on btr_sea::partition::latch (Closed)

People

Assignee: Vladislav Vaintroub

Reporter: Marko Mäkelä

Assigned for Implementation: Vladislav Vaintroub

Assigned for Testing: Saahil Alam

Dates

Created: 2025-09-24 13:04

Updated: 2025-09-30 17:50

Resolved: 2025-09-25 15:10
