[MDEV-25371] Potential hang in wsrep_is_BF_lock_timeout() Created: 2021-04-08  Updated: 2021-04-08  Resolved: 2021-04-08

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.6
Fix Version/s: 10.6.0

Type: Bug Priority: Blocker
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression-10.6

Issue Links:
Blocks
blocks MDEV-24966 Galera multi-master regression Closed
Problem/Incident
is caused by MDEV-24789 Performance regression after MDEV-24671 Closed

Description

In MDEV-24671, lock_sys.wait_mutex was moved above lock_sys.mutex (which was later replaced with lock_sys.latch) in the latching order. MDEV-24789 then introduced a potential hang in Galera: the function lock_wait() holds lock_sys.wait_mutex while invoking wsrep_is_BF_lock_timeout(), which in turn may acquire LockMutexGuard for a diagnostic printout. Acquiring that guard while already holding lock_sys.wait_mutex violates the latching order.

According to jplindst, we can remove that printout.

To catch similar latching order violations in the future, we will add debug checks around lock_sys.latch acquisition. Unfortunately, RW-locks are not covered by SAFE_MUTEX, so the check has to be added explicitly.
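
A minimal sketch of what such a debug check could look like, in standalone C++17. This is an illustration only, not the InnoDB code: the sketch namespace, the LockSys struct, the holds_wait_mutex flag and the latch_order_ok(), wait_lock() and wait_unlock() helpers are hypothetical names, and std::shared_mutex and std::mutex stand in for lock_sys.latch and lock_sys.wait_mutex. The idea is to record per thread whether the wait mutex is held, and to assert before acquiring the RW-latch that it is not.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>

namespace sketch {

// Hypothetical per-thread debug bookkeeping: true while the current thread
// holds the wait mutex. One flag suffices because this sketch assumes a
// single global LockSys instance.
thread_local bool holds_wait_mutex = false;

struct LockSys {
  std::shared_mutex latch;  // stand-in for lock_sys.latch (an RW-lock)
  std::mutex wait_mutex;    // stand-in for lock_sys.wait_mutex

  // The latch may only be acquired by a thread that does not already hold
  // the wait mutex; the reverse nesting is the pattern described above.
  bool latch_order_ok() const { return !holds_wait_mutex; }

  void wr_lock() {
    assert(latch_order_ok());  // debug check around latch acquisition
    latch.lock();
  }
  void wr_unlock() { latch.unlock(); }

  void wait_lock() {
    wait_mutex.lock();
    holds_wait_mutex = true;
  }
  void wait_unlock() {
    holds_wait_mutex = false;
    wait_mutex.unlock();
  }
};

}  // namespace sketch
```

With this in place, a call sequence such as wait_lock() followed by wr_lock() on the same thread would fail the assertion in a debug build instead of risking a hang, while wr_lock() followed by wait_lock() passes.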

