[MDEV-22527] Add thread info for innodb semaphore wait in error log Created: 2020-05-11  Updated: 2021-09-30  Resolved: 2021-07-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Manjot Singh (Inactive) Assignee: Marko Mäkelä
Resolution: Won't Fix Votes: 1
Labels: None

Issue Links:
Duplicate
duplicates MDEV-21330 Lock monitor doesn't print a semaphor... Closed

 Description   

In the error log, with InnoDB semaphore wait NOTE or WARNING (as shown below), information about which thread is holding the mutex is not shown. E.g.:

2020-04-21 18:49:04 139273581979392 [Note] InnoDB: A semaphore wait:
--Thread 139269554730752 has waited at dict0dict.cc line 1153 for 514.00 seconds the semaphore:
Mutex at 0x5612f8e8ec40, Mutex DICT_SYS created dict0dict.cc:1089, lock var 2

The log should show information on who is holding the mutexes to aid in troubleshooting:

2020-04-21 18:49:04 139273581979392 [Note] InnoDB: A semaphore wait:
--Thread 139269603882752 has waited at btr0cur.cc line 3872 for 0.00 seconds the semaphore:
X-lock on RW-latch at 0x5612f8c2ee80 created in file btr0sea.cc line 193
a writer (thread id 139271632033536) has reserved it in mode exclusive
number of readers 0, waiters flag 0, lock_word: 0
Last time read locked in file btr0sea.ic line 134
Last time write locked in file btr0cur.cc line 3872



 Comments   
Comment by Geoff Montee (Inactive) [ 2020-05-11 ]

This appears to be a duplicate of MDEV-21330.

Comment by Geoff Montee (Inactive) [ 2020-05-11 ]

This issue differs from MDEV-21330 in a subtle way: it's asking for InnoDB to print the thread ID of the thread holding the mutex, while MDEV-21330 is asking for InnoDB to print the same thread info in release builds as it currently does in debug builds. Debug builds currently print the following information:

  • The thread ID of the thread waiting for the semaphore.
  • The type of semaphore. (i.e. mutex or RW-latch)
  • The source file and line where the thread is waiting for the lock.
  • The source file and line where the semaphore was last reserved.

I agree that the thread ID of the thread holding the mutex should be printed as well. However, we still need the above information to be printed too. The thread ID is not always helpful on its own. The problem is that some threads do not appear in SHOW ENGINE INNODB STATUS, so the thread ID is not always enough to identify the thread. In that case, the information requested by MDEV-21330 (which is currently only available in debug builds) can be used to identify the thread.

Comment by Geoff Montee (Inactive) [ 2020-05-21 ]

My perspective of this is that MDEV-21330 and MDEV-22527 are asking for the same thing: the lock monitor in the error log and SHOW ENGINE INNODB STATUS output should allow you to properly diagnose semaphore waits. Each of the issues is asking for a different piece of that:

1.) MDEV-21330 is basically asking for the output to include what the thread is doing while holding the semaphore. This code is already written, but it is only enabled in debug builds for some reason.
2.) MDEV-22527 is basically asking for the output to include which thread is holding the semaphore. This code may not be written at all yet.

Both pieces of this information appear to be necessary for proper diagnostics, because not all threads appear in the lock monitor in the error log or SHOW ENGINE INNODB STATUS output. For those threads, #2 may not be enough information to diagnose the problem. In that case, you may need #1 to determine if the semaphore is held by some internal InnoDB thread that is mostly hidden from the user.

Comment by Eugene Kosov (Inactive) [ 2020-08-04 ]

MDEV-21330 introduced two variables to write on every lock() and this MDEV requests to add another variable thread_id which will become a scalability issue too similar to the case of MDEV-21330

Comment by Marko Mäkelä [ 2021-07-20 ]

The semaphore wait section was removed in MDEV-21452. We could theoretically implement the requested diagnostics in older major versions than 10.6 only.

Comment by Manjot Singh (Inactive) [ 2021-07-20 ]

Marko, I don't see this removal in the ticket description. Why would we remove important and critical observability from our engine?

Generated at Thu Feb 08 09:15:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.