Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22527

Add thread info for innodb semaphore wait in error log

Details

    Description

      In the error log, with InnoDB semaphore wait NOTE or WARNING (as shown below), information about which thread is holding the mutex is not shown. E.g.:

      2020-04-21 18:49:04 139273581979392 [Note] InnoDB: A semaphore wait:
      --Thread 139269554730752 has waited at dict0dict.cc line 1153 for 514.00 seconds the semaphore:
      Mutex at 0x5612f8e8ec40, Mutex DICT_SYS created dict0dict.cc:1089, lock var 2
      

      The log should show information on who is holding the mutexes to aid in troubleshooting:

      2020-04-21 18:49:04 139273581979392 [Note] InnoDB: A semaphore wait:
      --Thread 139269603882752 has waited at btr0cur.cc line 3872 for 0.00 seconds the semaphore:
      X-lock on RW-latch at 0x5612f8c2ee80 created in file btr0sea.cc line 193
      a writer (thread id 139271632033536) has reserved it in mode exclusive
      number of readers 0, waiters flag 0, lock_word: 0
      Last time read locked in file btr0sea.ic line 134
      Last time write locked in file btr0cur.cc line 3872
      

      Attachments

        Issue Links

          Activity

            This appears to be a duplicate of MDEV-21330.

            GeoffMontee Geoff Montee (Inactive) added a comment - This appears to be a duplicate of MDEV-21330 .

            This issue differs from MDEV-21330 in a subtle way: it's asking for InnoDB to print the thread ID of the thread holding the mutex, while MDEV-21330 is asking for InnoDB to print the same thread info in release builds as it currently does in debug builds. Debug builds currently print the following information:

            • The thread ID of the thread waiting for the semaphore.
            • The type of semaphore. (i.e. mutex or RW-latch)
            • The source file and line where the thread is waiting for the lock.
            • The source file and line where the semaphore was last reserved.

            I agree that the thread ID of the thread holding the mutex should be printed as well. However, we still need the above information to be printed too. The thread ID is not always helpful on its own. The problem is that some threads do not appear in SHOW ENGINE INNODB STATUS, so the thread ID is not always enough to identify the thread. In that case, the information requested by MDEV-21330 (which is currently only available in debug builds) can be used to identify the thread.

            GeoffMontee Geoff Montee (Inactive) added a comment - This issue differs from MDEV-21330 in a subtle way: it's asking for InnoDB to print the thread ID of the thread holding the mutex, while MDEV-21330 is asking for InnoDB to print the same thread info in release builds as it currently does in debug builds. Debug builds currently print the following information: The thread ID of the thread waiting for the semaphore. The type of semaphore. (i.e. mutex or RW-latch) The source file and line where the thread is waiting for the lock. The source file and line where the semaphore was last reserved. I agree that the thread ID of the thread holding the mutex should be printed as well. However, we still need the above information to be printed too. The thread ID is not always helpful on its own. The problem is that some threads do not appear in SHOW ENGINE INNODB STATUS , so the thread ID is not always enough to identify the thread. In that case, the information requested by MDEV-21330 (which is currently only available in debug builds) can be used to identify the thread.

            My perspective of this is that MDEV-21330 and MDEV-22527 are asking for the same thing: the lock monitor in the error log and SHOW ENGINE INNODB STATUS output should allow you to properly diagnose semaphore waits. Each of the issues is asking for a different piece of that:

            1.) MDEV-21330 is basically asking for the output to include what the thread is doing while holding the semaphore. This code is already written, but it is only enabled in debug builds for some reason.
            2.) MDEV-22527 is basically asking for the output to include which thread is holding the semaphore. This code may not be written at all yet.

            Both pieces of this information appear to be necessary for proper diagnostics, because not all threads appear in the lock monitor in the error log or SHOW ENGINE INNODB STATUS output. For those threads, #2 may not be enough information to diagnose the problem. In that case, you may need #1 to determine if the semaphore is held by some internal InnoDB thread that is mostly hidden from the user.

            GeoffMontee Geoff Montee (Inactive) added a comment - My perspective of this is that MDEV-21330 and MDEV-22527 are asking for the same thing: the lock monitor in the error log and SHOW ENGINE INNODB STATUS output should allow you to properly diagnose semaphore waits. Each of the issues is asking for a different piece of that: 1.) MDEV-21330 is basically asking for the output to include what the thread is doing while holding the semaphore . This code is already written, but it is only enabled in debug builds for some reason. 2.) MDEV-22527 is basically asking for the output to include which thread is holding the semaphore . This code may not be written at all yet. Both pieces of this information appear to be necessary for proper diagnostics, because not all threads appear in the lock monitor in the error log or SHOW ENGINE INNODB STATUS output. For those threads, #2 may not be enough information to diagnose the problem. In that case, you may need #1 to determine if the semaphore is held by some internal InnoDB thread that is mostly hidden from the user.
            kevg Eugene Kosov (Inactive) added a comment - - edited

            MDEV-21330 introduced two variables to write on every lock() and this MDEV requests to add another variable thread_id which will become a scalability issue too similar to the case of MDEV-21330

            kevg Eugene Kosov (Inactive) added a comment - - edited MDEV-21330 introduced two variables to write on every lock() and this MDEV requests to add another variable thread_id which will become a scalability issue too similar to the case of MDEV-21330

            The semaphore wait section was removed in MDEV-21452. We could theoretically implement the requested diagnostics in older major versions than 10.6 only.

            marko Marko Mäkelä added a comment - The semaphore wait section was removed in MDEV-21452 . We could theoretically implement the requested diagnostics in older major versions than 10.6 only.

            Marko, I don't see this removal in the ticket description. Why would we remove important and critical observability from our engine?

            manjot Manjot Singh (Inactive) added a comment - Marko, I don't see this removal in the ticket description. Why would we remove important and critical observability from our engine?

            People

              marko Marko Mäkelä
              manjot Manjot Singh (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.