  1. MariaDB Server
  2. MDEV-34193

InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

Details

    • Bug
    • Status: Needs Feedback
    • Critical
    • Resolution: Unresolved
    • 10.4.20
    • 10.4(EOL)
    • None
    • RHEL baremetal server

    Description

      MariaDB Server hung and then crashed after a long semaphore wait. The error log shows many overlapping semaphore waits active at the same time. The last SHOW ENGINE INNODB STATUS dump before the crash breaks down by wait count, source file, and line number as follows:

      21  has waited at trx0trx.cc line 907
      5  has waited at row0sel.cc line 4728
      4  has waited at ha_innodb.cc line 12977
      2  has waited at btr0pcur.ic line 547
      1  has waited at trx0undo.ic line 129
      1  has waited at trx0undo.cc line 1348
      1  has waited at srv0srv.cc line 2015
      1  has waited at row0row.cc line 1304
      1  has waited at row0purge.cc line 964
      1  has waited at buf0flu.cc line 1186
      1  has waited at buf0buf.cc line 4105
      1  has waited at btr0cur.cc line 6556
      1  has waited at btr0btr.cc line 229
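
      A tally like the one above can be reproduced from a saved status dump with a short script. The following is a minimal sketch, not the tool used for this report: the function name `tally_waits` and the sample text are illustrative assumptions, and the regular expression relies only on the `has waited at <file> line <n>` wording that appears in the SEMAPHORES output.

```python
import re
from collections import Counter

# Matches the "has waited at <file> line <n>" wording in the SEMAPHORES section.
WAIT_RE = re.compile(r"has waited at (\S+) line (\d+)")


def tally_waits(status_text):
    """Count semaphore waits per (source file, line number), most frequent first."""
    counts = Counter(
        (m.group(1), int(m.group(2))) for m in WAIT_RE.finditer(status_text)
    )
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical sample; in practice, read the saved SHOW ENGINE INNODB STATUS text.
    sample = (
        "--Thread 140201591777024 has waited at trx0trx.cc line 907 for 900.00 seconds the semaphore:\n"
        "--Thread 140201585018624 has waited at row0sel.cc line 4728 for 936.00 seconds the semaphore:\n"
        "--Thread 140201577338624 has waited at trx0trx.cc line 907 for 934.00 seconds the semaphore:\n"
    )
    for (source_file, line_no), n in tally_waits(sample):
        print(f"{n}  has waited at {source_file} line {line_no}")
```

      Counting by (file, line) pair rather than by file alone keeps distinct wait sites in the same source file separate, which matches the layout of the list above.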

      Below is part of that dump, which also shows how long some of these semaphore waits had lasted before the crash:

      ----------
      SEMAPHORES
      ----------
      OS WAIT ARRAY INFO: reservation count 62447733
      --Thread 140201591777024 has waited at trx0trx.cc line 907 for 900.00 seconds the semaphore:
      Mutex at 0x5653dd2ad098, Mutex REDO_RSEG created trx0rseg.cc:403, lock var 2
       
      --Thread 140201585018624 has waited at row0sel.cc line 4728 for 936.00 seconds the semaphore:
      S-lock on RW-latch at 0x7f9fb80a9588 created in file buf0buf.cc line 1568
      a writer (thread id 140201709262592) has reserved it in mode  wait exclusive
      number of readers 1, waiters flag 1, lock_word: efffffff
      Last time write locked in file btr0btr.cc line 229
      --Thread 140201742833408 has waited at srv0srv.cc line 2015 for 928.00 seconds the semaphore:
      X-lock (wait_ex) on RW-latch at 0x565330a56a50 created in file dict0dict.cc line 833
      a writer (thread id 140201742833408) has reserved it in mode  wait exclusive
      number of readers 4, waiters flag 1, lock_word: fffffffc
      Last time write locked in file dict0stats.cc line 2486
      --Thread 140201577338624 has waited at trx0trx.cc line 907 for 934.00 seconds the semaphore:
      Mutex at 0x5653dd2ad098, Mutex REDO_RSEG created trx0rseg.cc:403, lock var 2
       
      --Thread 140201717655296 has waited at row0purge.cc line 964 for 936.00 seconds the semaphore:
      SX-lock on RW-latch at 0x7f814819fe00 created in file dict0dict.cc line 1957
      a writer (thread id 140201709262592) has reserved it in mode  SX
      number of readers 8, waiters flag 1, lock_word: ffffff8
      Last time write locked in file row0purge.cc line 964
      --Thread 140201692169984 has waited at trx0trx.cc line 907 for 935.00 seconds the semaphore:
      Mutex at 0x5653dd2ad098, Mutex REDO_RSEG created trx0rseg.cc:403, lock var 2
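
      To see which semaphores attract the most waiters, the entries can also be grouped by the address of the mutex or RW-latch being waited on. This is a rough sketch under stated assumptions: `group_by_semaphore` is a hypothetical helper, and the pattern assumes each `--Thread ... the semaphore:` line is immediately followed by a line of the form `<request> at 0x<address>`, as in the excerpt above.

```python
import re
from collections import defaultdict

# One waiter entry: the "--Thread" line plus the following line naming the semaphore.
ENTRY_RE = re.compile(
    r"--Thread (\d+) has waited at (\S+) line (\d+) for ([\d.]+) seconds the semaphore:"
    r"\s*\n\s*(.*?) at (0x[0-9a-fA-F]+)"
)


def group_by_semaphore(status_text):
    """Map semaphore address -> number of waiters, longest wait, and request kinds."""
    groups = defaultdict(lambda: {"waiters": 0, "max_wait": 0.0, "kinds": set()})
    for m in ENTRY_RE.finditer(status_text):
        seconds = float(m.group(4))
        kind, address = m.group(5), m.group(6)
        entry = groups[address]
        entry["waiters"] += 1
        entry["max_wait"] = max(entry["max_wait"], seconds)
        entry["kinds"].add(kind)  # e.g. "Mutex" or "S-lock on RW-latch"
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical file name; save the status output to a file first.
    with open("innodb_status.txt", encoding="utf-8") as fh:
        summary = group_by_semaphore(fh.read())
    for address, info in sorted(summary.items(), key=lambda kv: -kv[1]["waiters"]):
        kinds = ", ".join(sorted(info["kinds"]))
        print(f"{address}  waiters={info['waiters']}  max_wait={info['max_wait']:.2f}s  {kinds}")
```

      In the excerpt above, the three trx0trx.cc line 907 entries all wait on the same REDO_RSEG mutex at 0x5653dd2ad098, which is the kind of clustering this grouping is meant to surface.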

      The affected instance was not using encryption, so this should be unrelated to MDEV-33770. The version is also patched against MDEV-22456, which was another known cause of long semaphore wait timeouts.

            People

              rob.schwyzer@mariadb.com Rob Schwyzer
              juan.vera Juan
              Votes: 2
              Watchers: 7
