Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36218

InnoDB Semaphore Waits for Extended Periods Causing Lock Contention and hangs the server

    XMLWordPrintable

Details

    Description

      We are experiencing significant lock contention in our MariaDB instance, where threads are waiting for InnoDB semaphore locks for long periods (over 240 seconds in some cases). This behavior seems to be linked to the holding of RW-latches by writer threads, preventing other threads from acquiring the necessary locks and causing a substantial delay in query execution. Specifically, the issue seems to be related to locks on the data dictionary or related resources.

      The following logs highlight instances of threads waiting for locks:

      Log Entry 1:

      --Thread 139459816298240 has waited at row0ins.cc line 2700 for 241.00 seconds the semaphore:
      SX-lock on RW-latch at 0x7f441ae0c2c8 created in file buf0buf.cc line 1390
      a writer (thread id 139482392381184) has reserved it in mode exclusive
      

      • Thread 139459816298240 has been waiting for 241.00 seconds for a SX-lock on an RW-latch.
      • The writer thread (ID: 139482392381184) holds the lock in exclusive mode, blocking access to other threads.

      Log Entry 2:

      --Thread 139460081497856 has waited at btr0cur.cc line 1490 for 241.00 seconds the semaphore:
      SX-lock on RW-latch at 0x7edbd225d5e0 created in file dict0dict.cc line 1925
      a writer (thread id 139482392381184) has reserved it in mode SX
      number of readers 103, waiters flag 1, lock_word: fffff99
      Last time write locked in file btr0cur.cc line 1490
      2025-02-25 12:17:43 0 [Note] InnoDB: A semaphore wait:
      

      • Thread 139460081497856 is waiting for 241.00 seconds for an SX-lock on an RW-latch related to the data dictionary.
      • The writer thread (ID: 139482392381184) holds the lock in SX mode (shared exclusive), with 103 readers waiting.

      Log Entry 3:

      --Thread 139460078425856 has waited at row0ins.cc line 1970 for 240.00 seconds the semaphore:
      S-lock on RW-latch at 0x55a539cfb5b0 created in file dict0dict.cc line 833
      a writer (thread id 139482701494016) has reserved it in mode wait exclusive
      number of readers 7, waiters flag 1, lock_word: fffffff9
      Last time write locked in file dict0stats.cc line 2479
      2025-02-25 12:17:43 0 [Note] InnoDB: A semaphore wait:
      

      • Thread 139460078425856 is waiting for 240.00 seconds for an S-lock on an RW-latch related to the data dictionary, with 7 readers also waiting.
      • The writer thread (ID: 139482701494016) holds the lock in wait exclusive mode, causing delays for both readers and writers.

      I am seeing a severe semaphore waits where multiple threads are waiting for an InnoDB RW-latch (Read-Write latch), which is held exclusively by a writer thread for an extended period.

      --Thread 139460275664640 has waited at row0ins.cc line 2700 for 941.00 seconds the semaphore:
      SX-lock on RW-latch at 0x7f441ae0c2c8 created in file buf0buf.cc line 1390
      a writer (thread id 139482392381184) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: f0000000
      Last time write locked in file btr0btr.cc line 229
      OS WAIT ARRAY INFO: reservation count 0
      OS WAIT ARRAY INFO: reservation count 14216156
      OS WAIT ARRAY INFO: reservation count 0
      OS WAIT ARRAY INFO: reservation count 14857263
      OS WAIT ARRAY INFO: reservation count 0
      OS WAIT ARRAY INFO: reservation count 14212809
      --Thread 139459954722560 has waited at ha_innodb.cc line 14436 for 940.00 seconds the semaphore:
      S-lock on RW-latch at 0x7efa11d2ce40 created in file fil0fil.cc line 1276
      a writer (thread id 139482392381184) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: e0000000
      Last time write locked in file fsp0fsp.cc line 2419
      OS WAIT ARRAY INFO: reservation count 0
      OS WAIT ARRAY INFO: reservation count 14848946
      --Thread 139482764965632 has waited at ha_innodb.cc line 14436 for 940.00 seconds the semaphore:
      S-lock on RW-latch at 0x7efa11d2ce40 created in file fil0fil.cc line 1276
      a writer (thread id 139482392381184) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: e0000000
      Last time write locked in file fsp0fsp.cc line 2419
      OS WAIT ARRAY INFO: reservation count 0
      OS WAIT ARRAY INFO: reservation count 14213035
      --Thread 139481774987008 has waited at row0ins.cc line 2700 for 941.00 seconds the semaphore:
      SX-lock on RW-latch at 0x7f441ae0c2c8 created in file buf0buf.cc line 1390
      a writer (thread id 139482392381184) has reserved it in mode  exclusive
      number of readers 0, waiters flag 1, lock_word: f0000000
      Last time write locked in file btr0btr.cc line 229
      OS WAIT ARRAY INFO: reservation count 0
      OS WAIT ARRAY INFO: reservation count 14845575
      

      I was suspecting the buffer pool to be performing in-efficiently, but the buffer pool stats looks normal

      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total large memory allocated 571230650368
      Dictionary memory allocated 470920
      Buffer pool size   33650688
      Free buffers       65536
      Database pages     54403797
      Old database pages 20081362
      Modified db pages  290
      Percent of dirty pages(LRU & free pages): 0.001
      Max dirty pages percent: 80.000
      Pending reads 0
      Pending writes: LRU 0, flush list 0, single page 0
      Pages made young 804830394, not young 143736088851
      0.00 youngs/s, 0.00 non-youngs/s
      Pages read 2300877009, created 1058711993, written 15684288049
      0.00 reads/s, 2.25 creates/s, 302.13 writes/s
      Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
      Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
      LRU len: 54403797, unzip_LRU len: 5440476
      I/O sum[6144384]:cur[348160], unzip sum[0]:cur[0]
      

      I am suspecting it is encountering the following MDEV-15020 error.

      Can someone help to validate the possible root cause and any way I can re-produce the issue.

      Attachments

        Activity

          People

            Unassigned Unassigned
            keshshan Keshan Nageswaran
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.