  1. MariaDB Server
  2. MDEV-31293

Threads stuck on semaphore wait causing MariaDB to crash


Details

    Description

      We run rather large production servers, each hosting hundreds of databases ranging in size from a few MB to many GB.

      For over a year we have had issues where all MariaDB threads suddenly get stuck on a semaphore wait.

      The only way to resolve this is to kill MariaDB or to wait for InnoDB's intentional crash on a long semaphore wait.

      In every crash the threads are stuck on the same lock:

      2023-05-12  8:30:03 0 [Note] InnoDB: A semaphore wait:
       
      --Thread 140122331952896 has waited at ha_innodb.cc line 14402 for 237.00 seconds the semaphore:
       
      Mutex at 0x5563d14c8bc0, Mutex DICT_SYS created /builddir/build/BUILD/mariadb-10.5.19/storage/innobase/dict/dict0dict.cc:1038, lock var 2
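
To compare such entries across crashes, the waiting thread, source location, wait time and mutex name can be pulled out of the error log. The following is only a minimal sketch: the regular expressions are derived from the single log excerpt above, the function name `parse_semaphore_waits` is hypothetical, and other MariaDB versions may format these lines differently.

```python
import re

# Matches: "--Thread <id> has waited at <file> line <n> for <s> seconds"
# (pattern inferred from the excerpt above; an assumption, not an official format)
WAIT_RE = re.compile(
    r"--Thread (?P<thread>\d+) has waited at (?P<file>\S+) "
    r"line (?P<line>\d+) for (?P<seconds>[\d.]+) seconds"
)
# Matches: "Mutex at <addr>, Mutex <name> created <path:line>, lock var <n>"
MUTEX_RE = re.compile(
    r"Mutex at (?P<addr>0x[0-9a-fA-F]+), Mutex (?P<name>\S+) "
    r"created (?P<created>\S+), lock var (?P<lock_var>\d+)"
)

SAMPLE = """\
2023-05-12  8:30:03 0 [Note] InnoDB: A semaphore wait:

--Thread 140122331952896 has waited at ha_innodb.cc line 14402 for 237.00 seconds the semaphore:

Mutex at 0x5563d14c8bc0, Mutex DICT_SYS created /builddir/build/BUILD/mariadb-10.5.19/storage/innobase/dict/dict0dict.cc:1038, lock var 2
"""


def parse_semaphore_waits(log_text):
    """Return one dict per semaphore wait found in log_text."""
    waits = []
    current = None  # wait entry still missing its mutex line
    for raw in log_text.splitlines():
        line = raw.strip()
        wait = WAIT_RE.search(line)
        if wait:
            current = {
                "thread": int(wait.group("thread")),
                "file": wait.group("file"),
                "line": int(wait.group("line")),
                "seconds": float(wait.group("seconds")),
                "mutex": None,
                "created": None,
            }
            waits.append(current)
            continue
        mutex = MUTEX_RE.search(line)
        if mutex and current is not None:
            # Attach the mutex details to the preceding wait entry.
            current["mutex"] = mutex.group("name")
            current["created"] = mutex.group("created")
            current = None
    return waits


if __name__ == "__main__":
    for entry in parse_semaphore_waits(SAMPLE):
        print(entry)
```

Grouping the results by `mutex` and `file`/`line` over several crash logs would show whether every stall really involves DICT_SYS at the same call site.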
      

      Previously, with the help of Sergei Golubchik ( MDEV-30390 ), we suspected that this might be related to the jemalloc memory allocator.

      After switching to tcmalloc the problem occurs less often, but it still happens, with the same symptoms.

      We cannot reproduce it on demand, but it happens mostly on our busiest production servers, which run hundreds or even thousands of databases.

      The chance also seems higher when a server has larger InnoDB databases ( 1 GB or more ) and when there is more memory pressure on the system (e.g. with 20 GB of 128 GB RAM still free).

      We use ZFS, which requires a lot of 128K memory segments. This can cause memory pressure and might influence MariaDB's behaviour.
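
One rough way to watch for a shortage of 128K-sized free memory on Linux is to sample `/proc/buddyinfo`, which lists the number of free blocks per allocation order for each memory zone. The sketch below is an illustration only: it assumes 4 KiB pages (so 128 KiB corresponds to order 5), the function name `free_blocks_at_least` and the sample line are hypothetical, and these counts are at best an indirect indicator of what ZFS can actually allocate.

```python
def free_blocks_at_least(buddyinfo_text, min_kib=128, page_kib=4):
    """Count free blocks of at least min_kib per (node, zone).

    buddyinfo_text is the content of /proc/buddyinfo; page_kib=4 is an
    assumption that holds on most x86-64 systems but not everywhere.
    """
    # Smallest order whose block size (page_kib * 2**order) reaches min_kib.
    order = 0
    while page_kib * (2 ** order) < min_kib:
        order += 1

    result = {}
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 ... cN"
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = parts[1].rstrip(",")
        zone = parts[3]
        counts = [int(c) for c in parts[4:]]
        # Sum the free-block counts for this order and all larger orders.
        result[(node, zone)] = sum(counts[order:])
    return result


# Made-up sample line in the /proc/buddyinfo layout (orders 0..10).
SAMPLE_BUDDYINFO = "Node 0, zone   Normal    100   50   25   12    6    3    2    1    0    0    1\n"

if __name__ == "__main__":
    print(free_blocks_at_least(SAMPLE_BUDDYINFO))
```

On a live server the input would come from `open("/proc/buddyinfo").read()`; logging the result periodically would allow the values just before a stall to be compared with those during normal operation.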

      However, we ensure that servers have enough CPU and RAM available and try to prevent performance degradation and swapping. When the problem occurs, the load is not higher than normal and is well below what the system and MariaDB should be able to handle.

      Attached are redacted backtraces for all threads from a core file and the MariaDB log written during a crash.
      SHOW ENGINE INNODB STATUS.txt was made after the crash.

            People

              Assignee: Unassigned
              Reporter: Joris de Leeuw (Joriz)