MariaDB Server / MDEV-34442

InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch

Details

    Description

      The server hangs with a lot of queries in 'Killed' status. The main culprit of the hang seems to be a massive update on a wp_options (WordPress) table. This in itself is not odd.

      What is odd is that MariaDB stops accepting new queries, and when the max_connections limit is reached, the server is restarted. The restart does not work and ends up crashing with error code 6. No core dump has been produced yet, but I did manage to capture stack traces for all threads (attached).

      [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
      240623  8:17:35 [ERROR] mysqld got signal 6 ;
      Sorry, we probably made a mistake, and this is a bug.
       
      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.6.18-MariaDB-cll-lve source revision: 887bb3f73555ff8a50138a580ca8308b9b5c069c
      key_buffer_size=402653184
      read_buffer_size=2097152
      max_used_connections=801
      max_threads=802
      thread_count=802
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3699409 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x55fdf610648e]
      mysys/stacktrace.c:216(my_print_stacktrace)[0x55fdf5be1d45]
      /lib64/libpthread.so.0(+0x12d20)[0x7f0ff4915d20]
      /lib64/libc.so.6(gsignal+0x10f)[0x7f0ff3c6052f]
      /lib64/libc.so.6(abort+0x127)[0x7f0ff3c33e65]
      /usr/sbin/mariadbd(+0x678245)[0x55fdf5883245]
      ut/ut0ut.cc:476(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55fdf587bcdc]
      srv/srv0srv.cc:1233(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55fdf609a1c8]
      tpool/tpool_generic.cc:346(tpool::thread_pool_generic::timer_generic::execute(void*))[0x55fdf609af0b]
      tpool/task.cc:38(tpool::task::execute())[0x55fdf6098a7f]
      /lib64/libstdc++.so.6(+0xc2b23)[0x7f0ff442cb23]
      /lib64/libpthread.so.0(+0x81ca)[0x7f0ff490b1ca]
      /lib64/libc.so.6(clone+0x43)[0x7f0ff3c4b8d3]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        unlimited            unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             642336               642336               processes
      Max open files            80000                80000                files
      Max locked memory         65536                65536                bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       642336               642336               signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e
      

      Attachments

        Issue Links

          Activity

            I analysed the latest stack trace in output.log.anon.gz in detail.
            There are 784 threads in total.

            1. 9 background threads waiting for their respective triggers [all expected]

            mysqld_main, handle_manager, timer_handler, signal_hand, ma_checkpoint_background, tpool::aio_linux::getevent_thread_routine, tpool::thread_pool_generic::wait_for_tasks, tpool::thread_pool_generic::wait_for_tasks, trx_purge_wait_for_workers_to_complete
            

            2. 1 server monitor task raising the fatal error after detecting an excessively long wait on the dictionary system lock

            Thread 1 (Thread 0x7fb949ffb700 (LWP 2630941)):
            #5  0x00007fbb34a58e65 in __GI_abort () at abort.c:79
            #6  0x000055b467cbb245 in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/ut/ut0ut.cc:476
            #7  0x000055b467cb3cdc in srv_monitor_task ()
            

            3. 627 user DML/QUERY threads waiting in dict_sys_t::freeze for the S (shared) lock on the dictionary system

            #4  ssux_lock_impl<false>::rd_wait (this=this@entry=0x55b468f93788 <dict_sys+72>)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:397
            ...
            #8  dict_sys_t::freeze (line=line@entry=1039,
                file=0x55b4687ecab0 "/builddir/build/BUILD/cl-MariaDB106-10.6.18/mariadb-10.6.18/storage/innobase/dict/dict0dict.cc", this=0x55b468f93740 <dict_sys>)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0dict.cc:1006
            #9  0x000055b468455342 in dict_table_open_on_name (
                table_name=0x7fb9ea187b20 "anonymous1", dict_locked=<optimized out>,
                ignore_err=DICT_ERR_IGNORE_FK_NOKEY)
            

            4. 13 DML/QUERY threads waiting in dict_sys_t::lock for the X (exclusive) lock on the dictionary system

            #6  dict_sys_t::lock_wait (this=0x55b468f93740 <dict_sys>, file=<optimized out>, line=225)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0dict.cc:995
            #7  0x000055b4684551f0 in dict_sys_t::lock (
                file=0x55b4687ecab0 "/builddir/build/BUILD/cl-MariaDB106-10.6.18/mariadb-10.6.18/storage/innobase/dict/dict0dict.cc", line=225, this=0x55b468f93740 <dict_sys>)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/dict0dict.h:1525
            #8  dict_table_close (table=0x7fb62c0bfb60)
            

            5. 48 user DML/QUERY threads waiting on S/X lock for buffer pool pages

            #1  0x000055b4683de748 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7fba88130898)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:243
            ...
            #5  sux_lock<ssux_lock_impl<true> >::x_lock (for_io=false, this=0x7fba88130898)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sux_lock.h:395
            #6  buf_page_get_low (page_id=..., zip_size=<optimized out>, rw_latch=<optimized out>,
                guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>,
                allow_ibuf_merge=<optimized out>)
            

            6. 84 DML/QUERY threads waiting for a free page in the middle of a mini-transaction, blocking the threads in [5] that are waiting for page locks.

            #2  __pthread_cond_timedwait (cond=cond@entry=0x55b468f926b0 <buf_pool+17328>,
                mutex=mutex@entry=0x55b468f8e300 <buf_pool>, abstime=abstime@entry=0x7fba7f2f15e0)
                at pthread_cond_wait.c:667
            #3  0x000055b468445e02 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0lru.cc:452
            #4  0x000055b467d1c937 in buf_page_init_for_read (mode=132, page_id=..., zip_size=0,
                unzip=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0rea.cc:121
            #5  0x000055b46844629f in buf_read_page_low (space=0x7fba70060c28, sync=true, mode=132, page_id=...,
                zip_size=0, unzip=<optimized out>)
            

            7. 1 DML/QUERY thread holding the X dictionary system lock and waiting for a free page. This one blocks all threads in [3] and [4] that are asking for dictionary system locks, eventually causing the fatal error.

            Thread 296 (Thread 0x7fba7ec09700 (LWP 2781974)):
            #2  __pthread_cond_timedwait (cond=cond@entry=0x55b468f926b0 <buf_pool+17328>,
                mutex=mutex@entry=0x55b468f8e300 <buf_pool>, abstime=abstime@entry=0x7fba7ec047d0)
                at pthread_cond_wait.c:667
            #3  0x000055b468445e02 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0lru.cc:452
            #4  0x000055b467d1c937 in buf_page_init_for_read (mode=132, page_id=..., zip_size=0,
                unzip=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0rea.cc:121
            ...
            #8  0x000055b46841ec06 in btr_cur_t::search_leaf (this=this@entry=0x7fba7ec05810,
                tuple=tuple@entry=0x7fba7ec05750, mode=mode@entry=PAGE_CUR_GE,
                latch_mode=latch_mode@entry=BTR_SEARCH_LEAF, mtr=mtr@entry=0x7fba7ec05c20)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/btr/btr0cur.cc:1219
            ...
            #11 dict_load_table_one (name=..., ignore_err=<optimized out>, fk_tables=...)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0load.cc:2368
            #12 0x000055b468462943 in dict_sys_t::load_table (this=this@entry=0x55b468f93740 <dict_sys>, name=...,
                ignore=ignore@entry=DICT_ERR_IGNORE_FK_NOKEY)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0load.cc:2570
            anonymous403
                dict_locked=<optimized out>, ignore_err=DICT_ERR_IGNORE_FK_NOKEY)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/dict/dict0dict.cc:1060
            #14 0x000055b4682fd741 in ha_innobase::open(char const*, int, unsigned int) ()
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/handler/ha_innodb.cc:6191
            

            8. 1 page cleaner background thread in "furious flush mode". It blocks all threads in [6] and [7] by not generating any free pages. The thread itself is stuck in a synchronous flush, because blocked mini-transactions are holding page latches.

            Thread 8 (Thread 0x7fba84e9e700 (LWP 3086274)):
            #0  0x000055b4684412f8 in buf_do_flush_list_batch (lsn=76535143611932, max_n=2000)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1586
            #1  buf_flush_list_holding_mutex (lsn=76535143611932, max_n=2000)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1586
            #2  buf_flush_list_holding_mutex (max_n=2000, lsn=76535143611932)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1567
            #3  0x000055b467cc3331 in buf_flush_list (max_n=2000, lsn=lsn@entry=76535143611932)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:1612
            #4  0x000055b467cc372d in buf_flush_sync_for_checkpoint (lsn=76535143611932)
                at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:2023
            #5  buf_flush_page_cleaner () at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0flu.cc:2294
            
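            The per-category counts above can be derived mechanically from a `gdb thread apply all bt` dump. A rough sketch of that tallying, using a tiny illustrative excerpt (the sample frames below are abbreviated for the example; the real dump in output.log.anon.gz contains 784 threads):

```python
import re
from collections import Counter

# Abbreviated excerpt in the format produced by `gdb -batch -ex "thread apply all bt"`.
DUMP = """\
Thread 3 (Thread 0x7fb9aaaa0700 (LWP 1)):
#0  futex_wait () at futex-internal.h:1
#4  ssux_lock_impl<false>::rd_wait () at srw_lock.cc:397
#8  dict_sys_t::freeze () at dict0dict.cc:1006
Thread 2 (Thread 0x7fb9aaaa1700 (LWP 2)):
#0  futex_wait () at futex-internal.h:1
#8  dict_sys_t::freeze () at dict0dict.cc:1006
Thread 1 (Thread 0x7fb9aaaa2700 (LWP 3)):
#3  buf_LRU_get_free_block () at buf0lru.cc:452
"""

def classify(threads_frames):
    """Label each thread by the first recognizable wait point in its stack."""
    markers = ["dict_sys_t::freeze", "dict_sys_t::lock_wait", "buf_LRU_get_free_block"]
    counts = Counter()
    for frames in threads_frames:
        label = next((m for m in markers for f in frames if m in f), "other")
        counts[label] += 1
    return counts

# Split the dump into per-thread frame lists and tally.
threads = [t.splitlines() for t in re.split(r"^Thread \d+ ", DUMP, flags=re.M) if t]
print(classify(threads))
```

            Against the full dump, the same grouping yields the 627/13/48/84 split above; the marker list would just be extended with one representative frame per category.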

            It is clear from the analysis that the symptom is exactly the same as MDEV-34265, which was fixed in 10.6.19, and there are no other visible symptoms. Please upgrade to 10.6.19 and confirm.
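            The blocking chain in [3]-[8] can be modeled as a wait-for graph; a minimal sketch (the node names are shorthand labels for the thread groups above, not real identifiers, and the edges paraphrase the analysis):

```python
# Wait-for graph for the thread groups in the analysis.
# An edge A -> B means "A waits for a resource held or produced by B".
WAITS_FOR = {
    "dict_S_waiters[3]": ["dict_X_holder[7]"],  # S lock blocked behind the X holder
    "dict_X_waiters[4]": ["dict_X_holder[7]"],  # X lock blocked behind the X holder
    "dict_X_holder[7]":  ["page_cleaner[8]"],   # needs a free page from the cleaner
    "mtr_threads[6]":    ["page_cleaner[8]"],   # also wait for free pages
    "page_cleaner[8]":   ["mtr_threads[6]"],    # flush blocked on page latches held by mtrs
}

def find_cycle(graph):
    """Depth-first search returning one cycle as a list of nodes, or None."""
    def dfs(node, path):
        if node in path:
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None
    for start in graph:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None

print(find_cycle(WAITS_FOR))
# The [8] <-> [6] cycle starves every other group transitively, until the
# monitor task [2] exceeds innodb_fatal_semaphore_wait_threshold and aborts.
```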

            debarun Debarun Banerjee added a comment

            Hi isodude,
            Can you please confirm that the issue is resolved after upgrading to 10.6.19? Thanks.

            debarun Debarun Banerjee added a comment

            The 10.6.19 release is expected in a couple of months. Meanwhile, it would be great if you could test whether this problem goes away with a 10.6 development snapshot. You can find installation packages at https://ci.mariadb.org/47867/ for the currently latest 10.6 build.

            marko Marko Mäkelä added a comment

            I don't have a way of testing it except by leaving the production database in a bad state and hoping that it crashes under pressure.

            I can make sure to revisit this if I can reproduce the same error again.

            It's a production environment where I don't want a development release installed. So maybe revisit when I have 10.6.19 installed?

            I'm quite satisfied knowing that the issue has been properly analyzed and most likely fixed.

            isodude Josef Johansson added a comment

            isodude, I'll close this issue, as it was supposedly fixed in 10.6.19, and 10.6.19 is already released.

            If you'll see it again, please add a comment and we'll reopen this.

            serg Sergei Golubchik added a comment

            People

              debarun Debarun Banerjee
              isodude Josef Johansson
              Votes: 1
              Watchers: 5

