Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16785

MariaDB server is running in 100% on one cpu

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 5.5.57
    • 10.5
    • None
    • None
    • Windows Server 2012 R2

    Description

      MariaDB server is running regularly in an endless loop in which one cpu is consuming 100%.
      An output of "show full processlist" showed no progress anymore.

      An analysis of the created minidump shows that the thread with id 3574 consumed most of the cpu.
      0:000> !runaway
      User Mode Time
      Thread Time
      15:3574 0 days 5:59:41.546
      18:43a4 0 days 0:02:40.625
      23:290c 0 days 0:01:22.109
      ...

      Attachments

        1. h2318784.err
          592 kB
          Michael Graf
        2. my.ini
          1 kB
          Michael Graf
        3. show_full_processlist.csv
          179 kB
          Michael Graf
        4. show_full_processlist.txt
          31 kB
          Michael Graf
        5. threads
          484 kB
          Elena Stepanova

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - - edited

            Threads from the minidump: threads

            However, the contents doesn't seem to correspond the described situation. According to the coredump, there is a lot of query-processing activity (numerous updates, deletes, inserts, selects), so the processlist surely shouldn't be empty.

            Stack trace for the alleged culprit 3574 is not very impressive, either:

            Callstack for Thread 16 (Thread Id: 13684 (0x3574)):
             Index  Function
            --------------------------------------------------------------------------------
             1      mysqld.exe!sync_array_find_thread(sync_array_struct * arr=0x000000d78d9dccb0, unsigned long thread=4294967295)
             2      mysqld.exe!sync_array_print_long_waits(unsigned long * waiter=0x000000d82ab2fa48, const void * * sema=0x000000d82ab2fa58)
             3      mysqld.exe!srv_error_monitor_thread(void * arg=0x00007ffe4e7116a0)
             4      [External Code]
            

            wlad, would you like to take a look?
            Please feel free to re-assign or unassign if needed.

            elenst Elena Stepanova added a comment - - edited Threads from the minidump: threads However, the contents doesn't seem to correspond the described situation. According to the coredump, there is a lot of query-processing activity (numerous updates, deletes, inserts, selects), so the processlist surely shouldn't be empty. Stack trace for the alleged culprit 3574 is not very impressive, either: Callstack for Thread 16 (Thread Id: 13684 (0x3574)): Index Function -------------------------------------------------------------------------------- 1 mysqld.exe!sync_array_find_thread(sync_array_struct * arr=0x000000d78d9dccb0, unsigned long thread=4294967295) 2 mysqld.exe!sync_array_print_long_waits(unsigned long * waiter=0x000000d82ab2fa48, const void * * sema=0x000000d82ab2fa58) 3 mysqld.exe!srv_error_monitor_thread(void * arg=0x00007ffe4e7116a0) 4 [External Code] wlad , would you like to take a look? Please feel free to re-assign or unassign if needed.
            wlad Vladislav Vaintroub added a comment - - edited

            reassigning to Marko, looks like Innodb deadlock.
            All threads are foreground. A potentially interesting detail is existence of a thread that hangs in ha_close_connection, from the experience. connection shutdown might be have some exciting locking issues

            wlad Vladislav Vaintroub added a comment - - edited reassigning to Marko, looks like Innodb deadlock. All threads are foreground. A potentially interesting detail is existence of a thread that hangs in ha_close_connection, from the experience. connection shutdown might be have some exciting locking issues

            I see a possible bug in the InnoDB sync_array in threads:

            Callstack for Thread 19 (Thread Id: 17316 (0x43a4)):
             Index  Function
            --------------------------------------------------------------------------------
             1      [External Code]
             2      mysqld.exe!os_mutex_enter(os_mutex_struct * mutex=0x000000d78d9269c0)
             3      mysqld.exe!sync_array_reserve_cell(sync_array_struct * arr=0x000000d78d9dccb0, void * object=0x000000d7a43e5400, unsigned __int64 type=352, const char * file=0x00007ff77a2675d0, unsigned __int64 line=1357, unsigned __int64 * index=0x000000d82ae2eed0)
             4      mysqld.exe!rw_lock_s_lock_spin(rw_lock_struct * lock=0x000000d7a43e5400, unsigned __int64 pass=2, const char * file_name=0x00007ff77a2675d0, unsigned __int64 line=1357)
             5      mysqld.exe!rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a43e5400, unsigned __int64 pass=2, const char * file_name=0x00007ff77a2675d0, unsigned __int64 line=1357)
             6      mysqld.exe!pfs_rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a43e5400, unsigned __int64 pass=2, const char * file_name=0x00007ff77a2675d0, unsigned __int64 line=1357)
             7      mysqld.exe!buf_flush_page(buf_pool_struct * buf_pool=0x000000d79e5457d8, buf_page_struct * bpage=0x000000d7a43e5310, buf_flush flush_type)
             8      mysqld.exe!buf_flush_try_neighbors(unsigned __int64 space=0, unsigned __int64 offset=14058, buf_flush flush_type=BUF_FLUSH_LIST, unsigned __int64 n_flushed=124, unsigned __int64 n_to_flush=200)
             9      mysqld.exe!buf_flush_page_and_try_neighbors(buf_page_struct * bpage, buf_flush flush_type=BUF_FLUSH_LIST, unsigned __int64 n_to_flush=200, unsigned __int64 * count=0x000000d82ae2f0d0)
             10     mysqld.exe!buf_flush_flush_list_batch(buf_pool_struct * buf_pool=0x000000d79e5457d8, unsigned __int64 min_n=200, unsigned __int64 lsn_limit=18446744073709551615)
             11     mysqld.exe!buf_flush_batch(buf_pool_struct * buf_pool=0x000000d79e5457d8, buf_flush flush_type, unsigned __int64 min_n=200, unsigned __int64 lsn_limit)
             12     mysqld.exe!buf_flush_list(unsigned __int64 min_n=200, unsigned __int64 lsn_limit=18446744073709551615)
             13     mysqld.exe!srv_master_thread(void * arg=0x00007ffe4e7116a0)
             14     [External Code]
             
            Callstack for Thread 20 (Thread Id: 17352 (0x43c8)):
             Index  Function
            --------------------------------------------------------------------------------
             1      [External Code]
             2      mysqld.exe!os_cond_wait(_RTL_CONDITION_VARIABLE * cond, _RTL_CRITICAL_SECTION * mutex)
             3      mysqld.exe!os_event_wait_low(os_event_struct * event=0x000000d825aadb30, __int64 reset_sig_count=9)
             4      mysqld.exe!sync_array_wait_event(sync_array_struct * arr=0x000000d78d9dccb0, unsigned __int64 index=4)
             5      mysqld.exe!rw_lock_s_lock_spin(rw_lock_struct * lock=0x000000d7a0e3b9b8, unsigned __int64 pass=0, const char * file_name=0x00007ff77a261d28, unsigned __int64 line=671)
             6      mysqld.exe!rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a0e3b9b8, unsigned __int64 pass=0, const char * file_name=0x00007ff77a261d28, unsigned __int64 line=671)
             7      mysqld.exe!pfs_rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a0e3b9b8, unsigned __int64 pass=0, const char * file_name=0x00007ff77a261d28, unsigned __int64 line=671)
             8      mysqld.exe!mtr_s_lock_func(rw_lock_struct * lock=0x000000d7a0e3b9b8, const char * file, unsigned __int64 line, mtr_struct * mtr=0x000000d82af2ed70)
             9      mysqld.exe!btr_cur_search_to_nth_level(dict_index_struct * index=0x000000d7a0e3b928, unsigned __int64 level=0, const dtuple_struct * tuple=0x000000d79fc75918, unsigned __int64 mode=4, unsigned __int64 latch_mode=1, btr_cur_struct * cursor=0x000000d79fc73668, unsigned __int64 has_search_latch=0, const char * file=0x00007ff77a269b38, unsigned __int64 line=716, mtr_struct * mtr=0x000000d82af2ed70)
            

            These threads are operating on unrelated buf_block_t::lock, but there appears to be a possible race condition in the sync_array element 0x000000d78d9dccb0 between sync_array_reserve_cell() and sync_array_wait_event(). It appears that we could be attempting to allocate a slot that is still in use.

            The sync array is something that I have never worked on. I think that we should try to eliminate it, but I do not think that we can do it in a GA version.

            marko Marko Mäkelä added a comment - I see a possible bug in the InnoDB sync_array in threads : Callstack for Thread 19 (Thread Id: 17316 (0x43a4)): Index Function -------------------------------------------------------------------------------- 1 [External Code] 2 mysqld.exe!os_mutex_enter(os_mutex_struct * mutex=0x000000d78d9269c0) 3 mysqld.exe!sync_array_reserve_cell(sync_array_struct * arr=0x000000d78d9dccb0, void * object=0x000000d7a43e5400, unsigned __int64 type=352, const char * file=0x00007ff77a2675d0, unsigned __int64 line=1357, unsigned __int64 * index=0x000000d82ae2eed0) 4 mysqld.exe!rw_lock_s_lock_spin(rw_lock_struct * lock=0x000000d7a43e5400, unsigned __int64 pass=2, const char * file_name=0x00007ff77a2675d0, unsigned __int64 line=1357) 5 mysqld.exe!rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a43e5400, unsigned __int64 pass=2, const char * file_name=0x00007ff77a2675d0, unsigned __int64 line=1357) 6 mysqld.exe!pfs_rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a43e5400, unsigned __int64 pass=2, const char * file_name=0x00007ff77a2675d0, unsigned __int64 line=1357) 7 mysqld.exe!buf_flush_page(buf_pool_struct * buf_pool=0x000000d79e5457d8, buf_page_struct * bpage=0x000000d7a43e5310, buf_flush flush_type) 8 mysqld.exe!buf_flush_try_neighbors(unsigned __int64 space=0, unsigned __int64 offset=14058, buf_flush flush_type=BUF_FLUSH_LIST, unsigned __int64 n_flushed=124, unsigned __int64 n_to_flush=200) 9 mysqld.exe!buf_flush_page_and_try_neighbors(buf_page_struct * bpage, buf_flush flush_type=BUF_FLUSH_LIST, unsigned __int64 n_to_flush=200, unsigned __int64 * count=0x000000d82ae2f0d0) 10 mysqld.exe!buf_flush_flush_list_batch(buf_pool_struct * buf_pool=0x000000d79e5457d8, unsigned __int64 min_n=200, unsigned __int64 lsn_limit=18446744073709551615) 11 mysqld.exe!buf_flush_batch(buf_pool_struct * buf_pool=0x000000d79e5457d8, buf_flush flush_type, unsigned __int64 min_n=200, unsigned __int64 lsn_limit) 12 mysqld.exe!buf_flush_list(unsigned __int64 min_n=200, unsigned __int64 lsn_limit=18446744073709551615) 13 mysqld.exe!srv_master_thread(void * arg=0x00007ffe4e7116a0) 14 [External Code]   Callstack for Thread 20 (Thread Id: 17352 (0x43c8)): Index Function -------------------------------------------------------------------------------- 1 [External Code] 2 mysqld.exe!os_cond_wait(_RTL_CONDITION_VARIABLE * cond, _RTL_CRITICAL_SECTION * mutex) 3 mysqld.exe!os_event_wait_low(os_event_struct * event=0x000000d825aadb30, __int64 reset_sig_count=9) 4 mysqld.exe!sync_array_wait_event(sync_array_struct * arr=0x000000d78d9dccb0, unsigned __int64 index=4) 5 mysqld.exe!rw_lock_s_lock_spin(rw_lock_struct * lock=0x000000d7a0e3b9b8, unsigned __int64 pass=0, const char * file_name=0x00007ff77a261d28, unsigned __int64 line=671) 6 mysqld.exe!rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a0e3b9b8, unsigned __int64 pass=0, const char * file_name=0x00007ff77a261d28, unsigned __int64 line=671) 7 mysqld.exe!pfs_rw_lock_s_lock_func(rw_lock_struct * lock=0x000000d7a0e3b9b8, unsigned __int64 pass=0, const char * file_name=0x00007ff77a261d28, unsigned __int64 line=671) 8 mysqld.exe!mtr_s_lock_func(rw_lock_struct * lock=0x000000d7a0e3b9b8, const char * file, unsigned __int64 line, mtr_struct * mtr=0x000000d82af2ed70) 9 mysqld.exe!btr_cur_search_to_nth_level(dict_index_struct * index=0x000000d7a0e3b928, unsigned __int64 level=0, const dtuple_struct * tuple=0x000000d79fc75918, unsigned __int64 mode=4, unsigned __int64 latch_mode=1, btr_cur_struct * cursor=0x000000d79fc73668, unsigned __int64 has_search_latch=0, const char * file=0x00007ff77a269b38, unsigned __int64 line=716, mtr_struct * mtr=0x000000d82af2ed70) These threads are operating on unrelated buf_block_t::lock , but there appears to be a possible race condition in the sync_array element 0x000000d78d9dccb0 between sync_array_reserve_cell() and sync_array_wait_event() . It appears that we could be attempting to allocate a slot that is still in use. The sync array is something that I have never worked on. I think that we should try to eliminate it, but I do not think that we can do it in a GA version.

            There is quite a bit of nonzero-overhead abstraction (or obfuscation) going on. Apparently, there is only one sync_array_t object in existence: sync_primary_wait_array, and sync_primary_wait_array->protection is always SYNC_ARRAY_OS_MUTEX. It looks like sync_primary_wait_array->os_mutex is the contention point that adds insult to injury when some contention occurs inside InnoDB.

            wlad points out that srv_error_monitor_thread() is continuously holding this mutex while gathering information about InnoDB mutex or rw-lock contention.

            Unless InnoDB is actually hanging, I am afraid that there is nothing that we can do about this in a GA version.

            marko Marko Mäkelä added a comment - There is quite a bit of nonzero-overhead abstraction (or obfuscation) going on. Apparently, there is only one sync_array_t object in existence: sync_primary_wait_array , and sync_primary_wait_array->protection is always SYNC_ARRAY_OS_MUTEX . It looks like sync_primary_wait_array->os_mutex is the contention point that adds insult to injury when some contention occurs inside InnoDB. wlad points out that srv_error_monitor_thread() is continuously holding this mutex while gathering information about InnoDB mutex or rw-lock contention. Unless InnoDB is actually hanging, I am afraid that there is nothing that we can do about this in a GA version.

            The code could be refactored as part of MDEV-16264, which is replacing many InnoDB threads.

            marko Marko Mäkelä added a comment - The code could be refactored as part of MDEV-16264 , which is replacing many InnoDB threads.

            People

              Unassigned Unassigned
              migra Michael Graf
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.