MariaDB Server
MDEV-32909

Hang not caught by semaphore wait threshold

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 10.11.6
    • Fix Version/s: N/A
    • Environment: Debian 11 Bullseye; MariaDB from the MariaDB repositories.

    Description

      Hello,

      Last night right after midnight one of my servers seemed to hang. All (or at least a great many?) queries would hang forever in the Execute phase. Of course this hung up the entire Galera cluster as well.

      I was awakened by an alert around 12:40am. My innodb_fatal_semaphore_wait_threshold is set to 32 seconds, so this hang was not caught by that watchdog.

      At 12:49 I was able to send MariaDB a signal which caused it to crash and dump core. So I do have a stack trace for this situation which I don't want to post publicly, but which is available upon request. I also have SHOW PROCESSLIST logs for much of the time in case that helps.

      If a MariaDB expert could take a look at the situation I would appreciate it! Thanks.

          Activity

            marko Marko Mäkelä added a comment:

            Just today, while working on additional instrumentation for MDEV-32899, I realized that the dict_sys.latch watchdog does not cover waits for a shared latch, only waits for an exclusive latch.

            You have already filed MDEV-32371, whose fix I hope to push tomorrow, once some additional testing has been completed. The only reliable way to diagnose any hang is to produce the stack traces of all threads at the time of the hang, like you did in that bug. Is this bug something different?
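            To make the asymmetry concrete, here is a minimal, hypothetical C++ sketch; it is not the actual dict_sys_t code, and the class name, polling interval and abort behaviour are illustrative only. The exclusive path waits in bounded slices where a watchdog check can fire, while the shared path blocks with no time limit:

            // Hypothetical sketch, not the InnoDB sources.
            #include <chrono>
            #include <cstdio>
            #include <cstdlib>
            #include <shared_mutex>
            #include <thread>

            class watched_latch
            {
              std::shared_mutex rw;
            public:
              // Exclusive path: poll in bounded slices so a stuck wait can be
              // detected, similar in spirit to innodb_fatal_semaphore_wait_threshold.
              void x_lock(std::chrono::seconds threshold)
              {
                const auto start = std::chrono::steady_clock::now();
                while (!rw.try_lock())
                {
                  if (std::chrono::steady_clock::now() - start > threshold)
                  {
                    std::fputs("watchdog: exclusive latch wait exceeded threshold\n",
                               stderr);
                    std::abort(); // the real watchdog intentionally crashes the server
                  }
                  std::this_thread::sleep_for(std::chrono::milliseconds(100));
                }
              }

              // Shared path: an ordinary unbounded blocking wait. If the exclusive
              // holder never releases, this thread hangs and no watchdog ever fires.
              void s_lock() { rw.lock_shared(); }

              void x_unlock() { rw.unlock(); }
              void s_unlock() { rw.unlock_shared(); }
            };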

            xan@biblionix.com Xan Charbonnet added a comment:

            Marko, the only reason I thought this was different was because it didn't get caught by the watchdog. I do have the full stack trace, which I'll email you. It sounds like this is likely to be the same thing, just with a shared latch rather than an exclusive latch.

            If this is the same thing, then maybe this bug should be converted into making the dict_sys.latch watchdog cover both kinds of latches?

            Thank you!

            marko Marko Mäkelä added a comment:

            xan@biblionix.com, thank you for the stack traces that you provided via email. I can confirm that this hang is a duplicate of MDEV-32371:

            Thread 99 (Thread 0x7f72177a0700 (LWP 1099409)):
            …
            #6  sux_lock<ssux_lock_impl<true> >::s_lock (this=0x55cd115b7b30) at ./storage/innobase/include/sux_lock.h:362
            No locals.
            #7  buf_page_get_zip (page_id={m_id = 452504869516544}, zip_size=zip_size@entry=8192) at ./storage/innobase/buf/buf0buf.cc:2228
                    discard_attempted = <optimized out>
                    chain = @0x55cc92982928: {first = 0x55cd039e9cc8}
                    hash_lock = @0x55cc92982900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    bpage = 0x55cd115b7b18
            #8  0x000055cc8acfbd94 in btr_copy_zblob_prefix (buf=<optimized out>, len=<optimized out>, zip_size=8192, id={m_id = 452504869516544}, offset=12) at ./storage/innobase/btr/btr0cur.cc:6739
            … while executing a SELECT that is accessing an off-page column
            

            The same buf_pool.page_hash latch is being waited for when a corrupted page is being evicted from the buffer pool on read completion:

            Thread 133 (Thread 0x7f62fc8f5700 (LWP 1674819)):
            #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            No locals.
            #1  0x000055cc8acad140 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x55cc92982900) at ./storage/innobase/sync/srw_lock.cc:249
            No locals.
            #2  ssux_lock_impl<true>::wr_wait (this=this@entry=0x55cc92982900, lk=<optimized out>) at ./storage/innobase/sync/srw_lock.cc:383
            No locals.
            #3  0x000055cc8a51d6de in ssux_lock_impl<true>::wr_lock (this=0x55cc92982900) at ./storage/innobase/include/srw_lock.h:265
                    lk = <optimized out>
            #4  page_hash_latch::lock (this=0x55cc92982900) at ./storage/innobase/include/buf0types.h:214
            No locals.
            #5  buf_pool_t::corrupted_evict (this=0x55cc8b74cb40 <buf_pool>, bpage=0x55ccdb6a1218, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1211
                    id = {m_id = <optimized out>}
                    chain = @0x55cc92982918: {first = 0x55ccfcfddb18}
                    hash_lock = @0x55cc92982900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    unfix = <optimized out>
                    s = <optimized out>
            #6  0x000055cc8ad08df5 in buf_page_t::read_complete (this=0x55ccdb6a1218, node=@0x55cc8f60da58: {space = 0x55cc8f6df778, name = 0x55cc8f601ea8 "XXX.ibd", handle = {m_file = 40, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 201728, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3622
                    database_corrupted = <optimized out>
                    expected_id = {m_id = 452504869516546}
                    read_frame = <optimized out>
                    err = DB_FAIL
                    recovery = <optimized out>
                    ibuf_may_exist = <optimized out>
            #7  0x000055cc8ad4c29e in IORequest::read_complete (this=this@entry=0x55cc91cdb8e0, io_error=<optimized out>) at ./storage/innobase/fil/fil0fil.cc:2879
                    err = <optimized out>
                    id = {m_id = 452504869516546}
            #8  0x000055cc8ac2ecd6 in read_io_callback (c=0x55cc91cdb868) at ./storage/innobase/os/os0file.cc:3446
                    cb = 0x55cc91cdb868
                    request = @0x55cc91cdb8e0: {bpage = 0x55ccdb6a1218, slot = 0x0, node = 0x55cc8f60da58, type = IORequest::READ_ASYNC}
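            Purely as an illustration of the wait cycle suggested by the two traces above (thread 99 waiting for a block latch while referencing the page_hash slice, thread 133 waiting for that same page_hash slice during eviction), here is a self-contained, hypothetical C++ sketch with standard-library mutexes standing in for the InnoDB latches. Timed waits are used only so that the demonstration terminates instead of hanging:

            // Hypothetical lock-order-inversion demo, not the buffer-pool code.
            #include <chrono>
            #include <cstdio>
            #include <mutex>
            #include <thread>

            static std::timed_mutex hash_latch;   // stands in for a buf_pool.page_hash slice
            static std::timed_mutex block_latch;  // stands in for the block (page) latch

            static void reader()                  // cf. buf_page_get_zip() in thread 99
            {
              std::lock_guard<std::timed_mutex> hold(hash_latch);
              std::this_thread::sleep_for(std::chrono::milliseconds(100));
              if (!block_latch.try_lock_for(std::chrono::seconds(1)))
                std::puts("reader: stuck waiting for the block latch");
              else
                block_latch.unlock();
            }

            static void evictor()                 // cf. buf_pool_t::corrupted_evict() in thread 133
            {
              std::lock_guard<std::timed_mutex> hold(block_latch);
              std::this_thread::sleep_for(std::chrono::milliseconds(100));
              if (!hash_latch.try_lock_for(std::chrono::seconds(1)))
                std::puts("evictor: stuck waiting for the page_hash latch");
              else
                hash_latch.unlock();
            }

            int main()
            {
              std::thread t1(reader), t2(evictor);
              t1.join();
              t2.join();
              return 0;
            }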
            

            The watchdog on dict_sys.latch is not catching this hang, because we do not have any threads waiting for it, either in shared or exclusive mode. For a shared-mode wait, if there was a conflicting request (a thread is already holding an exclusive lock or waiting for one), I think that we would escalate to exclusive locking during the wait. This should employ the watchdog.
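            One way the escalation idea could look, sketched here with a std::shared_timed_mutex standing in for the latch. This is hypothetical, not the InnoDB implementation; in particular, the real latch is the futex-based ssux_lock_impl and would downgrade back to shared mode after the escalation, which a standard mutex cannot express, so this caller simply ends up holding the latch exclusively:

            // Hypothetical sketch, not the InnoDB implementation.
            #include <chrono>
            #include <cstdio>
            #include <cstdlib>
            #include <shared_mutex>

            // Returns true if the latch was taken in shared mode, false if the
            // contended path left the caller holding it exclusively.
            bool s_lock_with_escalation(std::shared_timed_mutex &latch,
                                        std::chrono::seconds threshold)
            {
              if (latch.try_lock_shared())
                return true;    // uncontended: ordinary shared lock, no watchdog needed
              // Conflicting exclusive holder or waiter: wait on the exclusive path,
              // bounded by the watchdog threshold, instead of an unbounded shared wait.
              if (latch.try_lock_for(threshold))
                return false;   // caller now holds the latch exclusively
              std::fputs("watchdog: latch wait exceeded threshold\n", stderr);
              std::abort();     // mirror the fatal-semaphore behaviour
            }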

            The old watchdog, which covered most of InnoDB, was removed in MDEV-21452. The main motivation was to reduce the memory footprint, to simplify the data structures and to improve performance. There were also some bugs in that subsystem, such as MDEV-11972. I thought that implementing a watchdog on dict_sys.latch only would be sufficient. Of course, hangs should not exist in the first place. The sync_array was only useful for diagnosing some rather simple hangs that do not involve more than 2 mutexes. Here we have 3. In any case, to diagnose and fix nontrivial hangs like MDEV-32371 or MDEV-29835 we need stack traces.


            marko Marko Mäkelä added a comment:

            I see that dict_sys.freeze() invokes srw_lock::rd_lock(), which in the futex-based implementation would temporarily escalate the lock in rd_wait(). These waits are not covered by the watchdog that is implemented in dict_sys_t::lock_wait(). It is not trivial to change this to use the watchdog, because we have multiple different implementations here: the futex-based one on Linux and various BSDs (MDEV-26476), SRWLOCK on Microsoft Windows, and something that is wrapped by rw_lock_t on other platforms. An attempt to implement the watchdog for dict_sys.freeze() could reduce performance in some workloads. Besides, for this particular hang it would not help at all, because none of the threads are waiting for dict_sys.latch.
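            To make the portability point concrete, here is a hypothetical wrapper; the names and structure are illustrative, not the MariaDB sources. A watchdog on shared acquisition needs either a timed lock primitive or a wait loop that we control, and only some of the backends offer that (SRWLOCK, for example, has no timed shared-acquisition call):

            // Hypothetical portability sketch, not the MariaDB sources.
            #include <chrono>
            #include <shared_mutex>

            struct portable_rw_latch
            {
              std::shared_timed_mutex impl;  // stand-in backend for this sketch

              // Unsupervised shared lock: a plain rd_lock()-style call that may
              // block forever with no watchdog hook.
              void rd_lock() { impl.lock_shared(); }

              // Supervised shared lock: only expressible when the backend offers
              // a timed wait (or when the wait loop is our own code, as in the
              // futex-based implementation).
              bool rd_lock_watched(std::chrono::seconds threshold)
              { return impl.try_lock_shared_for(threshold); }

              void rd_unlock() { impl.unlock_shared(); }
            };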

            When it comes to the root cause of this corruption, I am a bit puzzled. Based on MDEV-32115 it seems that the default wsrep_sst_method=rsync should work reliably on 10.5 and later releases. Perhaps this would be better explained by MDEV-32174, which is a bug in ROLLBACK on ROW_FORMAT=COMPRESSED tables. But I do not yet see how it could make individual pages look corrupted when they are being read into the buffer pool.


            People

              Assignee: marko Marko Mäkelä
              Reporter: xan@biblionix.com Xan Charbonnet
              Votes: 0
              Watchers: 2

