MariaDB Server
MDEV-32909

Hang not caught by semaphore wait threshold

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Won't Fix
    • Affects Version/s: 10.11.6
    • Fix Version/s: N/A
    • Environment: Debian 11 Bullseye; MariaDB from the MariaDB repositories.

    Description

      Hello,

      Last night right after midnight one of my servers seemed to hang. All (or at least a great many?) queries would hang forever in the Execute phase. Of course this hung up the entire Galera cluster as well.

      I was awakened by an alert around 12:40am. My innodb_fatal_semaphore_wait_threshold is set to 32 seconds, so this hang was not caught by that watchdog.

      At 12:49 I was able to send MariaDB a signal which caused it to crash and dump core. So I do have a stack trace for this situation which I don't want to post publicly, but which is available upon request. I also have SHOW PROCESSLIST logs for much of the time in case that helps.

      If a MariaDB expert could take a look at the situation I would appreciate it! Thanks.

          Activity

            marko Marko Mäkelä added a comment:

            Just today, while working on additional instrumentation for MDEV-32899, I realized that the dict_sys.latch watchdog does not cover waits for a shared latch, only waits for an exclusive latch.

            You have already filed MDEV-32371, whose fix I hope to push tomorrow, once some additional testing has been completed. The only reliable way to diagnose any hang is to produce the stack traces of all threads at the time of the hang, like you did in that bug. Is this bug something different?
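            To make the asymmetry concrete, here is a minimal, hypothetical C++ sketch; it is not the actual dict_sys_t code, and the class name, polling interval and abort behaviour are illustrative only. The exclusive path waits in bounded slices where a watchdog check can fire, while the shared path blocks with no time limit:

            // Hypothetical sketch, not the InnoDB sources.
            #include <chrono>
            #include <cstdio>
            #include <cstdlib>
            #include <shared_mutex>
            #include <thread>

            class watched_latch
            {
              std::shared_mutex rw;
            public:
              // Exclusive path: poll in bounded slices so a stuck wait can be
              // detected, similar in spirit to innodb_fatal_semaphore_wait_threshold.
              void x_lock(std::chrono::seconds threshold)
              {
                const auto start = std::chrono::steady_clock::now();
                while (!rw.try_lock())
                {
                  if (std::chrono::steady_clock::now() - start > threshold)
                  {
                    std::fputs("watchdog: exclusive latch wait exceeded threshold\n",
                               stderr);
                    std::abort(); // the real watchdog intentionally crashes the server
                  }
                  std::this_thread::sleep_for(std::chrono::milliseconds(100));
                }
              }

              // Shared path: an ordinary unbounded blocking wait. If the exclusive
              // holder never releases, this thread hangs and no watchdog ever fires.
              void s_lock() { rw.lock_shared(); }

              void x_unlock() { rw.unlock(); }
              void s_unlock() { rw.unlock_shared(); }
            };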

            xan@biblionix.com Xan Charbonnet added a comment:

            Marko, the only reason I thought this was different was because it didn't get caught by the watchdog. I do have the full stack trace, which I'll email you. It sounds like this is likely to be the same thing, just with a shared latch rather than an exclusive latch.

            If this is the same thing, then maybe this bug should be converted into making the dict_sys.latch watchdog cover both kinds of latches?

            Thank you!

            marko Marko Mäkelä added a comment:

            xan@biblionix.com, thank you for the stack traces that you provided via email. I can confirm that this hang is a duplicate of MDEV-32371:

            Thread 99 (Thread 0x7f72177a0700 (LWP 1099409)):
            …
            #6  sux_lock<ssux_lock_impl<true> >::s_lock (this=0x55cd115b7b30) at ./storage/innobase/include/sux_lock.h:362
            No locals.
            #7  buf_page_get_zip (page_id={m_id = 452504869516544}, zip_size=zip_size@entry=8192) at ./storage/innobase/buf/buf0buf.cc:2228
                    discard_attempted = <optimized out>
                    chain = @0x55cc92982928: {first = 0x55cd039e9cc8}
                    hash_lock = @0x55cc92982900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    bpage = 0x55cd115b7b18
            #8  0x000055cc8acfbd94 in btr_copy_zblob_prefix (buf=<optimized out>, len=<optimized out>, zip_size=8192, id={m_id = 452504869516544}, offset=12) at ./storage/innobase/btr/btr0cur.cc:6739
            … while executing a SELECT that is accessing an off-page column
            

            The same buf_pool.page_hash latch is being waited for when a corrupted page is being evicted from the buffer pool on read completion:

            Thread 133 (Thread 0x7f62fc8f5700 (LWP 1674819)):
            #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
            No locals.
            #1  0x000055cc8acad140 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x55cc92982900) at ./storage/innobase/sync/srw_lock.cc:249
            No locals.
            #2  ssux_lock_impl<true>::wr_wait (this=this@entry=0x55cc92982900, lk=<optimized out>) at ./storage/innobase/sync/srw_lock.cc:383
            No locals.
            #3  0x000055cc8a51d6de in ssux_lock_impl<true>::wr_lock (this=0x55cc92982900) at ./storage/innobase/include/srw_lock.h:265
                    lk = <optimized out>
            #4  page_hash_latch::lock (this=0x55cc92982900) at ./storage/innobase/include/buf0types.h:214
            No locals.
            #5  buf_pool_t::corrupted_evict (this=0x55cc8b74cb40 <buf_pool>, bpage=0x55ccdb6a1218, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1211
                    id = {m_id = <optimized out>}
                    chain = @0x55cc92982918: {first = 0x55ccfcfddb18}
                    hash_lock = @0x55cc92982900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
                    unfix = <optimized out>
                    s = <optimized out>
            #6  0x000055cc8ad08df5 in buf_page_t::read_complete (this=0x55ccdb6a1218, node=@0x55cc8f60da58: {space = 0x55cc8f6df778, name = 0x55cc8f601ea8 "XXX.ibd", handle = {m_file = 40, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 201728, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3622
                    database_corrupted = <optimized out>
                    expected_id = {m_id = 452504869516546}
                    read_frame = <optimized out>
                    err = DB_FAIL
                    recovery = <optimized out>
                    ibuf_may_exist = <optimized out>
            #7  0x000055cc8ad4c29e in IORequest::read_complete (this=this@entry=0x55cc91cdb8e0, io_error=<optimized out>) at ./storage/innobase/fil/fil0fil.cc:2879
                    err = <optimized out>
                    id = {m_id = 452504869516546}
            #8  0x000055cc8ac2ecd6 in read_io_callback (c=0x55cc91cdb868) at ./storage/innobase/os/os0file.cc:3446
                    cb = 0x55cc91cdb868
                    request = @0x55cc91cdb8e0: {bpage = 0x55ccdb6a1218, slot = 0x0, node = 0x55cc8f60da58, type = IORequest::READ_ASYNC}
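            Purely as an illustration of the wait cycle suggested by the two traces above (thread 99 waiting for a block latch while referencing the page_hash slice, thread 133 waiting for that same page_hash slice during eviction), here is a self-contained, hypothetical C++ sketch with standard-library mutexes standing in for the InnoDB latches. Timed waits are used only so that the demonstration terminates instead of hanging:

            // Hypothetical lock-order-inversion demo, not the buffer-pool code.
            #include <chrono>
            #include <cstdio>
            #include <mutex>
            #include <thread>

            static std::timed_mutex hash_latch;   // stands in for a buf_pool.page_hash slice
            static std::timed_mutex block_latch;  // stands in for the block (page) latch

            static void reader()                  // cf. buf_page_get_zip() in thread 99
            {
              std::lock_guard<std::timed_mutex> hold(hash_latch);
              std::this_thread::sleep_for(std::chrono::milliseconds(100));
              if (!block_latch.try_lock_for(std::chrono::seconds(1)))
                std::puts("reader: stuck waiting for the block latch");
              else
                block_latch.unlock();
            }

            static void evictor()                 // cf. buf_pool_t::corrupted_evict() in thread 133
            {
              std::lock_guard<std::timed_mutex> hold(block_latch);
              std::this_thread::sleep_for(std::chrono::milliseconds(100));
              if (!hash_latch.try_lock_for(std::chrono::seconds(1)))
                std::puts("evictor: stuck waiting for the page_hash latch");
              else
                hash_latch.unlock();
            }

            int main()
            {
              std::thread t1(reader), t2(evictor);
              t1.join();
              t2.join();
              return 0;
            }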
            

            The watchdog on dict_sys.latch is not catching this hang, because we do not have any threads waiting for it, either in shared or exclusive mode. For a shared-mode wait, if there was a conflicting request (a thread is already holding an exclusive lock or waiting for one), I think that we would escalate to exclusive locking during the wait. This should employ the watchdog.
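            One way the escalation idea could look, sketched here with a std::shared_timed_mutex standing in for the latch. This is hypothetical, not the InnoDB implementation; in particular, the real latch is the futex-based ssux_lock_impl and would downgrade back to shared mode after the escalation, which a standard mutex cannot express, so this caller simply ends up holding the latch exclusively:

            // Hypothetical sketch, not the InnoDB implementation.
            #include <chrono>
            #include <cstdio>
            #include <cstdlib>
            #include <shared_mutex>

            // Returns true if the latch was taken in shared mode, false if the
            // contended path left the caller holding it exclusively.
            bool s_lock_with_escalation(std::shared_timed_mutex &latch,
                                        std::chrono::seconds threshold)
            {
              if (latch.try_lock_shared())
                return true;    // uncontended: ordinary shared lock, no watchdog needed
              // Conflicting exclusive holder or waiter: wait on the exclusive path,
              // bounded by the watchdog threshold, instead of an unbounded shared wait.
              if (latch.try_lock_for(threshold))
                return false;   // caller now holds the latch exclusively
              std::fputs("watchdog: latch wait exceeded threshold\n", stderr);
              std::abort();     // mirror the fatal-semaphore behaviour
            }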

            The old watchdog, which covered most of InnoDB, was removed in MDEV-21452. The main motivation was to reduce the memory footprint, to simplify the data structures and to improve performance. There were also some bugs in that subsystem, such as MDEV-11972. I thought that implementing a watchdog on dict_sys.latch only would be sufficient. Of course, hangs should not exist in the first place. The sync_array was only useful for diagnosing some rather simple hangs that do not involve more than 2 mutexes. Here we have 3. In any case, to diagnose and fix nontrivial hangs like MDEV-32371 or MDEV-29835 we need stack traces.


            marko Marko Mäkelä added a comment:

            I see that dict_sys.freeze() invokes srw_lock::rd_lock(), which in the futex-based implementation would temporarily escalate the lock in rd_wait(). These waits are not covered by the watchdog that is implemented in dict_sys_t::lock_wait(). It is not trivial to change this to use the watchdog, because we have multiple different implementations here: the futex-based one on Linux and various BSDs (MDEV-26476), SRWLOCK on Microsoft Windows, and something that is wrapped by rw_lock_t on other platforms. An attempt to implement the watchdog for dict_sys.freeze() could reduce performance in some workloads. Besides, for this particular hang it would not help at all, because none of the threads are waiting for dict_sys.latch.
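            To make the portability point concrete, here is a hypothetical wrapper; the names and structure are illustrative, not the MariaDB sources. A watchdog on shared acquisition needs either a timed lock primitive or a wait loop that we control, and only some of the backends offer that (SRWLOCK, for example, has no timed shared-acquisition call):

            // Hypothetical portability sketch, not the MariaDB sources.
            #include <chrono>
            #include <shared_mutex>

            struct portable_rw_latch
            {
              std::shared_timed_mutex impl;  // stand-in backend for this sketch

              // Unsupervised shared lock: a plain rd_lock()-style call that may
              // block forever with no watchdog hook.
              void rd_lock() { impl.lock_shared(); }

              // Supervised shared lock: only expressible when the backend offers
              // a timed wait (or when the wait loop is our own code, as in the
              // futex-based implementation).
              bool rd_lock_watched(std::chrono::seconds threshold)
              { return impl.try_lock_shared_for(threshold); }

              void rd_unlock() { impl.unlock_shared(); }
            };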

            When it comes to the root cause of this corruption, I am a bit puzzled. Based on MDEV-32115 it seems that the default wsrep_sst_method=rsync should work reliably on 10.5 and later releases. Perhaps this would be better explained by MDEV-32174, which is a bug in ROLLBACK on ROW_FORMAT=COMPRESSED tables. But I do not yet see how it could make individual pages look corrupted when they are being read into the buffer pool.


            People

              Assignee: marko Marko Mäkelä
              Reporter: xan@biblionix.com Xan Charbonnet
              Votes: 0
              Watchers: 2

