Debian 11 Bullseye; MariaDB installed from the MariaDB repositories.
Description
Hello,
Last night, right after midnight, one of my servers seemed to hang. All (or at least a great many) queries would hang forever in the Execute phase, which of course hung up the entire Galera cluster as well.
I was awakened by an alert around 12:40 am. My innodb_fatal_semaphore_wait_threshold is set to 32 seconds, so this hang was not caught by that watchdog.
At 12:49 I was able to send MariaDB a signal that caused it to crash and dump core, so I do have a stack trace for this situation. I don't want to post it publicly, but it is available upon request. I also have SHOW PROCESSLIST logs for much of that time, in case that helps.
If a MariaDB expert could take a look at the situation I would appreciate it! Thanks.
Issue Links
duplicates
MDEV-32371: Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption
Marko Mäkelä added a comment - Just today, while working on additional instrumentation for MDEV-32899, I realized that the dict_sys.latch watchdog does not cover waits for a shared latch, only waits for an exclusive latch.
You have already filed MDEV-32371, whose fix I hope to push tomorrow, once some additional testing has been completed. The only reliable way to diagnose any hang is to produce the stack traces of all threads at the time of the hang, like you did in that bug. Is this bug something different?
Xan Charbonnet added a comment - Marko, the only reason I thought this was different was because it didn't get caught by the watchdog. I do have the full stack trace, which I'll email you. It sounds like this is likely to be the same thing, just with a shared latch rather than an exclusive latch.
If this is the same thing, then maybe this bug should be converted into making the dict_sys.latch watchdog cover both kinds of latches?
Thank you!
Marko Mäkelä added a comment - xan@biblionix.com, thank you for the stack traces that you provided via email. I can confirm that this hang is a duplicate of MDEV-32371:
Thread 99 (Thread 0x7f72177a0700 (LWP 1099409)):
…
#6 sux_lock<ssux_lock_impl<true> >::s_lock (this=0x55cd115b7b30) at ./storage/innobase/include/sux_lock.h:362
No locals.
#7 buf_page_get_zip (page_id={m_id = 452504869516544}, zip_size=zip_size@entry=8192) at ./storage/innobase/buf/buf0buf.cc:2228
discard_attempted = <optimized out>
chain = @0x55cc92982928: {first = 0x55cd039e9cc8}
hash_lock = @0x55cc92982900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
bpage = 0x55cd115b7b18
#8 0x000055cc8acfbd94 in btr_copy_zblob_prefix (buf=<optimized out>, len=<optimized out>, zip_size=8192, id={m_id = 452504869516544}, offset=12) at ./storage/innobase/btr/btr0cur.cc:6739
… while executing a SELECT that is accessing an off-page column
The same buf_pool.page_hash latch is being waited for when a corrupted page is being evicted from the buffer pool on read completion:
Thread 133 (Thread 0x7f62fc8f5700 (LWP 1674819)):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
No locals.
#1 0x000055cc8acad140 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x55cc92982900) at ./storage/innobase/sync/srw_lock.cc:249
No locals.
#2 ssux_lock_impl<true>::wr_wait (this=this@entry=0x55cc92982900, lk=<optimized out>) at ./storage/innobase/sync/srw_lock.cc:383
No locals.
#3 0x000055cc8a51d6de in ssux_lock_impl<true>::wr_lock (this=0x55cc92982900) at ./storage/innobase/include/srw_lock.h:265
lk = <optimized out>
#4 page_hash_latch::lock (this=0x55cc92982900) at ./storage/innobase/include/buf0types.h:214
No locals.
#5 buf_pool_t::corrupted_evict (this=0x55cc8b74cb40 <buf_pool>, bpage=0x55ccdb6a1218, state=2147483648) at ./storage/innobase/buf/buf0lru.cc:1211
id = {m_id = <optimized out>}
chain = @0x55cc92982918: {first = 0x55ccfcfddb18}
hash_lock = @0x55cc92982900: {lk = {writer = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static HOLDER = 2147483648}, readers = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 2147483649}, <No data fields>}, static WRITER = 2147483648}}
unfix = <optimized out>
s = <optimized out>
#6 0x000055cc8ad08df5 in buf_page_t::read_complete (this=0x55ccdb6a1218, node=@0x55cc8f60da58: {space = 0x55cc8f6df778, name = 0x55cc8f601ea8 "XXX.ibd", handle = {m_file = 40, m_psi = 0x0}, on_ssd = 1, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 201728, init_size = 0, max_size = 4294967295, being_extended = {m = {_M_base = {static _S_alignment = 1, _M_i = false}}}, chain = {prev = 0x0, next = 0x0}, block_size = 4096}) at ./storage/innobase/buf/buf0buf.cc:3622
database_corrupted = <optimized out>
expected_id = {m_id = 452504869516546}
read_frame = <optimized out>
err = DB_FAIL
recovery = <optimized out>
ibuf_may_exist = <optimized out>
#7 0x000055cc8ad4c29e in IORequest::read_complete (this=this@entry=0x55cc91cdb8e0, io_error=<optimized out>) at ./storage/innobase/fil/fil0fil.cc:2879
err = <optimized out>
id = {m_id = 452504869516546}
#8 0x000055cc8ac2ecd6 in read_io_callback (c=0x55cc91cdb868) at ./storage/innobase/os/os0file.cc:3446
cb = 0x55cc91cdb868
request = @0x55cc91cdb8e0: {bpage = 0x55ccdb6a1218, slot = 0x0, node = 0x55cc8f60da58, type = IORequest::READ_ASYNC}
The watchdog on dict_sys.latch is not catching this hang, because we do not have any threads waiting for it, either in shared or exclusive mode. For a shared-mode wait, if there was a conflicting request (a thread is already holding an exclusive lock or waiting for one), I think that we would escalate to exclusive locking during the wait. This should employ the watchdog.
The old watchdog, which covered most of InnoDB, was removed in MDEV-21452. The main motivation was to reduce the memory footprint, to simplify the data structures, and to improve performance. There were also some bugs in that subsystem, such as MDEV-11972. I thought that implementing a watchdog only on dict_sys.latch would be sufficient. Of course, hangs should not exist in the first place. The sync_array was only useful for diagnosing some rather simple hangs that involve no more than 2 mutexes; here we have 3. In any case, to diagnose and fix nontrivial hangs like MDEV-32371 or MDEV-29835 we need stack traces.
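To make the cycle in the two stack traces above easier to follow, here is a schematic, self-contained C++ sketch of the same ordering (the mutexes and thread names are invented stand-ins, not InnoDB code): one thread latches the page-hash cell and then waits for the block latch, while the read-completion path holds the block and waits to latch the same page-hash cell exclusively.

#include <chrono>
#include <cstdio>
#include <shared_mutex>
#include <thread>

// Stand-ins for the buf_pool.page_hash cell latch and the block's page latch.
std::shared_timed_mutex page_hash_latch;
std::shared_timed_mutex block_latch;

int main() {
  // "Read completion": the block stays latched while the page read is in progress.
  block_latch.lock();

  // "buf_page_get_zip()": latch the hash cell, then wait for the block latch.
  std::thread lookup([] {
    std::shared_lock<std::shared_timed_mutex> hash(page_hash_latch);
    std::puts("lookup: holds page_hash latch, waiting for block latch");
    block_latch.lock_shared();   // blocks: read completion still holds the block
    block_latch.unlock_shared();
  });

  std::this_thread::sleep_for(std::chrono::milliseconds(200));

  // "corrupted_evict()": still holding the block latch, it now needs the
  // page_hash latch exclusively to evict the corrupted page. The lookup
  // thread holds that latch and will not give it up: a latch cycle.
  std::puts("evict: holds block latch, waiting for page_hash latch");
  if (!page_hash_latch.try_lock_for(std::chrono::seconds(1)))
    std::puts("evict: could not take page_hash latch exclusively (deadlock cycle)");
  else
    page_hash_latch.unlock();

  block_latch.unlock();          // break the cycle so this demonstration can exit
  lookup.join();
  return 0;
}

In this sketch the eviction side backs off after a timed attempt so the program can terminate; in the server neither thread can back off, which is the hang that was observed.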
Marko Mäkelä added a comment - I see that dict_sys.freeze() invokes srw_lock::rd_lock(), which in the futex-based implementation would temporarily escalate the lock in rd_wait(). These waits are not covered by the watchdog that is implemented in dict_sys_t::lock_wait(). It is not trivial to change this to use the watchdog, because we have multiple different implementations here: the futex-based one on Linux and various BSDs (MDEV-26476), SRWLOCK on Microsoft Windows, and something that is wrapped by rw_lock_t on other platforms. An attempt to implement the watchdog for dict_sys.freeze() could reduce performance in some workloads. Besides, for this particular hang it would not help at all, because none of the threads are waiting for dict_sys.latch.
When it comes to the root cause of this corruption, I am a bit puzzled. Based on MDEV-32115, it seems that the default wsrep_sst_method=rsync should work reliably on 10.5 and later releases. Perhaps this would be better explained by MDEV-32174, which is a bug in ROLLBACK on ROW_FORMAT=COMPRESSED tables. But I do not yet see how it could make individual pages look corrupted when they are being read into the buffer pool.