[MDEV-32371] Deadlock between buf_page_get_zip() and buf_pool_t::corrupted_evict() on InnoDB ROW_FORMAT=COMPRESSED table corruption Created: 2023-10-07 Updated: 2023-12-11 Resolved: 2023-11-30
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3, 10.11.5 |
| Fix Version/s: | 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Xan Charbonnet | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hang |
| Environment: | Debian 11 Bullseye; MariaDB from the MariaDB repositories. |
| Issue Links: |
| Description |
I've had two different servers this week hit an issue with a semaphore wait lasting forever. This bug report isn't about what caused those wait-forever situations; it's about how the server handled them.

If I'm reading the code in storage/innobase/srv/srv0srv.cc correctly, there is supposed to be a warning at 1/4, 1/2, and 3/4 of the innodb_fatal_semaphore_wait_threshold value before the server kills itself once the threshold is reached. Examining the syslog, however, shows that this doesn't seem to be working.

One of my servers gave no warnings at all: it waited the full 600 seconds and then gave the fatal error as it crashed itself. The "Long wait" message never appeared. The other server did give the "Long wait" warning, starting at 150 seconds, which is the correct time to start (1/4 of 600). I would then expect warnings at 300 and 450 seconds. Instead, it warned again at 159 seconds, and then every 10 seconds after that until 289 seconds, at which point I killed the server myself rather than waiting the full 600.

I would like to lower my innodb_fatal_semaphore_wait_threshold setting dramatically. I would be able to tell whether that is safe by making incremental changes and observing the presence or absence of "Long wait" warnings; however, if these warnings aren't behaving the way they're supposed to, that won't work.

Apologies if I'm misunderstanding what the behavior is supposed to be, but zero warnings in some cases and warnings every 10 seconds in other cases doesn't seem like it could be intended.
| Comments |
| Comment by Marko Mäkelä [ 2023-10-07 ] |
The "long semaphore wait" diagnostics and sync_array (…)

Can you please provide stack traces of all threads during the hangs? That is the only reliable way to identify and fix hangs. A known hang that affects the currently latest releases is (…).
| Comment by Xan Charbonnet [ 2023-10-11 ] |
Marko, I'm not sure I understand your explanation of what ended up causing my weird pattern of "long wait" warnings, but I certainly appreciate your help!

I'm afraid the only stack trace I have is from the hang where no warnings were given. For the other one, where warnings were appearing every 10 seconds, I killed the server manually rather than letting it time out and die. I've shrunk the timeout from 600 seconds to 152 seconds, so maybe something will happen more frequently going forward.

Here's what went into the syslog:

2023-10-04  1:31:50 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-my (…)
| Comment by Marko Mäkelä [ 2023-10-11 ] |
xan@biblionix.com, the stack trace that was written to the log is basically useless, because it is apparently from the thread that intentionally kills the server after noticing a hang. (The InnoDB function names in the stack trace were not even resolved by the built-in stack trace generator; this is a known problem.) To analyze the root cause of any hang, we really need the stack traces of all threads of the process. In https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/#enabling-core-dumps you should be able to find how to enable core dumps and how to extract stack traces from a core dump.
| Comment by Xan Charbonnet [ 2023-10-18 ] |
Marko, I enabled core dumps and now have stack traces for two crashes which have occurred over the past few days. There is definitely a lot more information in these stack traces. In fact, there's so much information that I'm not sure I can guarantee it's appropriate for public display. Is there a more private way for me to get these stack traces to you?
| Comment by Marko Mäkelä [ 2023-10-18 ] |
xan@biblionix.com, that is great news. You can send the stack traces to me by email. I just sent an email to you.
| Comment by Marko Mäkelä [ 2023-10-19 ] |
Thank you, I got the stack traces of two hung processes. I will need to analyze them some time later, once I am done with some urgent tasks.

This looks like a new type of hang. I see many threads invoking buf_pool_t::corrupted_evict(), trying to remove a corrupted page from the buffer pool. We did test this logic with some debug instrumentation during the development of (…).
| Comment by Marko Mäkelä [ 2023-11-02 ] |
In one of the provided sets of stack traces, I found that a thread is waiting in buf_page_get_zip() here: (…)

The comment may have been truthful before (…). There is also one thread apparently in a busy loop, looking for a page of a ROW_FORMAT=COMPRESSED table: (…)

I checked another set of stack traces and found the same pattern. In the most recent one, trace_2023-10-31_21-13-25, we have Thread 76 waiting for buf_pool.mutex: (…)

The page ID (452504869516550 == 0x19b8d0001b506) is almost identical to what we have in another thread, but it does not seem to share the buf_pool.page_hash latch with it (0x5638f1ceca00 is not 0x5638f1cec900): (…)

By searching for 0x5638f1cec900, I found a thread that seems to be the culprit in this deadlock: (…)

Bingo: this thread is actually holding buf_pool.mutex (blocking a lot of threads) and waiting for the page hash latch that Thread 58 is holding, while itself holding the page latch that Thread 58 is waiting for: (…)

The function buf_page_get_zip() needs to be revised so that it will not wait for the page latch while holding the buf_pool.page_hash latch. This function fetches BLOB pages of ROW_FORMAT=COMPRESSED tables.
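The fix described for buf_page_get_zip() — never block on a page latch while holding the buf_pool.page_hash latch — can be sketched in standard C++. This is a simplified illustration of the latch-ordering pattern under assumed names (`Page`, `lookup`, `get_page`), not the actual InnoDB code:

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Sketch (hypothetical, not InnoDB code) of the deadlock-avoiding pattern:
// while holding the hash latch, only *try* to acquire the page latch.
// If that fails, release the hash latch and retry the whole lookup, so
// that an evicting thread (cf. corrupted_evict()) can make progress.
struct Page { std::mutex latch; bool valid = true; };

// Protects the (here trivialized) page_hash lookup.
std::shared_mutex hash_latch;

Page *lookup(Page &candidate)  // stand-in for a buf_pool.page_hash lookup
{
    return candidate.valid ? &candidate : nullptr;
}

// Returns the page with its latch held, or nullptr if it was evicted.
Page *get_page(Page &candidate)
{
    for (;;)
    {
        std::shared_lock hash_guard{hash_latch};
        Page *page = lookup(candidate);
        if (!page)
            return nullptr;   // the page was evicted in the meantime
        if (page->latch.try_lock())
            return page;      // acquired without blocking under hash_latch
        // Contended: drop the hash latch before waiting, then retry.
        hash_guard.unlock();
        std::this_thread::yield();
    }
}
```

The caller must release `page->latch` when done. The key point is that no thread ever sleeps on a page latch while pinning the hash latch, which breaks the cycle seen in the stack traces (one thread holding buf_pool.mutex and a page latch while waiting for a hash latch, another holding that hash latch while waiting for the page latch).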
| Comment by Matthias Leich [ 2023-11-24 ] |
origin/10.6- (…)
| Comment by Marko Mäkelä [ 2023-11-30 ] |
xan@biblionix.com, in case you are interested, at https://ci.mariadb.org/41154/ you can find packages of the current 10.11 development branch, which include a fix for this bug.
| Comment by Xan Charbonnet [ 2023-11-30 ] |
Marko, thank you very much!
| Comment by Xan Charbonnet [ 2023-12-11 ] |
@marko, I ended up recompiling the Debian package after making your modification. I have not seen a hang since. Thanks!