[MDEV-31390] Assertion id.is_corrupted() failed in buf_page_create_low() Created: 2023-06-02 Updated: 2023-10-24 Resolved: 2023-10-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.6, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Matthias Leich |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | debug, need_rr, performance | ||
| Issue Links: |
|
| Description |
|
The above code is trying to acquire the page latch while holding the buf_pool.mutex. This would violate the latching order if we didn't use a non-blocking wait. Without the above patch, the page could be evicted and replaced with something else in the buffer pool while this thread is holding neither buf_pool.mutex nor the page latch. To make this safe, we must first buffer-fix the block so that buf_page_t::can_relocate() will not hold, and only then release the buf_pool.mutex. The observed symptom was a debug assertion failure that was caught by mleich:
In the core dump, we had id.m_id==0x1c00000070 but page_id.m_id==0x9b00000007. In the stack trace, I could see that we were trying to allocate a block for tablespace 0x9b. I believe that this could explain …. I will check whether we are missing the buffer-fix in any other places that follow a similar pattern. |
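To make the two m_id values from the core dump easier to read, here is a minimal decoding sketch. It assumes that the 64-bit identifier packs the tablespace ID into the upper 32 bits and the page number into the lower 32 bits, which is consistent with the tablespace 0x9b seen in the stack trace. `PageId` is a hypothetical stand-in written for this illustration, not the server's own page_id_t class.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration; assumes the layout
// (tablespace ID << 32) | page number for the 64-bit identifier.
struct PageId {
  std::uint64_t m_id;

  // Upper 32 bits: tablespace ID.
  constexpr std::uint32_t space() const {
    return static_cast<std::uint32_t>(m_id >> 32);
  }
  // Lower 32 bits: page number within the tablespace.
  constexpr std::uint32_t page_no() const {
    return static_cast<std::uint32_t>(m_id);
  }
};

// The identifier that the thread was asking for (page_id in the core dump).
static_assert(PageId{0x9b00000007ULL}.space() == 0x9b, "tablespace 0x9b");
static_assert(PageId{0x9b00000007ULL}.page_no() == 7, "page 7");

// The identifier that the block actually carried (id in the core dump).
static_assert(PageId{0x1c00000070ULL}.space() == 0x1c, "tablespace 0x1c");
static_assert(PageId{0x1c00000070ULL}.page_no() == 0x70, "page 0x70");
```

Under this assumption, the thread wanted page 7 of tablespace 0x9b, while the block it latched was holding page 0x70 of tablespace 0x1c, that is, a valid but different page rather than one marked as corrupted.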
| Comments |
| Comment by Marko Mäkelä [ 2023-06-02 ] |
|
I realized that the debug assertion failure cannot lead to any corruption, because in the code path where it is executed, we would goto retry and proceed to look up the page again. The patch that I posted will merely simplify the code and slightly improve performance. I checked the code base, and every other attempt to acquire a page latch without waiting looks safe. |
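As a rough illustration of the latching pattern discussed in this issue, the following toy model pins (buffer-fixes) a block while the pool mutex is still held, tries the latch without waiting, and only blocks on the latch after the pool mutex has been released. `ToyBlock`, `ToyPool` and all member names are hypothetical and written for this sketch; they are not InnoDB's real types, and the real code paths differ in many details.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Toy stand-ins for illustration only; NOT InnoDB's real data structures.
struct ToyBlock {
  std::uint64_t id = 0;                     // page currently held by the block
  std::atomic<std::uint32_t> fix_count{0};  // buffer-fix (pin) count
  std::mutex latch;                         // stands in for the page latch
};

struct ToyPool {
  std::mutex mutex;  // plays the role of buf_pool.mutex
  ToyBlock block;    // a single-block "buffer pool"

  // Eviction: reuse the block for another page unless it is pinned.
  // In this model every latch holder also holds a pin, so checking
  // the pin under the pool mutex is sufficient.
  bool try_replace(std::uint64_t new_id) {
    std::lock_guard<std::mutex> pool_guard(mutex);
    if (block.fix_count.load() != 0) return false;  // pinned: must not relocate
    block.id = new_id;
    return true;
  }

  // Latch the block that holds page `wanted`.
  // On success the caller owns block.latch and one buffer-fix.
  bool latch_page(std::uint64_t wanted) {
    std::unique_lock<std::mutex> pool_guard(mutex);
    if (block.id != wanted) return false;  // page is not in the pool
    block.fix_count.fetch_add(1);          // pin while the pool mutex is held
    if (block.latch.try_lock()) return true;  // non-blocking: safe under the mutex
    pool_guard.unlock();   // never wait for a latch while holding the pool mutex
    block.latch.lock();    // the pin keeps block.id stable during the wait
    assert(block.id == wanted);
    return true;
  }

  // Release the latch first, then the buffer-fix.
  void unlatch_page() {
    block.latch.unlock();
    assert(block.fix_count.load() > 0);
    block.fix_count.fetch_sub(1);
  }
};
```

Because the pin is taken before the pool mutex is released, `try_replace()` cannot swap in a different page while a thread is waiting for the latch, so the identity check after the blocking wait cannot fail in this model. Without the pin, that check could observe a different page, and the caller would have to release the latch and look the page up again.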
| Comment by Marko Mäkelä [ 2023-06-02 ] |
|
In fact, the patch that I posted in the Description would trigger an assertion failure in the test innodb.innodb-wl5522-debug. We could try a simpler fix that should only invoke goto retry when the block had been read-fixed and found to be corrupted:
However, this patch will actually make the same test hang. The page read completion thread would wait forever for the buffer-fix to be released so that the corrupted page can be evicted:
The best alternative would seem to be to simply remove the failing debug assertion. Before we do that, it would be good to produce a simplified grammar for reproducing the debug assertion failure. |