[MDEV-31767] InnoDB tables are being flagged as corrupted on an I/O bound server Created: 2023-07-24 Updated: 2024-01-10 Resolved: 2023-07-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.12, 10.6.13, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 10.6.14 |
| Fix Version/s: | 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | CS0668311, corruption, race |
| Description |
|
This was reproduced while trying to reproduce an older issue Some InnoDB B-tree cursor refactoring in

The root cause seems to be that some operations are accessing the buffer page frame contents while only holding a buffer-fix on the page, not a page latch. It could be the case that the page is being read into the buffer pool, or it is being decrypted or decompressed.

In some core dumps of such failures (with additional instrumentation to essentially revert |
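The distinction between a buffer-fix and a page latch can be illustrated with a toy model. This is only a sketch and not InnoDB code: `Block`, `start_read`, `complete_read`, `peek_with_fix_only` and `read_with_latch` are hypothetical names, and a plain mutex stands in for the page latch. It assumes that the thread reading the page holds the latch until the frame is complete.

```python
import threading


class Block:
    """Toy buffer pool block. All names here are hypothetical, not InnoDB's."""

    def __init__(self):
        self.fix_count = 0             # buffer-fix: pins the block, says nothing about content
        self.latch = threading.Lock()  # stand-in for the page latch
        self.frame = None              # None models a frame whose read is not finished
        self._mutex = threading.Lock() # protects fix_count

    def fix(self):
        with self._mutex:
            self.fix_count += 1

    def unfix(self):
        with self._mutex:
            self.fix_count -= 1


def start_read(block):
    """Model the start of page I/O: the reader takes the latch first."""
    block.latch.acquire()


def complete_read(block, data):
    """Model the end of page I/O: publish the frame, then release the latch."""
    block.frame = data
    block.latch.release()


def peek_with_fix_only(block):
    """Racy access: a buffer-fix alone does not wait for the read to finish."""
    block.fix()
    try:
        return block.frame
    finally:
        block.unfix()


def read_with_latch(block, timeout=5.0):
    """Safe access: waiting for the latch also waits for the read to finish."""
    block.fix()
    try:
        if not block.latch.acquire(timeout=timeout):
            raise TimeoutError("page latch not granted")
        try:
            return block.frame
        finally:
            block.latch.release()
    finally:
        block.unfix()
```

In this model, a caller that only buffer-fixes the block while the read is in flight sees an incomplete frame, whereas a caller that waits for the latch sees the finished one.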
| Comments |
| Comment by Marko Mäkelä [ 2023-07-25 ] |
|
There could be an even older culprit to this than

I do not think that it is ever a good idea to use buffer-fixing for the first-time lookup of a data page in a mini-transaction. If the page was not in the buffer pool and had to be loaded into it, the buffer-fixing could gain access to the page before the read request was completed and the page checksum was validated. Before

What my fix aims to do is to acquire proper page latches upfront. To avoid deadlocks when acquiring page latches in the wrong order (not from left to right), we can safely release a page latch briefly while waiting for the left sibling page latch. A buffer-fix will prevent the current block from being evicted from the buffer pool. |
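The latch ordering idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual patch: `Page` and `latch_left_sibling` are hypothetical names, a plain mutex stands in for the page latch, and the fix counter is not atomic as a real one would have to be.

```python
import threading


class Page:
    """Toy page with a latch and a buffer-fix counter (hypothetical names)."""

    def __init__(self, page_no):
        self.page_no = page_no
        self.latch = threading.Lock()  # stand-in for the page latch
        self.fix_count = 0             # not atomic in this sketch


def latch_left_sibling(current, left):
    """Caller must hold current.latch. Returns with both latches held.

    The blocking wait only ever happens in left-to-right order, so two
    threads cannot wait on each other's latches.
    """
    # Fast path: a non-blocking attempt cannot deadlock.
    if left.latch.acquire(blocking=False):
        return "no-wait"

    # Slow path: pin the current block so that it cannot be evicted,
    # then give up its latch before waiting for the left sibling.
    current.fix_count += 1
    current.latch.release()
    left.latch.acquire()      # wait for the left page first
    current.latch.acquire()   # then re-latch the right (current) page
    current.fix_count -= 1
    # Assumption: a real caller would re-check here that the pages are
    # still siblings, since the current page was briefly unlatched.
    return "relatched"
```

The buffer-fix taken on the slow path is what makes the temporary release safe in this model: the block stays in the pool even though its latch is not held.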
| Comment by Thirunarayanan Balathandayuthapani [ 2023-07-26 ] |
|
Patch looks OK to me |
| Comment by Matthias Leich [ 2023-07-26 ] |
|
Two runs of the RQG test battery on a RelWithDebInfo build and one run on a debug build of |
| Comment by Michael Widenius [ 2023-08-26 ] |
|
This bug affects at least the long-term release versions 10.6.12 - 10.6.14 and 10.11.2 - 10.11.4. Anyone using a short-term release should upgrade to the next long-term release or to the latest one in their series. |
| Comment by Marko Mäkelä [ 2023-10-20 ] |
|
This bug had been reproduced while trying to reproduce another issue
This bug was a race condition that would allow a page to be accessed while it was still being read, before it had been fully read or decompressed. As a result, the table may be claimed to be corrupted, even though it is not. |