[MDEV-32639] InnoDB recovery fails due to corrupted page despite innodb_doublewrite=ON Created: 2023-10-31 Updated: 2023-11-13 Resolved: 2023-11-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5, 10.6 |
| Fix Version/s: | 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2, 11.3.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | recovery | ||
| Description |
|
I observed a test failure in a CI system:
I downloaded https://ci.mariadb.org/39864/logs/aarch64-fedora-38/var.tar.gz and checked the contents. At the start of the page we can see FIL_PAGE_LSN 0x10c4b and a FIL_PAGE_TYPE of FIL_PAGE_TYPE_SYS. At the end, right before the 32-bit checksum, we see the 32 least significant bits of an LSN 0x39e3 (not 0x10c4b). It looks like the server was killed in the middle of the page write. This is entirely possible in the innodb_fts.crash_recovery,release test variant, which kills the server at random intervals. I assume that this was an undo log page that had been freed and later reallocated.

The problem here is that there is no copy of the corrupted page in the doublewrite buffer of the InnoDB system tablespace ibdata1 (pages 64 to 191).

I think that to reproduce this problem, I must add some fault injection so that we occasionally write less than a full page and then crash as soon as that incomplete write completes. This should hopefully improve the chances of reproducing the corruption.

This failure may occur in any crash recovery test:
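
For illustration only, the torn-page symptom described above (header LSN 0x10c4b, trailer LSN 0x39e3) can be modelled with a small Python sketch. It is not InnoDB code. The byte offsets are assumptions based on the full_crc32 page layout (FIL_PAGE_LSN at byte 16, FIL_PAGE_TYPE at byte 24, the low 32 bits of the LSN in the 4 bytes before the trailing 32-bit checksum), the value 6 for FIL_PAGE_TYPE_SYS is likewise assumed, and the helpers `make_page`, `partial_write` and `looks_torn` are hypothetical names. No checksum is computed or verified.

```python
import struct

# Assumed offsets (full_crc32 page layout); not taken from the ticket itself.
FIL_PAGE_LSN = 16       # 8-byte big-endian LSN in the page header
FIL_PAGE_TYPE = 24      # 2-byte big-endian page type in the page header
END_LSN_FROM_END = 8    # low 32 bits of the LSN start 8 bytes before page end
FIL_PAGE_TYPE_SYS = 6   # assumed numeric value of FIL_PAGE_TYPE_SYS


def make_page(lsn: int, page_type: int, size: int = 16384) -> bytes:
    """Build a zero-filled page with a consistent header and trailer LSN."""
    page = bytearray(size)
    struct.pack_into(">Q", page, FIL_PAGE_LSN, lsn)
    struct.pack_into(">H", page, FIL_PAGE_TYPE, page_type)
    struct.pack_into(">I", page, size - END_LSN_FROM_END, lsn & 0xFFFFFFFF)
    return bytes(page)


def read_page_lsns(page: bytes):
    """Return (header_lsn, trailer_lsn_low32, page_type) of a page image."""
    (header_lsn,) = struct.unpack_from(">Q", page, FIL_PAGE_LSN)
    (page_type,) = struct.unpack_from(">H", page, FIL_PAGE_TYPE)
    (trailer,) = struct.unpack_from(">I", page, len(page) - END_LSN_FROM_END)
    return header_lsn, trailer, page_type


def looks_torn(page: bytes) -> bool:
    """A page looks torn if the trailer LSN differs from the header LSN's low 32 bits."""
    header_lsn, trailer, _ = read_page_lsns(page)
    return (header_lsn & 0xFFFFFFFF) != trailer


def partial_write(old: bytes, new: bytes, written: int) -> bytes:
    """Simulate a write that was killed after only `written` bytes reached the file."""
    return new[:written] + old[written:]


# Old page image at LSN 0x39e3, overwritten by a new image at LSN 0x10c4b,
# with the process killed after the first 4096 bytes were written.
old_image = make_page(0x39E3, FIL_PAGE_TYPE_SYS)
new_image = make_page(0x10C4B, FIL_PAGE_TYPE_SYS)
torn_image = partial_write(old_image, new_image, 4096)
```

Here `looks_torn(torn_image)` is true while both complete images pass the check, which mirrors the mismatch seen in the downloaded data file. A fault-injection hook in the server would play the role of `partial_write` followed by an immediate crash.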
|
| Comments |
| Comment by Marko Mäkelä [ 2023-11-13 ] |
|
This recovery bug was introduced in the main commit of |