[MDEV-27939] Log buffer wrap-around errors on PMEM Created: 2022-02-24 Updated: 2022-02-25 Resolved: 2022-02-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Backup, Storage Engine - InnoDB |
| Affects Version/s: | 10.8.1, 10.8.2 |
| Fix Version/s: | 10.8.3 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | crash, recovery |
| Environment: |
GNU/Linux with libpmem |
||
| Description |
|
mleich reproduced an assertion failure in mariadb-backup --prepare:
In the core dump, we actually have recv_sys.offset == recv_sys.len, so it looks like the assertion should be relaxed to include equality. Furthermore, at the end of the loop body, recv_sys.offset needs to wrap around to log_sys.START_OFFSET (0x3000). To reproduce that bug, I modified an existing test case to trigger log file wrap-around exactly at a record boundary:
This test works fine when the mmap() based interface to the log is not being used. That interface would be used when InnoDB is linked with libpmem and the data directory resides in /dev/shm or in a mount -o dax mounted file system. If those conditions are met, InnoDB would fail to start up like this:
An analysis shows that parsing wrongly continues at recv_sys.offset=0x400000, that is, right past the end of the mmap() buffer. The offset should have wrapped around to log_sys.START_OFFSET. So, there is something to fix in both the server and the backup. |
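The wrap-around rule described above can be illustrated with a minimal sketch. This is not the InnoDB code or the actual fix: the constants are hypothetical stand-ins for the values quoted in the description (0x3000 and 0x400000), and `advance()` is an illustrative helper that assumes the offset is never below the start of the circular area.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants mirroring the values quoted in the description.
constexpr std::size_t START_OFFSET = 0x3000;   // first byte of the circular log area
constexpr std::size_t FILE_SIZE    = 0x400000; // size of the mapped log buffer

// Advance a parse offset by len bytes within the circular area
// [START_OFFSET, FILE_SIZE). An offset equal to FILE_SIZE is one past the
// end of the mapping, so it wraps to START_OFFSET instead of being used.
// Precondition (assumed): offset >= START_OFFSET.
constexpr std::size_t advance(std::size_t offset, std::size_t len)
{
  const std::size_t capacity = FILE_SIZE - START_OFFSET;
  return START_OFFSET + (offset - START_OFFSET + len) % capacity;
}

// A record that ends exactly at the end of the buffer wraps to the start.
static_assert(advance(0x3FFFF0, 0x10) == START_OFFSET, "wrap at record boundary");
// An offset that already equals the buffer size is normalized as well.
static_assert(advance(FILE_SIZE, 0) == START_OFFSET, "offset == len wraps");
// An ordinary advance inside the buffer is unaffected.
static_assert(advance(START_OFFSET, 0x20) == 0x3020, "no wrap inside buffer");
```

The modulo form handles both the exact-boundary case and a record that straddles the end of the buffer; the failure described above corresponds to skipping this normalization and continuing to parse at 0x400000.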
| Comments |
| Comment by Marko Mäkelä [ 2022-02-24 ] |
|
The following fixes the server startup for the changed test:
However, the test is not exercising the suspected backup bug, because backup is starting from recv_sys.offset=log_sys.START_LSN in the trace that I analyzed. The reason is that the checkpoint LSN had advanced to 0x1000fffffe20 (one extra checkpoint after startup). That extra checkpoint occurs because the log file is rebuilt on startup in order to add encryption. With these additional tweaks, I finally exercised the backup code:
And this temporary fault injection, to ignore the unwanted new checkpoint during backup, specific to this test case:
Backup did not crash in the expected place, because the pointer was being wrapped to recv_sys.offset=log_sys.START_OFFSET thanks to the above fix. However, something else failed:
The function expected a file name like "./c/b.ibd" and not just "c/b.ibd" as the test case writes it. This wasn’t exercised earlier because the server had always rebuilt the log file at startup before backup got to read it. Finally, with that fixed, we got the next problem at mariadb-backup --prepare:
I will continue fixing this tomorrow. |
| Comment by Marko Mäkelä [ 2022-02-25 ] |
|
There was one more bug in backup, which caused 2 extra bytes to be written to the log, thus corrupting the backup. This one also affected PMEM only:
|