While testing MDEV-33588, mleich encountered an rr replay trace in which the purge_coordinator_task tries to parse arbitrary data in the middle of an undo log record. I narrowed it down to the following code in purge_sys_t::choose_next_log():
const trx_undo_rec_t *undo_rec=
  trx_undo_page_get_first_rec(b, hdr_page_no, hdr_offset);
Here, b is undo log page 13, and hdr_page_no and hdr_offset are related to that. I think that we would benefit from some testing with the following patch to catch the root cause of the corruption sooner:
@@ -134,8 +134,9 @@ trx_undo_page_get_first_rec(const buf_block_t *block, uint32_t page_no,
                             uint16_t offset)
 {
   uint16_t start= trx_undo_page_get_start(block, page_no, offset);
-  return start == trx_undo_page_get_end(block, page_no, offset)
-    ? nullptr : block->page.frame + start;
+  uint16_t end= trx_undo_page_get_end(block, page_no, offset);
+  ut_ad(start <= end);
+  return start >= end ? nullptr : block->page.frame + start;
 }
 
 /** Get the last undo log record on a page.
In the trace that I analyzed, we would be reading an invalid start value that points outside the page frame.
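
To illustrate why the equality check lets such a value through, here is a minimal standalone sketch. It is not InnoDB code: ToyUndoPage, TOY_PAGE_SIZE and the first_rec_* helpers are hypothetical names, the 16384-byte page size is an assumption, and plain assert() stands in for ut_ad(). The helpers return an offset (or -1 for "no record") instead of a pointer, so that the out-of-bounds case can be inspected safely.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size, for illustration only.
constexpr int TOY_PAGE_SIZE= 16384;

// Hypothetical stand-in for the two offsets read from the undo log header.
struct ToyUndoPage
{
  uint16_t start; // offset of the first undo record
  uint16_t end;   // offset of the first free byte after the last record
};

// Pre-patch logic: only start == end is treated as "no record".
inline int first_rec_old(const ToyUndoPage &p)
{
  return p.start == p.end ? -1 : int{p.start};
}

// Patched logic without the debug assertion: start >= end is "no record".
inline int first_rec_new(const ToyUndoPage &p)
{
  return p.start >= p.end ? -1 : int{p.start};
}

// Patched logic with the debug assertion: a corrupted start > end
// aborts here instead of being silently tolerated.
inline int first_rec_new_debug(const ToyUndoPage &p)
{
  assert(p.start <= p.end); // stands in for ut_ad(start <= end)
  return first_rec_new(p);
}

// True if the returned offset lies inside the toy page frame.
inline bool in_page(int offset)
{
  return offset >= 0 && offset < TOY_PAGE_SIZE;
}
```

With a sane header such as {120, 200}, both variants return offset 120. With a made-up corrupted header such as {40000, 200}, first_rec_old() returns 40000, which lies outside the assumed page frame, while first_rec_new() returns -1 and first_rec_new_debug() would trip the assertion.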