[MDEV-23806] Undo page corruption on recovery

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5.2
Fix Version/s: 10.5.7
Component/s: Storage Engine - InnoDB

Description

I analyzed an rr replay trace in which an undo log page was corrupted. The undo log page contents were recovered entirely from redo log records (thanks to MDEV-12699).

It turns out that when we removed the MLOG_UNDO_ERASE_END record in MariaDB 10.3.3 in an attempt to reduce our redo log volume, we incurred technical debt that came due when MDEV-12353 optimized the redo log volume further. The MDEV-12353 replacement of mlog_write_ulint() avoids logging the leading bytes that were not actually changed in the page. But because trx_undo_report_row_operation() invokes memset() without writing redo log for it, the page image at the time the server was killed would differ from the page image produced by recovery.

To avoid this corruption, we must write redo log for the memset() operation unless the entire undo log page will be freed in the mini-transaction.
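The mechanism can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not InnoDB code: the page is a 16-byte buffer, the redo log is a list of (offset, bytes) records, recovery replays the log onto a zero-filled page, and all function names (write_optimized, memset_logged, memset_unlogged, recover, run) are hypothetical.

```python
PAGE_SIZE = 16  # toy page size, far smaller than a real InnoDB page


def write_optimized(page, log, offset, data):
    """Write data to the page, logging only from the first byte that changes.

    Loosely mimics the idea of skipping unchanged leading bytes.
    """
    skip = 0
    while skip < len(data) and page[offset + skip] == data[skip]:
        skip += 1
    page[offset:offset + len(data)] = data
    if skip < len(data):
        log.append((offset + skip, bytes(data[skip:])))


def memset_logged(page, log, offset, length, value=0):
    """Fill a range and write a redo record for it (the fixed behaviour)."""
    fill = bytes([value]) * length
    page[offset:offset + length] = fill
    log.append((offset, fill))


def memset_unlogged(page, offset, length, value=0):
    """Fill a range without any redo record (the buggy behaviour)."""
    page[offset:offset + length] = bytes([value]) * length


def recover(log):
    """Rebuild the page purely from redo records, starting from zeros."""
    page = bytearray(PAGE_SIZE)
    for offset, data in log:
        page[offset:offset + len(data)] = data
    return page


def run(log_the_memset):
    page, log = bytearray(PAGE_SIZE), []
    # 1. Some bytes are written and logged normally.
    write_optimized(page, log, 4, b"\xab\xcd")
    # 2. The range is erased, with or without a redo record.
    if log_the_memset:
        memset_logged(page, log, 4, 2)
    else:
        memset_unlogged(page, 4, 2)
    # 3. A later write whose first byte matches the erased (zero) byte,
    #    so only the second byte is logged.
    write_optimized(page, log, 4, b"\x00\x05")
    return page, recover(log)


if __name__ == "__main__":
    for logged in (False, True):
        live, recovered = run(logged)
        print("memset logged:", logged, "pages match:", live == recovered)
```

In this model, the unlogged memset() leaves the recovered page holding a stale first byte that the live page no longer had, because the later write skipped it as unchanged. Logging the memset() removes the divergence.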

This bug is apparently very hard to hit: even though MDEV-12353 introduced it as early as 10.5.2, we first saw it a week ago when testing a development version of MDEV-23399 (which changes the page flushing algorithm and could therefore affect timings).

Issue Links

is caused by: MDEV-12353 Efficient InnoDB redo log record format (Closed)

Activity

Marko Mäkelä made transition Open -> Closed on 2020-09-24 12:18 (time in source status: 1h 9m)

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 2

Dates

Created: 2020-09-24 11:09

Updated: 2020-10-06 18:38

Resolved: 2020-09-24 12:18

MariaDB Server