[MDEV-38907] Optimistic Relay Log Crash Recovery - Jira

Details

Type: New Feature
Status: Stalled
Priority: Major
Resolution: Unresolved
Fix Version/s: 13.2
Component/s: Replication
Labels:
- REPLICATION26
- crash-recovery

Epic Link:
Preserve Relay Logs with GTIDs
Sprint:
Q1/2026 Server Development, Q3/2026 Replic. Development
PM Planning:
- FR_ENGINEERING
- PM_TRIAGE

Description

In the feature, we should improve this routine in order to avoid throwing
away logs that are safely stored in the disk. Note also that this recovery
routine relies on the correctness of the relay-log.info and only tolerates
coordinate problems in master.info.
⸺ Code Documentation of `init_recovery()` of `slave.cc`

During startup (relay log initialization), a recovery process can keep the safe portion of the relay log intact and truncate only the trailing corruption.
The truncated portion may be a partial event, an incomplete event group, or (failing both of those) the entire last file written to.
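
As a rough illustration of "keep the safe portion, truncate the trailing corruption", here is a minimal sketch. It is not the server's implementation: it assumes a hypothetical record format (a 4-byte little-endian length prefix followed by the payload) instead of the real relay log event layout, and the names `safe_prefix_length` and `record` are invented for this example.

```python
import struct

# Assumption: hypothetical 4-byte little-endian length prefix per record.
HEADER = struct.Struct("<I")


def safe_prefix_length(data: bytes) -> int:
    """Return the number of leading bytes that form whole records."""
    offset = 0
    while True:
        # Not enough bytes left for a header: the tail is a partial record
        # (or the data ended cleanly on a record boundary).
        if len(data) - offset < HEADER.size:
            return offset
        (length,) = HEADER.unpack_from(data, offset)
        end = offset + HEADER.size + length
        # The payload is cut short: stop before this record.
        if end > len(data):
            return offset
        offset = end


def record(payload: bytes) -> bytes:
    """Encode one record in the hypothetical format."""
    return HEADER.pack(len(payload)) + payload


log = record(b"event-1") + record(b"event-2")
torn = log + record(b"event-3")[:-2]  # simulate a crash in the middle of a write
keep = safe_prefix_length(torn)
recovered = torn[:keep]               # only the torn third record is dropped
```

In this toy model the two complete records survive and only the partially written tail is discarded. A real recovery would also validate event checksums and headers.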

This lightweight procedure can even be activated automatically, superseding (if not deprecating) @@relay_log_recovery, which invalidates the entire relay log.

Truncating to the last whole event, leaving the group incomplete, will require leveraging the IO Thread's "ability" to resume mid-group.
But truncating to the last complete group has advantages:

- MDEV-38906 wants to get rid of this error-prone ability.
- Binlog Recovery (`TC_LOG_BINLOG::recover()`) already promises this result, just with XA rollback additionally.
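
The difference between the two truncation points can be shown on a simplified model. This is a sketch under stated assumptions, not the actual recovery code: the event tags (`BEGIN`, `ROW`, `COMMIT`), the `Event` type, the offsets, and the function `truncation_points` are all hypothetical. Real event groups are delimited differently, and the file header is ignored here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str     # hypothetical tags: "BEGIN", "ROW", or "COMMIT"
    end_pos: int  # byte offset just past this event in the file


def truncation_points(events, file_size):
    """Return (end of last whole event, end of last complete group)."""
    last_event_end = 0
    last_group_end = 0
    for ev in events:
        if ev.end_pos > file_size:
            break  # this event was cut short by the crash
        last_event_end = ev.end_pos
        if ev.kind == "COMMIT":
            last_group_end = ev.end_pos  # a group closes on its commit
    return last_event_end, last_group_end


events = [
    Event("BEGIN", 100), Event("ROW", 250), Event("COMMIT", 300),
    Event("BEGIN", 400), Event("ROW", 550), Event("ROW", 700),
]
# The file ends at byte 620, in the middle of the last ROW event.
event_cut, group_cut = truncation_points(events, file_size=620)
```

Here `event_cut` is 550 and `group_cut` is 300. Cutting at 550 keeps the start of the second group and depends on the IO thread resuming mid-group. Cutting at 300 drops the incomplete group, so it would have to be fetched again from its beginning.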

Issue Links

blocks

MDEV-4698 With GTID replication, relay logs cannot be relied upon while purging binary logs on master

Stalled

relates to

MDEV-6811 Try to recovery from relay log read problem automatically

Open

MDEV-8946 Add replication crash-safety for non-GTID slave.

Open

MDEV-24625 Failure to open binary log does not cause a fatal error, but leads to further assorted sporadic problems

Stalled

MDEV-38192 Extend Binlog-in-Engine to Replicate XA Prepare

Open

MDEV-38909 GTID State for Relay Log

Open

split from

MDEV-38911 GTID List based Relay Log seeking

Open

split to

MDEV-38906 Do not resume IO Threads in the middle of an event group

Open

MDEV-39051 Optimistic Relay Log Crash Recovery for non-GTID replication

Open

People

Assignee: Jimmy Hú

Reporter: Jimmy Hú

Assigned for Implementation: Jimmy Hú

Assigned for Review: Kristian Nielsen

Assigned for Testing: Deepthi Eranti Sreenivas

Votes: 0

Watchers: 5

Dates

Created: 2026-02-25 22:20

Updated: 1 week ago 02:18

Time Tracking

Estimated:

Remaining: 2d 3h 50m

Logged: 4.75d
