[MDEV-38906] Do not resume IO Threads in the middle of an event group - Jira

Details

Type: Task
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Replication
Labels:
- REPLICATION26

Epic Link:
Preserve Relay Logs with GTIDs

Description

It is dangerous for the IO thread to resume in the middle of an event group (e.g., a transaction) when the reconnection lands in (the middle of) a different group, such as one served by a different server at the same address.

The current error guard relies on the 'G' (global) in "GTID" holding across servers, but two groups that are equivalent under the same GTID can still differ in ways the guard does not see, and there are scenarios where even this equivalence does not hold.
Non-GTID replication is worse off, since it uses generic positions rather than “unique” identifiers.
In both modes, only an event-by-event comparison could establish strict equality.
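
To illustrate the gap between the two checks, here is a minimal Python sketch. It is not the server's code: `Event`, `same_gtid`, and `strictly_equal_prefix` are hypothetical names, and modelling an event as a GTID string plus a raw payload is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified event: a group identifier plus a raw body."""
    gtid: str
    payload: bytes


def same_gtid(buffered: List[Event], resent: List[Event]) -> bool:
    """Identifier-only guard: trusts that equal GTIDs imply equal groups."""
    return buffered[0].gtid == resent[0].gtid


def strictly_equal_prefix(buffered: List[Event], resent: List[Event]) -> bool:
    """Strict guard: every already-received event must match the re-sent one."""
    if len(resent) < len(buffered):
        return False
    return all(b == r for b, r in zip(buffered, resent))
```

In this model, a partially received group and a re-sent group can share a GTID yet differ in a single event body; only the second function notices, at the cost of re-reading every buffered event.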

But this error check becomes unnecessary in the first place if the IO thread never needs to reconnect in the middle of an event group, including when recovering from crashes.
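
The idea of honoring a stop request only at a group boundary can be sketched as follows. This is a toy Python model under stated assumptions, not the proposed implementation: `receive_until_boundary` is a hypothetical name, events are plain strings, and groups are assumed to be delimited by "BEGIN" and "COMMIT".

```python
from typing import List


def receive_until_boundary(events: List[str], stop_requested_at: int) -> List[str]:
    """Append incoming events to a relay log, deferring a stop request
    (raised at index `stop_requested_at`) until no group is open."""
    relay_log: List[str] = []
    in_group = False
    for i, ev in enumerate(events):
        stop_requested = i >= stop_requested_at
        if stop_requested and not in_group:
            break  # safe point: the relay log ends on a group boundary
        relay_log.append(ev)
        if ev == "BEGIN":
            in_group = True
        elif ev == "COMMIT":
            in_group = False
    return relay_log
```

With a stop requested while a group is open, the loop keeps reading until that group's "COMMIT", so the relay log never ends mid-group and a later reconnect starts at the beginning of a group.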

One concern is that STOP SLAVE (IO_THREAD) can take a long time to return if it must first finish receiving a humongous event group.
But such giant groups are bound to stall other components anyway.
By comparison, the SQL thread already does not pause mid-group, although it can abort the transaction instead.

Issue Links

includes

MDEV-33268 IO Thread Can Write Gtid_list_log_event Mid-transaction into Relay Log

Open

relates to

MDEV-39334 "Waiting for the slave SQL thread to free enough relay log space" Causes silent replication failure

Confirmed

split from

MDEV-38907 Optimistic Relay Log Crash Recovery

Stalled

People

Assignee: Jimmy Hú

Reporter: Jimmy Hú

Votes: 0

Watchers: 2

Dates

Created: 2026-02-25 22:00

Updated: 2026-04-14 21:00

Time Tracking

Estimated:

Remaining: 2d 4h 35m

Logged: 3h 25m