Details
Task
Status: Open
Major
Resolution: Unresolved
None
Description
It is dangerous if the IO thread stops in the middle of an event group (e.g., a transaction) and, on resuming, reconnects to (the middle of) a different group, such as one from a different server at the same network address.
The current error guard relies on the 'G' in "GTID" holding between servers, that is, on a GTID identifying the same group everywhere. But two nominally equivalent groups can differ in ways the guard cannot see, not to mention scenarios where this equivalence does not hold at all.
Non-GTID replication is even more exposed, since it uses generic positions rather than “unique” identifiers.
In both modes, the events would have to be compared one by one to be sure the groups are strictly equal.
But this error check can be avoided in the first place if the IO thread doesn't need to reconnect in the middle of an event group, including when handling crashes.
One concern is that STOP SLAVE (IO_THREAD) could then take a long time, because it has to wait for a humongous event group to finish.
But such giant groups are bound to stall other components anyway.
For comparison, the SQL thread already does not pause mid-group; it can abort the transaction instead.
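The proposed behavior (honor a stop request only at an event group boundary) can be illustrated with a toy model. This is a minimal sketch and not MariaDB server code: `ToyIOThread`, `run`, and the string event markers `"BEGIN"` and `"COMMIT"` are hypothetical names invented for illustration, and real event groups are delimited differently.

```python
from dataclasses import dataclass, field

# Hypothetical group delimiters, for illustration only.
GROUP_BEGIN = "BEGIN"
GROUP_END = "COMMIT"


@dataclass
class ToyIOThread:
    """Toy model of an IO thread that stops only between event groups."""
    relay_log: list = field(default_factory=list)
    in_group: bool = False
    stop_requested: bool = False

    def request_stop(self):
        # Stands in for STOP SLAVE IO_THREAD: only records the request.
        self.stop_requested = True

    def receive(self, event):
        # Append the event and track whether a group is partially written.
        self.relay_log.append(event)
        if event == GROUP_BEGIN:
            self.in_group = True
        elif event == GROUP_END:
            self.in_group = False

    def may_stop_now(self):
        # A pending stop is honored only when no group is open.
        return self.stop_requested and not self.in_group


def run(io_thread, stream, stop_after):
    """Feed events; request a stop once `stop_after` events have arrived."""
    for count, event in enumerate(stream, start=1):
        io_thread.receive(event)
        if count == stop_after:
            io_thread.request_stop()
        if io_thread.may_stop_now():
            break
    return io_thread.relay_log
```

In this sketch, a stop requested after the second event of a four-event group still lets the remaining two events arrive, so the relay log never ends mid-group. The cost is the one noted above: the stop takes as long as the rest of the group does.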
Issue Links
- includes
  MDEV-33268 IO Thread Can Write Gtid_list_log_event Mid-transaction into Relay Log (Open)
- relates to
  MDEV-39334 "Waiting for the slave SQL thread to free enough relay log space" Causes silent replication failure (Open)
- split from
  MDEV-38907 Optimistic Relay Log Crash Recovery (Stalled)