[MDEV-4698] With GTID replication, relay logs cannot be relied upon while purging binary logs on master

Details

Type: Bug
Status: Stalled
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.0.10
Fix Version/s: 10.11, 11.4, 11.8
Component/s: Replication
Labels:

Bug Category: Related to performance
Epic Link: Meaningful Replication Testing is Severely Limited due to General Instability
Sprint: Q3/2025 Maintenance, Q4/2025 Server Maintenance, Q1/2026 Server Maintenance

Description

I know from the corresponding thread on the mailing list that it is an intentional change for the sake of crash safety:

There's a well-known problem in MySQL that relay_log.info is not
always in sync with the database state, and killing and restarting
server at an unfortunate time will cause it to re-execute last
statement and potentially break replication. Did you fix this
situation with GTIDs and rpl_slave_state table? It looks like with

Yes. When using GTID, relay logs are ignored whenever slave threads are
restarted, such as slave server restart, events are fetched anew from the
server starting at the current GTID position.
⸺ knielsen, Re: GTID replication and relay_log.info

With traditional (binlog-position-based) replication it is quite possible, and even reasonable, to set up a master binlog purging procedure based on the slave IO thread status: as soon as the IO thread is done with a master binary log and has switched to the next one, all of its events are in the relay log, and that master binary log can be purged. This is efficient: if the slave SQL thread is far behind, a lot of disk space is saved by not storing the same events both in the master binlog and in the relay log. The saving is even greater if the server has the sql_delay (master_delay) functionality introduced in MySQL 5.6 and the slave is configured to keep a time gap with the master.
It also saves network traffic if the lagging slave gets restarted, because the local relay logs are preserved and the IO thread does not have to re-read all the events.
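
To illustrate the kind of procedure I mean, here is a minimal Python sketch. It assumes each slave's SHOW SLAVE STATUS row has already been fetched into a dict, and that binlog names end in a numeric suffix; the helper names and the sample file names are made up for the example. It picks the oldest master binlog that any IO thread is still reading and prints the corresponding PURGE BINARY LOGS statement.

```python
def binlog_index(name):
    """Return the numeric suffix of a binlog file name as an int."""
    return int(name.rsplit(".", 1)[1])


def io_thread_purge_target(slave_statuses):
    """Oldest master binlog that some slave's IO thread is still reading.

    Each item is a dict holding one slave's SHOW SLAVE STATUS row.
    Every file before the returned one has been fully copied to relay logs.
    """
    return min((s["Master_Log_File"] for s in slave_statuses), key=binlog_index)


# Hypothetical sample data: two slaves, one slightly behind the other.
statuses = [
    {"Master_Log_File": "master-bin.000042"},
    {"Master_Log_File": "master-bin.000040"},
]

target = io_thread_purge_target(statuses)
# PURGE BINARY LOGS TO removes the files listed before the named one
# and keeps the named file itself.
print(f"PURGE BINARY LOGS TO '{target}';")
```

This policy is only sound as long as the relay logs survive a slave restart, which is exactly the assumption in question here.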

So, all in all, I expect there are real-life configurations which rely on this behavior.

Now, with GTID, the relay logs are no longer preserved across a slave restart, so users must not configure their purge procedure this way; they should base it on the SQL thread position instead. This needs to be explicitly documented, because otherwise users can experience irreversible loss of events.
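
A purge procedure that takes this into account could look roughly like the following Python sketch. It is only an illustration under assumptions: the SHOW SLAVE STATUS rows are already available as dicts, Relay_Master_Log_File is taken as the SQL thread's position in the master binlogs, and the function names and sample values are hypothetical. For a slave that reports Using_Gtid other than "No", the master binlog is kept from the SQL thread position onward; otherwise the IO thread position is used as before.

```python
def binlog_index(name):
    """Return the numeric suffix of a binlog file name as an int."""
    return int(name.rsplit(".", 1)[1])


def needed_from(status):
    """Oldest master binlog this slave may still need to read.

    With GTID the relay logs are discarded when the slave threads restart,
    so only events already applied by the SQL thread count as safe.
    """
    if status.get("Using_Gtid", "No") != "No":
        return status["Relay_Master_Log_File"]  # SQL thread position
    return status["Master_Log_File"]            # IO thread position


def safe_purge_target(slave_statuses):
    """Oldest master binlog that must be kept across all slaves."""
    return min((needed_from(s) for s in slave_statuses), key=binlog_index)


# Hypothetical sample data: one GTID slave lagging in the SQL thread,
# one slave using traditional binlog positions.
statuses = [
    {"Using_Gtid": "Slave_Pos",
     "Master_Log_File": "master-bin.000042",
     "Relay_Master_Log_File": "master-bin.000037"},
    {"Using_Gtid": "No",
     "Master_Log_File": "master-bin.000040",
     "Relay_Master_Log_File": "master-bin.000035"},
]

target = safe_purge_target(statuses)
print(f"PURGE BINARY LOGS TO '{target}';")
```

In this sample the GTID slave forces the master to keep everything from master-bin.000037 onward, even though its IO thread is already reading master-bin.000042.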

Issue Links

blocks
MDEV-5274 binlog rotation based on slave state (Closed)

causes
MDEV-8945 Avoid overloading the master NIC on restarting IO_THREAD on lagging slave. (Closed)

duplicates
MDEV-8945 Avoid overloading the master NIC on restarting IO_THREAD on lagging slave. (Closed)

is duplicated by
MDEV-33645 Stop and Start slave reset the Master_info_file (Confirmed)

relates to
MDEV-6589 Incorrect relay log start position when restarting SQL thread after error in parallel replication (Closed)
MDEV-8959 change master to master_use_gtid= purges relay log and fetchs binlog from master (Closed)

Sub-Tasks

1. How does non-GTID replication implement this with crash compatibility? (Closed, Jimmy Hú)
2. How to expand this non-GTID capability for GTID replication? (Closed, Jimmy Hú)

People

Assignee: Jimmy Hú

Reporter: Elena Stepanova

Assigned for Implementation: Jimmy Hú

Assigned for Review: Kristian Nielsen

Assigned for Testing: Susil Behera

Votes: 3

Watchers: 12

Dates

Created: 2013-06-23 16:15

Updated: Yesterday 06:22
