Details
Task
Status: Open
Minor
Resolution: Unresolved
Description
This is more of a feature request than a bug, but it would speed up the recovery of a replica that stalled when the original binlog is no longer available on the master.
Scenario:
- Primary + replica, in sync.
- The slave SQL thread on the replica stops due to some error (e.g., a duplicate GTID with strict mode enabled).
- The I/O thread keeps running normally, spooling binlogs locally (as relay logs).
- The error is deemed unimportant, and the DBA decides to skip over it and continue replication.
- The DBA sets SQL_SLAVE_SKIP_COUNTER and restarts the slave process with the SQL statements "stop slave; start slave".
- Starting the slave purges all spooled binlogs.
- The replica attempts to pull all binlogs anew, including those that had already been spooled before the purge.
- If the original binlog is no longer available on the master, replication cannot resume and a complete rebuild of the replica is required.
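For reference, the skip-and-restart step in the scenario above would look roughly like the following on the replica. This is a minimal sketch, not taken from the report: the counter value of 1 is an assumption, and the statements are ordered so that the slave is stopped before the counter is set, since the server requires that.

```sql
-- Inspect the error that stopped the SQL thread.
SHOW SLAVE STATUS;

-- Stop replication, skip the offending event group, and restart.
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;  -- assumed value; the report does not give a count
START SLAVE;                            -- per the report, this is where the spooled binlogs get purged

-- Check whether the I/O and SQL threads are both running again.
SHOW SLAVE STATUS;
```

If the master has already removed the binlog the replica needs, the final status check would be expected to show the I/O thread failing to fetch it, which is the situation described in the last step.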
Suggestion: do not purge the existing spooled binlogs on slave start if SQL_SLAVE_SKIP_COUNTER is already set.