[MDEV-28019] Do not clear queued binlogs on slave process restart if SQL_SLAVE_SKIP_COUNTER is set Created: 2022-03-07  Updated: 2022-03-07

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task
Priority: Minor
Reporter: Assen Totin (Inactive)
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: None


 Description   

This is more of a feature request than a bug, but it would speed up the recovery of a replica that has stalled when the original binlog is no longer available on the master.

Scenario:

  • Primary + replica, in sync.
  • The SQL thread on the replica stops due to some error (e.g., a duplicate GTID with strict mode).
  • The I/O thread keeps running normally, spooling binlogs locally.
  • The DBA deems the error unimportant and decides to skip it and continue replication.
  • The DBA sets SQL_SLAVE_SKIP_COUNTER and restarts the slave threads with the SQL statements "stop slave; start slave".
  • Starting the slave purges all spooled binlogs.
  • The replica attempts to pull all binlogs from the master again, including those it had already spooled before the purge.
  • If the original binlog is no longer available on the master, replication cannot resume and a complete rebuild of the replica is required.
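
For reference, the DBA's steps in the scenario above might look roughly like the following. This is only a sketch: the counter value of 1 is an assumption (the report does not say how many events are skipped), and the final status check is added for illustration.

```sql
-- Assumption: skip a single event; the report does not give the value used.
SET GLOBAL sql_slave_skip_counter = 1;

-- Restart the slave threads, as described in the scenario.
STOP SLAVE;
START SLAVE;

-- Illustrative check (not part of the original report): inspect
-- Slave_IO_Running, Slave_SQL_Running and Last_IO_Error to see
-- whether the replica could fetch the binlogs again.
SHOW SLAVE STATUS;
```

According to the report, it is the START SLAVE step that discards the already spooled binlogs.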

Suggested change: do not purge the existing spooled binlogs on slave start if SQL_SLAVE_SKIP_COUNTER is already set.

