Details
-
New Feature
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
Description
In the event of a replication checksum error, or another configurable replication issue, an option should be made available to have replication retry the connection rather than stop (the current behaviour).
Currently, in the event of a replication checksum error such as "[ERROR] Slave I/O: Replication event checksum verification failed while reading from network, Internal MariaDB error code: 1743", the manual solution (as recommended by MariaDB Support) is as follows:
- Solution Approach:
- - First, check whether replication is broken with the following error in the output of "SHOW REPLICA STATUS\G":
- - Last_IO_Error: Relay log write failure: could not queue event from master
- - Slave_IO_Running: No
- If that is the case, try the following steps:
- 1. Run "STOP SLAVE" followed by "START SLAVE" to restart the slave process.
- 2. Check whether the replication error is gone and replication has resumed successfully.
- 3. If replication is still broken with the same error after restarting the slave process, re-establish replication from a fresh master backup.
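The detection check in the first step could be sketched as below. This is only an illustration: it assumes the replica status row has already been fetched into a dictionary keyed by column name, and the function name `replication_needs_restart` is hypothetical. The two field names and the error text are the ones quoted above.

```python
# Error text quoted in this ticket for the Last_IO_Error field.
CHECKSUM_IO_ERROR = "Relay log write failure: could not queue event from master"


def replication_needs_restart(status: dict) -> bool:
    """Return True when the status row matches the failure described above.

    `status` is assumed to be one row of replica status output, already
    converted to a dict keyed by column name (hypothetical input shape).
    """
    io_stopped = status.get("Slave_IO_Running") == "No"
    error_matches = CHECKSUM_IO_ERROR in (status.get("Last_IO_Error") or "")
    return io_stopped and error_matches
```

A row with `Slave_IO_Running: Yes`, or with a different `Last_IO_Error`, is left alone, so the restart steps are only attempted for this specific failure.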
The request is to automate this process once the error is detected, retrying the connection a configurable number of times, much as currently happens when a server is unreachable (timeouts).
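To illustrate the requested behaviour, here is a rough sketch of a bounded retry loop. It is not a proposal for the server implementation. The names `retry_replication`, `is_broken`, `restart_replica`, `max_retries` and `delay_seconds` are all hypothetical, and the two callables stand in for whatever performs the status check and the "STOP SLAVE" / "START SLAVE" restart.

```python
import time
from typing import Callable


def retry_replication(
    is_broken: Callable[[], bool],
    restart_replica: Callable[[], None],
    max_retries: int = 3,
    delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart replication up to `max_retries` times.

    Returns True once replication is healthy, or False when the retries
    are exhausted and a rebuild from a fresh backup is still needed.
    """
    if not is_broken():
        return True  # nothing to do
    for _ in range(max_retries):
        restart_replica()     # hypothetical hook: STOP SLAVE, then START SLAVE
        sleep(delay_seconds)  # give the I/O thread time to reconnect
        if not is_broken():
            return True
    return False
```

A `False` return corresponds to the last manual step: the error persists after restarting, so replication has to be re-established from a fresh master backup.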