[MDEV-34694] Replication retry on checksum errors - Jira

Details

Type: New Feature
Status: Open
Priority: Minor
Resolution: Unresolved
Fix Version/s: None
Component/s: Replication
Labels:
- replication

Description

When a replication checksum error (or another configurable replication error) occurs, an option should be available to have replication retry the connection rather than stop, which is the current behaviour.

Currently, when a replication checksum error occurs ("[ERROR] Slave I/O: Replication event checksum verification failed while reading from network, Internal MariaDB error code: 1743"), the manual solution recommended by MariaDB Support is as follows:

Solution Approach:
- First, check whether replication is broken with the following values in the output of "SHOW REPLICA STATUS\G":
  - Last_IO_Error: Relay log write failure: could not queue event from master
  - Slave_IO_Running: No
If that is the case, try the following steps:
1. Run "STOP SLAVE" followed by "START SLAVE" to restart the slave process.
2. Check whether the replication error is gone and replication has resumed successfully.
3. If replication is still broken with the same error after restarting the slave process, re-establish replication from a fresh backup of the master.
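
The detection check in the first bullet can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the caller has already fetched one row of the status output as a dict of column name to string value. The two column names and the error text are the ones quoted above.

```python
# Error text quoted in the issue description.
RELAY_WRITE_FAILURE = "Relay log write failure: could not queue event from master"


def io_thread_hit_queue_failure(status):
    """Return True if a status row matches the failure described above.

    `status` is assumed to be a dict built from one row of
    SHOW REPLICA STATUS output (column name -> string value).
    """
    io_running = status.get("Slave_IO_Running", "")
    last_error = status.get("Last_IO_Error", "")
    # Both conditions from the checklist must hold.
    return io_running == "No" and RELAY_WRITE_FAILURE in last_error
```

A monitoring script could call this on each poll and only move on to the restart steps when it returns True.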

The request is to automate this process once the error is detected and to retry the connection a configurable number of times, much as already happens when the master is unreachable (timeouts).
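
Until such an option exists in the server, the requested behaviour could be approximated externally. The following is a minimal sketch, not a proposed server implementation: `run_sql` and `fetch_status` are hypothetical callables supplied by the caller (for example, thin wrappers around a database client), and `max_retries` and `delay_seconds` stand in for the configurable settings the request asks for.

```python
import time


def restart_until_healthy(run_sql, fetch_status, max_retries=3,
                          delay_seconds=0.0, sleep=time.sleep):
    """Restart replication up to max_retries times.

    run_sql(statement) executes one SQL statement (hypothetical helper).
    fetch_status() returns a dict for one SHOW REPLICA STATUS row
    (hypothetical helper).
    Returns True once the I/O thread reports "Yes", False if every
    attempt fails, in which case a rebuild from backup is still needed.
    """
    for _ in range(max_retries):
        # Step 1 of the manual procedure: restart the slave process.
        run_sql("STOP SLAVE")
        run_sql("START SLAVE")
        # Give the I/O thread time to reconnect before checking.
        sleep(delay_seconds)
        # Step 2: check whether replication has resumed.
        if fetch_status().get("Slave_IO_Running") == "Yes":
            return True
    return False
```

In practice the I/O thread may report a transitional state shortly after the restart, so a non-zero `delay_seconds` is likely needed; the right value is an assumption to tune per environment.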

Attachments

Activity

There are no comments yet on this issue.

People

Assignee: Unassigned

Reporter: Rich Meese

Votes: 0

Watchers: 5

Dates

Created: 2024-08-02 10:40

Updated: 2024-08-02 17:05

MariaDB Server