MariaDB Server / MDEV-34694

Replication retry on checksum errors

Details

    Description

      In the event of a replication checksum error, or another configurable replication issue, an option should be made available to have replication retry the connection rather than stop (the current behaviour).

      Currently, in the event of a replication checksum error ("[ERROR] Slave I/O: Replication event checksum verification failed while reading from network, Internal MariaDB error code: 1743"), the manual solution (as recommended by MariaDB Support) is as follows:

      Solution approach:
      • First, check whether replication is broken with the following error in "SHOW REPLICA STATUS\G":
        - Last_IO_Error: Relay log write failure: could not queue event from master
        - Slave_IO_Running: No
      • If that is the case, try the following steps:
        1. Run "STOP SLAVE" and "START SLAVE" to restart the slave process.
        2. Check whether the replication error is gone and replication has resumed successfully.
        3. If replication is still broken with the same error after restarting the slave process, re-establish replication from a fresh master backup.

      The request here is to automate that process once the error is detected, retrying the connection a configurable number of times, much like what currently happens when a server is unreachable (timeouts).
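
      Until such an option exists, the manual steps can be approximated from outside the server. The following is only a sketch of that idea, not a proposal for the server-side implementation. The names `io_thread_broken`, `retry_replication`, `get_status` and `run_sql`, and the default retry count and delay, are assumptions made for illustration. `get_status` is expected to return the "SHOW REPLICA STATUS" row as a mapping of column name to string, and `run_sql` to execute one statement on the replica.

```python
import time
from typing import Callable, Mapping

# Substring of the Last_IO_Error text quoted in the description above.
RELAY_WRITE_FAILURE = "Relay log write failure"


def io_thread_broken(status: Mapping[str, str]) -> bool:
    """Return True when the status row matches the failure described above."""
    return (
        status.get("Slave_IO_Running") == "No"
        and RELAY_WRITE_FAILURE in status.get("Last_IO_Error", "")
    )


def retry_replication(
    get_status: Callable[[], Mapping[str, str]],
    run_sql: Callable[[str], None],
    max_retries: int = 3,          # hypothetical default
    delay_seconds: float = 5.0,    # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart the replica up to max_retries times.

    Returns True once the failure is no longer reported. Returns False when
    the retries are exhausted, in which case step 3 (rebuilding from a fresh
    master backup) still has to be done by hand.
    """
    for _ in range(max_retries):
        if not io_thread_broken(get_status()):
            return True
        # Steps 1 and 2: restart the slave process, then re-check.
        run_sql("STOP SLAVE")
        run_sql("START SLAVE")
        sleep(delay_seconds)
    return not io_thread_broken(get_status())
```

      Passing the two callables in keeps the sketch independent of any particular client library, and lets the retry logic be exercised without a live replica.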

          People

            Assignee: Unassigned
            Reporter: Rich Meese (richmeese)
            Votes: 0
            Watchers: 5