  1. MariaDB Server
  2. MDEV-21002

Galera Cluster Node During IST Goes from "Synced" to "Joining: receiving State transfer" (stuck, requires kill -9)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.11, 10.4.8, 10.4.9, 10.4.10
    • Fix Version/s: 10.4.13
    • Component/s: Galera
    • Labels:
      None
    • Environment:
      RHEL 7 on x64 VM. MariaDB from MariaDB repo via Artifactory (not RHEL repo). No docker. 5-node cluster.

      Description

      Summary: Galera appears to have difficulty switching wsrep_local_state_comment from "Joining: receiving State transfer" back to "Synced" after a network slowdown and the subsequent IST. mysqld then becomes unstable and cannot be stopped gracefully.

      Workaround that works most of the time: kill -9 the mysqld process, delete the entire datastore on the affected cluster node, and rejoin the cluster.

      Background: We built a new cluster from scratch using a fresh install of 10.4.8 (and 10.4.9), then imported data and grants from SQL (no data files were carried over). During brief network outages, a random node switches its wsrep_local_state_comment from "Synced" to "Joining: receiving State transfer" and stays there. There are no errors in the logs of either the donor or the affected node, and the sync appears to have completed successfully (no SST rsync processes or any other sign that a transfer is still in progress). Debug logging was unhelpful. MariaDB on the affected node cannot be stopped gracefully; it requires kill -9.
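To make the stuck state described above easier to spot, here is a minimal Python sketch that polls wsrep_local_state_comment and flags a node that has reported "Joining: receiving State transfer" for too long. It is not part of the original report. The helper names, the 300-second threshold, and the 10-second poll interval are hypothetical, and the sketch assumes the mysql client is on PATH and can log in with its default credentials.

```python
import subprocess
import time

STUCK_STATE = "Joining: receiving State transfer"


def parse_state(mysql_output):
    """Extract the value from tab-separated 'name<TAB>value' client output."""
    for line in mysql_output.splitlines():
        name, _, value = line.partition("\t")
        if name.strip() == "wsrep_local_state_comment":
            return value.strip()
    raise ValueError("wsrep_local_state_comment not found in output")


def is_stuck(samples, threshold_seconds):
    """Decide whether the node looks stuck.

    samples is a chronological list of (unix_time, state) tuples. The node
    counts as stuck when the trailing run of samples is all STUCK_STATE and
    spans at least threshold_seconds (a hypothetical cutoff, not a value
    taken from the report).
    """
    if not samples or samples[-1][1] != STUCK_STATE:
        return False
    run_start = samples[-1][0]
    for timestamp, state in reversed(samples):
        if state != STUCK_STATE:
            break
        run_start = timestamp
    return samples[-1][0] - run_start >= threshold_seconds


def read_state():
    """Query the local node through the mysql command-line client."""
    result = subprocess.run(
        ["mysql", "-N", "-B", "-e",
         "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'"],
        check=True, capture_output=True, text=True,
    )
    return parse_state(result.stdout)


def watch(threshold_seconds=300, poll_seconds=10):
    """Poll until the node looks stuck, then return the last sample."""
    samples = []
    while True:
        samples.append((time.time(), read_state()))
        if is_stuck(samples, threshold_seconds):
            return samples[-1]
        time.sleep(poll_seconds)
```

Calling watch() on a node blocks until the joining state has persisted past the threshold. A node that returns to "Synced" after a normal IST resets the run, so only a node that stays in the joining state is reported.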

      Troubleshooting: Running MariaDB 10.4.8 (older) with Galera 26.4.3 (newer) seemed to reduce how often the problem occurs, but it still happens. SST does not appear to be affected.

      Problem trigger: The problem is triggered by temporary network loss and is generally reproducible by using iptables to block cluster replication traffic for a short period and then letting the node re-sync via IST.
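The trigger described above could be scripted roughly as follows. This is a hedged sketch, not the reporter's actual procedure: the report does not say which iptables rules or ports were used. The code assumes Galera's default group communication port, TCP 4567, and drops traffic to it in both the INPUT and OUTPUT chains for a fixed interval. The function names and the dry_run parameter are hypothetical, and real use requires root.

```python
import subprocess
import time

# Assumption: the cluster uses Galera's default replication port.
GALERA_REPLICATION_PORT = 4567


def rule_specs(port):
    """Rule bodies that drop TCP traffic addressed to the replication port."""
    return [
        ["INPUT", "-p", "tcp", "--dport", str(port), "-j", "DROP"],
        ["OUTPUT", "-p", "tcp", "--dport", str(port), "-j", "DROP"],
    ]


def iptables_commands(action, port=GALERA_REPLICATION_PORT):
    """Build iptables argv lists; action is '-I' (insert) or '-D' (delete)."""
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    return [["iptables", action] + spec for spec in rule_specs(port)]


def simulate_outage(seconds, port=GALERA_REPLICATION_PORT, dry_run=True):
    """Block replication traffic for `seconds`, then remove the rules.

    With dry_run=True nothing is executed and no sleep happens; the
    commands that would run are returned so they can be reviewed first.
    """
    block = iptables_commands("-I", port)
    unblock = iptables_commands("-D", port)
    if dry_run:
        return block + unblock
    try:
        for cmd in block:
            subprocess.run(cmd, check=True)
        time.sleep(seconds)
    finally:
        # Always try to remove the rules, even if blocking failed midway.
        for cmd in unblock:
            subprocess.run(cmd, check=False)
    return block + unblock
```

For example, simulate_outage(30) returns the four commands without touching the firewall, while simulate_outage(30, dry_run=False) applies them on one node. Once the rules are removed, the node should rejoin and request an IST, at which point wsrep_local_state_comment can be checked for the stuck state.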

        Attachments

        1. mariadb.txt
          73 kB
        2. my.cnf
          0.4 kB

              People

              Assignee:
              jplindst Jan Lindström
              Reporter:
              jyusb Justin Y
              Votes:
              6
              Watchers:
              16
