  1. MariaDB Server
  2. MDEV-36140

MariaDB Galera node is not able to join the primary component if it loses connectivity with one of the nodes in the primary component

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 11.4.4
    • Fix Version/s: 11.4
    • Component/s: Galera
    • Labels: None

    Description

      We are running MariaDB Galera 11.4.4, but the same behavior is also seen in the 10.6.4 release.

      We have a 3-node cluster. We ran into a situation where connectivity was lost between only two of the three nodes. Call the nodes N1, N2 and N3: the N1 - N3 link went down, while N1 - N2 and N2 - N3 remained intact.

      While this was happening, N1 was restarted. It now fails to join the cluster and just keeps restarting.

      N1 - 11.127.4.37
      N2 - 11.127.5.37
      N3 - 11.127.6.37
      Here is the cluster address parameter on N1 (11.127.4.37):
      wsrep_cluster_address = gcomm://11.127.4.37,11.127.5.37,11.127.6.37

      N2 and N3 remain connected to each other, and the cluster membership shows a 2-node cluster. While restarting, N1 tries to connect to both N2 and N3. Because the N1 - N3 link is down, the connection to N3 fails; the connection to N2 succeeds.
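
      To confirm which links are up from a given node, a small reachability check can be run on that node. The following is only a minimal sketch: it assumes plain IPv4 addresses or host names in the address list, assumes port 4567 when none is given (the port that appears in the log entries in this report), and the helper names `parse_gcomm`, `reachable` and `report` are made up for illustration. A successful TCP connect only shows that the port is reachable, not that Galera accepts the peer.

```python
import socket

# Assumption: 4567 is used when an entry has no explicit port
# (it is the port that appears in the log entries in this report).
DEFAULT_PORT = 4567


def parse_gcomm(address):
    """Split 'gcomm://h1,h2:port,...' into a list of (host, port) pairs."""
    prefix = "gcomm://"
    if not address.startswith(prefix):
        raise ValueError("not a gcomm:// address: " + address)
    # Drop any '?option=value' suffix, then split the comma-separated peers.
    body = address[len(prefix):].split("?", 1)[0]
    peers = []
    for item in body.split(","):
        item = item.strip()
        if not item:
            continue
        host, sep, port = item.partition(":")
        peers.append((host, int(port) if sep else DEFAULT_PORT))
    return peers


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(address):
    """Print one line per peer in the address list with its reachability."""
    for host, port in parse_gcomm(address):
        state = "reachable" if reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port} {state}")
```

      Calling `report("gcomm://11.127.4.37,11.127.5.37,11.127.6.37")` on N1 prints one line per address, which makes it easy to compare what each node can reach.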

      N1 then fails with this error: failed to open gcomm backend connection: 110: failed to reach primary view

      Here are a few related log entries:
      *******************************************************************************************
      2025-02-20 11:48:26 0 [Note] WSREP: (d4c47019-bc8c, 'tcp://0.0.0.0:4567') connection established to bdf18832-92a5 tcp://11.127.5.37:4567
      2025-02-20 11:48:29 0 [Note] WSREP: (d4c47019-bc8c, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2025-02-20 11:48:30 0 [Note] WSREP: (d4c47019-bc8c, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://11.127.6.37:4567 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 165364883 cwnd: 1 last_queued_since: 165664882973365 last_delivered_since: 165664882973365 send_queue_length: 0 send_queue_bytes: 0
      2025-02-20 11:48:30 0 [Note] WSREP: Failed to establish connection: Operation aborted.
      2025-02-20 11:48:30 0 [Note] WSREP: view(view_id(NON_PRIM,d4c47019-bc8c,27) memb {
              d4c47019-bc8c,0
      } joined {
      } left {
      } partitioned {
              252ce51f-aee4,0
              bdf18832-92a5,0
      })
      2025-02-20 11:48:33 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view
      at /bitnami/blacksmith-sandox/libgalera-26.4.21/gcomm/src/pc.cpp:connect():160
      2025-02-20 11:48:33 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.21/gcs/src/gcs_core.cpp:gcs_core_open():256: Failed to open backend connection: -110 (Connection timed out)
      2025-02-20 11:48:33 0 [Note] WSREP: Failed to establish connection: Operation aborted.
      2025-02-20 11:48:54 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.21/gcs/src/gcs.cpp:gcs_open():1701: Failed to open channel 'nrdGalera' at 'gcomm://11.127.4.37,11.127.5.37,11.127.6.37': -110 (Connection timed out)
      2025-02-20 11:48:54 0 [ERROR] WSREP: gcs connect failed: Operation timed out
      2025-02-20 11:48:54 0 [ERROR] WSREP: wsrep::connect(gcomm://11.127.4.37,11.127.5.37,11.127.6.37) failed: 7
      2025-02-20 11:48:54 0 [ERROR] Aborting
      *******************************************************************************************
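
      The view entry in the log above is the key line: it is a NON_PRIM view whose only member is the local node (d4c47019-bc8c), with the other two node IDs listed as partitioned. For anyone scanning many such logs, the view can be pulled out with a short script. This is a rough sketch, not part of Galera; the regular expression is written only against the view format seen in this report, and `parse_view` is a hypothetical helper name.

```python
import re

# Pattern written against the view line in this report; whitespace and line
# breaks between the parts are tolerated. Other Galera versions may differ.
VIEW_RE = re.compile(
    r"view\(view_id\((?P<type>\w+),(?P<uuid>[0-9a-f-]+),(?P<seq>\d+)\)\s*"
    r"memb\s*\{(?P<memb>[^}]*)\}\s*"
    r"joined\s*\{(?P<joined>[^}]*)\}\s*"
    r"left\s*\{(?P<left>[^}]*)\}\s*"
    r"partitioned\s*\{(?P<part>[^}]*)\}\s*\)"
)


def parse_view(text):
    """Return the first view found in text as a dict, or None if absent."""
    m = VIEW_RE.search(text)
    if m is None:
        return None

    def ids(group):
        # Entries look like 'd4c47019-bc8c,0'; keep only the node ID part.
        return [tok.split(",")[0] for tok in m.group(group).split()]

    return {
        "type": m.group("type"),
        "seq": int(m.group("seq")),
        "members": ids("memb"),
        "joined": ids("joined"),
        "left": ids("left"),
        "partitioned": ids("part"),
    }


# The view from this report, flattened onto one line.
SAMPLE = (
    "view(view_id(NON_PRIM,d4c47019-bc8c,27) memb { d4c47019-bc8c,0 } "
    "joined { } left { } partitioned { 252ce51f-aee4,0 bdf18832-92a5,0 } )"
)

view = parse_view(SAMPLE)
print(view["type"], view["members"], view["partitioned"])
```

      Note that bdf18832-92a5 is the peer at 11.127.5.37 according to the "connection established" entry, so N2 ends up in the partitioned set even though the TCP connection to it succeeded.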

      Is this expected behavior? If so, please provide a link to the documentation for the Galera arbitration process. Thanks!


        Activity

          Sahai Har Gagan added a comment -

          Any update on this is appreciated. Thanks!


          People

            sysprg Julius Goryavsky
            Sahai Har Gagan
