MariaDB Server / MDEV-26869

MariaDB goes non-primary after one node leaves the cluster during a host shutdown


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2.32, 10.4(EOL)
    • Fix Version/s: 10.3(EOL)
    • Component/s: Galera
    • Labels: None

    Description

      We are facing this issue in a MariaDB Galera cluster deployed in Kubernetes (k8s). If the host running one of the Galera nodes suffers a power failure, the other two Galera nodes go non-primary. We have verified that the network between the two remaining nodes was stable on port 4567. After enabling debug logging, the only difference we found between the runs where the cluster survives and the runs where it fails is that, in the failing case, the INSTALL message is never exchanged between the surviving nodes.
      This is reproducible on both 10.2 and 10.4, with both Galera 3 and Galera 4.
      In the good scenario, where the cluster does not go down, this message is exchanged:

      2021-10-15T19:29:22.390596315Z stderr F 2021-10-15 19:29:22,390 - OpenStack-Helm Mariadb - INFO - b'2021-10-15 19:29:22 140341999027968 [Note] [Debug] WSREP: gcomm/src/pc_proto.cpp:handle_install():1103: cd32f6ad handle install from a4cb7bc7 pcmsg{ type=INSTALL, seq=0, flags= 0, node_map {\ta4cb7bc7,prim=1,un=0,last_seq=58,last_prim=view_id(PRIM,a4cb7bc7,19),to_seq=245997,weight=1,segment=0'

      In the bad scenario, the log instead shows the install timer expiring:

      2021-10-15T19:29:54.4610095Z stderr F 2021-10-15 19:29:54,460 - OpenStack-Helm Mariadb - INFO - b'2021-10-15 19:29:54 140712551864064 [Warning] WSREP: gcomm/src/evs_proto.cpp:handle_install_timer():690: evs::proto(be28c9b9, GATHER, view_id(REG,17423d3f,11)) install timer expired'
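The good/bad comparison above can be scripted. Below is a minimal sketch, assuming each node's log is available as plain text lines. It searches for the two markers quoted in the excerpts: `handle install from`, emitted from `handle_install()`, and `install timer expired`, emitted from `handle_install_timer()`. The helper name `classify_membership_log` and its labels are hypothetical and are not part of any MariaDB or Galera tooling.

```python
import re
from typing import Iterable

# Marker patterns copied from the two log excerpts in this report (assumed stable).
INSTALL_HANDLED = re.compile(r"handle_install\(\).*handle install from")
INSTALL_TIMEOUT = re.compile(r"handle_install_timer\(\).*install timer expired")


def classify_membership_log(lines: Iterable[str]) -> str:
    """Hypothetical helper: label a log as 'good', 'bad', 'mixed' or 'unknown'."""
    handled = timed_out = False
    for line in lines:
        if INSTALL_HANDLED.search(line):
            handled = True
        if INSTALL_TIMEOUT.search(line):
            timed_out = True
    if handled and timed_out:
        return "mixed"    # both markers seen in the same window
    if handled:
        return "good"     # an INSTALL message was received and processed
    if timed_out:
        return "bad"      # the EVS install timer expired
    return "unknown"      # neither marker present
```

One way to use it is to run it over each surviving node's log for the same time window, so that runs in the bad state can be separated from good ones without reading the debug output by hand.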

      If two of the three nodes survive, they still hold a majority, so the cluster should stay primary. This issue is delaying our production-readiness testing and defeats the purpose of clustering in the first place.

      I have attached the debug logs and a pcap showing that this was not a network issue.


People

    • Assignee: Seppo Jaakola
    • Reporter: Jasvinder singh kwatra
    • Votes: 2
    • Watchers: 6
