MariaDB ColumnStore / MCOL-3312

Initiate PM failover on PM service down

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Won't Do
    • None
    • Icebox
    • N/A
    • None

    Description

      As of version 1.2.3, ColumnStore (CS) supports automatic PM failover of DBRoots, but it appears to trigger only when a PM becomes unreachable at the network level (for example, the OS is shut down or the node is firewalled off). It does not trigger when some or all of the CS processes on the PM stop running or stop responding, which leaves the cluster in a broken state.

      A simple way to verify: in a multi-PM cluster with failover enabled (external storage), run "systemctl stop columnstore" on one PM. This should trigger failover, since the PM is effectively gone even though the OS is up and responds to ICMP probes, but it does not.

      Put another way, keepalive checks (heartbeats) should be performed at the TCP level against one or more CS services, rather than at the network level as ICMP probes.
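
To illustrate the kind of check being requested, here is a minimal Python sketch of a TCP-level liveness probe. It is not ColumnStore code: the function names `tcp_probe` and `pm_is_alive`, the `require_all` switch, and the timeout value are all hypothetical, and the caller has to supply the actual host and service ports for their deployment. The point is only that a node can answer ICMP while refusing TCP connections on the service ports, and a probe like this would notice the difference.

```python
import socket
from typing import Iterable


def tcp_probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A host that is up but has no process listening on the port refuses
    the connection, so this returns False even though ICMP would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def pm_is_alive(
    host: str,
    ports: Iterable[int],
    timeout: float = 2.0,
    require_all: bool = True,
) -> bool:
    """Decide PM liveness from TCP probes against one or more service ports.

    Assumption (not from the issue): with require_all=True every probed
    service must accept connections; with require_all=False one is enough.
    """
    port_list = list(ports)
    if not port_list:
        raise ValueError("at least one port must be probed")
    results = [tcp_probe(host, port, timeout) for port in port_list]
    return all(results) if require_all else any(results)
```

A monitor would call `pm_is_alive` periodically with the PM's address and the ports of the services it cares about, and would likely require several consecutive failures before initiating failover so that a single missed probe does not move DBRoots.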

          People

            toddstoffel Todd Stoffel (Inactive)
            assen.totin Assen Totin (Inactive)