[MCOL-3312] Initiate PM failover on PM service down Created: 2019-05-15  Updated: 2023-07-02  Resolved: 2023-07-02

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: None
Fix Version/s: Icebox

Type: New Feature Priority: Major
Reporter: Assen Totin (Inactive) Assignee: Todd Stoffel (Inactive)
Resolution: Won't Do Votes: 0
Labels: None

Epic Link: ColumnStore Failover Improvements

 Description   

As of version 1.2.3, ColumnStore (CS) has automatic PM failover of DBRoots, but it appears to work only if a PM dies completely at the network level (the OS is shut down or the node is firewalled off). It does not work if some or all of the CS processes on the PM stop working or stop responding, in which case the cluster is left broken.

A simple way to check: in a multi-PM cluster with failover enabled (external storage), run "systemctl stop columnstore" on one PM. This should trigger failover, since the PM is effectively gone even though the OS is up and responding to ICMP probes, but it does not.

Another way to define the change: keepalive checks (heartbeats) should be made at the TCP level against one or more CS services, rather than at the network level as ICMP probes.
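
To illustrate the kind of check being requested, here is a minimal Python sketch of a TCP-level keepalive. It is not how ColumnStore implements its heartbeat. The function names, the port list, the timeout, and the "three consecutive failures" threshold are all assumptions made for the example, and the actual CS service ports would have to be taken from the cluster configuration.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def pm_is_alive(host, ports, timeout=2.0):
    """Treat the PM as alive only if every monitored service port accepts a connection.

    `ports` is a hypothetical list of CS service ports, not taken from the ticket.
    """
    return all(tcp_probe(host, port, timeout) for port in ports)


def should_fail_over(results, threshold=3):
    """Decide on failover from a history of probe results (newest last).

    Returns True only after `threshold` consecutive failed probes, so that a
    single dropped connection does not trigger a failover on its own.
    """
    recent = results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

With a check of this kind, a PM whose OS still answers ICMP but whose CS services have been stopped would fail the probe, because nothing is listening on the service ports. Requiring every port to respond is one possible policy; probing a single representative service would be a looser alternative.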



 Comments   
Comment by Todd Stoffel (Inactive) [ 2023-07-02 ]

The "create date" on this ticket predates the convergence with MariaDB Server. If the issue still exists in a modern version of the engine/plugin, please submit a new ticket.
