Details
- Task
- Status: Stalled
- Major
- Resolution: Unresolved
- 23.02.14, 23.10.4
- None
- 2025-10
Description
Consider a 3-node cluster. A non-primary node lost its network connection and triggered failover. In this situation the agent decided to shut down the node because it assumed there was a network outage. Later, when the network outage was resolved, the primary communicated with the shut-down node first and initiated a cluster rebuild, adding the failed node back. At the same time, the CMAPI Failover Agent did the same, and after the cluster had been re-formed, the subsequent requests sent by the primary's Agent triggered another sequence of restarts, adding one node at a time to re-form the cluster.
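The sequence above can be illustrated with a minimal sketch. It is a toy model only: `ToyCluster`, the node names, the sender labels, and the rule that every add-node request causes a restart are assumptions made for illustration, not CMAPI's data model or API. It counts how many restarts occur when only the primary's rebuild runs versus when the Failover Agent's requests arrive afterwards as well.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical toy model; not the CMAPI data model or API."""
    members: set = field(default_factory=set)
    restarts: int = 0
    redundant_restarts: int = 0

    def add_node(self, node: str) -> None:
        # Assumption: every add-node request restarts the cluster,
        # even if the node is already a member.
        self.restarts += 1
        if node in self.members:
            self.redundant_restarts += 1
        else:
            self.members.add(node)


def replay(requests):
    """Apply (sender, node) add-node requests in arrival order."""
    # Start state: node2 was shut down; node1 (primary) and node3 remain.
    cluster = ToyCluster(members={"node1", "node3"})
    for _sender, node in requests:
        cluster.add_node(node)
    return cluster


# The primary re-adds the failed node once the network is back.
primary_rebuild = [("primary", "node2")]
# The Failover Agent independently re-forms the cluster one node at a time.
agent_rebuild = [("failover-agent", n) for n in ("node1", "node2", "node3")]

only_primary = replay(primary_rebuild)
both = replay(primary_rebuild + agent_rebuild)
print(only_primary.restarts, only_primary.redundant_restarts)
print(both.restarts, both.redundant_restarts)
```

In this model the agent's three requests change nothing about cluster membership, yet each one still costs a restart, which mirrors the extra restart sequence reported here.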
Issue Links
- relates to
  - MCOL-5508 make timeout configurable " A node is unresponsive for cmd = 4, no reconfigure in at least 300 seconds. Setting read-only mode." (Closed)
  - MCOL-6092 If CMAPI is manually stopped optionaly take the node with stopped CMAPI from failover (Open)
  - MCOL-6094 controlernode left shmem locks behind forcing save_brm to fail b/c of the locks (Closed)