Details
-
Epic
-
Status: Open
-
Major
-
Resolution: Unresolved
-
High Availability
Description
This epic focuses on enhancing the high availability (HA) and fault tolerance of MariaDB ColumnStore deployments to ensure greater resilience in production environments. The goal is to reduce single points of failure, improve failover behavior, and support more seamless recovery across multi-node clusters and containerized environments.
We aim to make HA easier to set up, operate, and trust, prioritizing solutions that work with minimal manual intervention and integrate cleanly with MariaDB Server, CMAPI, and industry-standard orchestration tools.
Key objectives include:
- Improve fault detection and recovery across nodes
- Enhance CMAPI awareness of node health and quorum state
- Support automated rebalancing and safe rejoining of failed nodes
- Implement new features such as a configurable replication factor or data mirroring for higher uptime requirements
- Implement rolling cluster upgrades with zero downtime
- Align behavior with standard clustering tools (e.g., Kubernetes, systemd, Pacemaker)
- Simplify HA setup and observability for operators
These improvements will allow ColumnStore to be deployed reliably in always-on, mission-critical environments while minimizing downtime.
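To make the quorum and rolling-upgrade objectives concrete, here is a minimal sketch of a strict-majority quorum check. It is only an illustration under stated assumptions: `NodeStatus`, `has_quorum`, and `can_take_offline` are hypothetical names, not part of CMAPI or ColumnStore, and the majority rule shown is one common convention rather than the behavior this epic commits to.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical health record for one cluster node (not a CMAPI type)."""
    name: str
    healthy: bool


def has_quorum(nodes: Sequence[NodeStatus]) -> bool:
    """Return True when a strict majority of configured nodes is healthy."""
    if not nodes:
        return False
    healthy = sum(1 for node in nodes if node.healthy)
    return healthy > len(nodes) // 2


def can_take_offline(nodes: Sequence[NodeStatus], name: str) -> bool:
    """Return True if the named node can be stopped (for example during a
    rolling upgrade) while the remaining healthy nodes still form a majority
    of the configured cluster size."""
    remaining_healthy = sum(
        1 for node in nodes if node.healthy and node.name != name
    )
    return remaining_healthy > len(nodes) // 2


if __name__ == "__main__":
    cluster = [
        NodeStatus("node1", True),
        NodeStatus("node2", True),
        NodeStatus("node3", False),
    ]
    print("quorum:", has_quorum(cluster))
    print("safe to stop node1:", can_take_offline(cluster, "node1"))
```

In this sketch, a three-node cluster with one failed node still has quorum, but stopping a second node would drop it below a majority, so an upgrade step that targets a healthy node would be deferred until the failed node rejoins.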