[MDEV-25855] Added support for Galera replication with cluster auto bootstrapping Created: 2021-06-04  Updated: 2023-09-19

Status: Stalled
Project: MariaDB Server
Component/s: Docker, Galera
Fix Version/s: 10.2

Type: Task Priority: Major
Reporter: Daniel Black Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: containers, contribution


 Description   

https://github.com/MariaDB/mariadb-docker/pull/377 is a community contribution to start a Galera cluster automatically. It contains the logic to pick the first address in wsrep-cluster-address as the first node for bootstrap.
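
To make the question concrete, here is a minimal sketch of the "first address bootstraps" rule. It is not the code from the pull request (which is a shell entrypoint); the function names and the plain hostname comparison are assumptions for illustration, and IPv6 literals are not handled.

```python
from typing import Optional


def first_cluster_node(cluster_address: str) -> Optional[str]:
    """Return the host of the first node listed in a gcomm:// address.

    Returns None for an empty member list ("gcomm://").
    """
    prefix = "gcomm://"
    if not cluster_address.startswith(prefix):
        raise ValueError("expected an address starting with gcomm://")
    members = cluster_address[len(prefix):]
    # Provider options may follow the member list after a '?'.
    members = members.split("?", 1)[0]
    first = members.split(",", 1)[0].strip()
    if not first:
        return None
    # Drop an optional ":port" suffix (assumption: no IPv6 literals).
    return first.rsplit(":", 1)[0] if ":" in first else first


def should_bootstrap(cluster_address: str, hostname: str) -> bool:
    """True when this host is the first node listed in the address."""
    first = first_cluster_node(cluster_address)
    return first is not None and first == hostname
```

With this rule, should_bootstrap("gcomm://node1,node2,node3", "node1") is true and every other node joins instead. Whether a bare string comparison against the container hostname is sufficient is exactly the open hostname question below.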

Is the logic around this and the gvwstate.dat state correct?

Are all of the constructs around hostname correct?

Are there any obvious configuration settings that would impact the correct bootstrapping of the cluster using this?

Recovery from an entire cluster down scenario will probably come as a separate change.



 Comments   
Comment by Daniel Black [ 2021-07-07 ]

Thanks for looking jplindst

Comment by Daniel Black [ 2022-02-15 ]

from https://github.com/MariaDB/mariadb-docker/pull/377:

I've rebased and squashed the commits. ShellCheck changed a few things. As a basic bootstrap it's OK. I'm still looking at what crash recovery would look like. We probably need to make our own state transition diagram.

Taking the recovery modes from https://galeracluster.com/library/documentation/crash-recovery.html, assuming a 3-node cluster:

Node 1 Is Gracefully Stopped

Indicated by:

  • populated datadir
  • grastate.dat containing uuid and sequence number

Action:

  • start mariadbd as normal
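
As a rough illustration of the "Indicated by" check above, the sketch below parses the text of a grastate.dat file and decides whether it records a graceful stop. The function names are hypothetical, and treating an all-zero uuid as "not graceful" is an assumption rather than something stated in the Galera documentation quoted here.

```python
from typing import Dict

NIL_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines of grastate.dat, skipping comments."""
    state: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def is_graceful_stop(state: Dict[str, str]) -> bool:
    """True when a uuid is recorded and seqno is non-negative."""
    try:
        seqno = int(state.get("seqno", "-1"))
    except ValueError:
        return False
    uuid = state.get("uuid", NIL_UUID)
    return uuid != NIL_UUID and seqno >= 0
```

A file with seqno: -1 fails this check, which matches the "broken grastate" indicator used in the crash scenarios further down.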

Two Nodes Are Gracefully Stopped

Indicated by (as per one node)

Action:

  • start mariadbd as normal

As per the Galera documentation above, this won't be an ideal start because the donor determination isn't optimal.

All Three Nodes Are Gracefully Stopped

Indicated by (as per one node)

Action:

  • start mariadbd as normal
  • background recovery script to determine the most advanced sequence number, then "SET GLOBAL wsrep_provider_options='pc.bootstrap=true'" on that node.

The Galera docs indicate that using --wsrep-new-cluster is one way, but the two-node crash scenario uses pc.bootstrap=true, so I'm assuming it's equivalent.
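
The "most advanced sequence number" step could be sketched as below. How the sequence numbers are collected from the other nodes is left open; the mapping of node name to seqno, the function name, and the tie-break on node name are all assumptions for illustration.

```python
from typing import Dict


def most_advanced_node(seqnos: Dict[str, int]) -> str:
    """Return the node with the highest seqno.

    Ties are broken by node name so that every node running this
    function independently picks the same bootstrap candidate.
    """
    if not seqnos:
        raise ValueError("no sequence numbers collected")
    # max() returns the first maximal element, so iterating the names
    # in sorted order makes the lowest name win a tie.
    return max(sorted(seqnos), key=lambda node: seqnos[node])
```

For example, given {"node1": 100, "node2": 105, "node3": 103} the candidate is node2. The sketch assumes all nodes report the same cluster uuid; a real script would need to check that first.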

One Node Disappears from the Cluster

Indicated by:

  • Broken grastate file (seqno: -1)

Action:

  • start mariadbd as normal

Two Nodes Disappear from the Cluster

Indicated by:

  • Broken grastate

Action:

  • Crashed nodes need to start
  • The most advanced sequence number needs to be identified.
  • Unlike a clean shutdown, grastate cannot be examined; the sequence number from the running instance is needed.

All Nodes Go Down Without a Proper Shutdown Procedure

  • Use wsrep-recover=1 to determine sequence number.
  • By quorum, determine most advanced node.
  • safe_to_bootstrap: 1 in the grastate file is recommended; I don't know why not "SET GLOBAL wsrep_provider_options='pc.bootstrap=true'"
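
A sketch of the first and third steps above, operating on text only. It assumes the recovery run logs a line of the form "WSREP: Recovered position: <uuid>:<seqno>" and that grastate.dat carries a "safe_to_bootstrap:" line; both function names are hypothetical, and reading or writing the actual files is left to the caller.

```python
import re
from typing import Optional, Tuple

# Assumed log format: "... WSREP: Recovered position: <uuid>:<seqno>"
_RECOVERED = re.compile(
    r"WSREP: Recovered position:\s*([0-9a-fA-F-]{36}):(-?\d+)"
)


def recovered_position(log_text: str) -> Optional[Tuple[str, int]]:
    """Return (uuid, seqno) from the last recovered-position log line."""
    matches = _RECOVERED.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


def mark_safe_to_bootstrap(grastate_text: str) -> str:
    """Return grastate.dat content with safe_to_bootstrap set to 1."""
    new_text, count = re.subn(
        r"(?m)^safe_to_bootstrap:[ \t]*\d+[ \t]*$",
        "safe_to_bootstrap: 1",
        grastate_text,
    )
    if count == 0:
        # No existing line: append one (assumption about older files).
        new_text = grastate_text.rstrip("\n") + "\nsafe_to_bootstrap: 1\n"
    return new_text
```

Only the node chosen in the second step should have its grastate content rewritten this way; marking more than one node risks bootstrapping two separate clusters.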

gvwstate.dat appears to facilitate auto-recovery

The Cluster Loses its Primary State Due to Split Brain

Indicated by:

  • cluster has exactly 50% of its membership
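
The 50% indicator can be expressed as a simple majority check. This is a simplification that assumes every node has equal weight; Galera's real quorum calculation is weighted (pc.weight), so treat the function names and the arithmetic as an illustrative sketch only.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """True when strictly more than half of the members are reachable."""
    if total <= 0 or not 0 <= reachable <= total:
        raise ValueError("reachable must be between 0 and total, total > 0")
    return 2 * reachable > total


def is_even_split(reachable: int, total: int) -> bool:
    """True when exactly half of the members are reachable."""
    if total <= 0 or not 0 <= reachable <= total:
        raise ValueError("reachable must be between 0 and total, total > 0")
    return 2 * reachable == total
```

In a 4-node cluster split 2/2, is_even_split(2, 4) is true on both sides and neither side has quorum, which is why the choice of bootstrap node has to be made by the user.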

Action:

  • SET GLOBAL wsrep_provider_options='pc.bootstrap=true'; on user defined node

Comment by Daniel Black [ 2023-09-19 ]

I think the MariaDB operator is progressing well with this:
https://github.com/mariadb-operator/mariadb-operator

Generated at Thu Feb 08 09:40:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.