[MDEV-29950] one galera node got hardware issue but caused other 2 nodes split brain Created: 2022-11-05  Updated: 2022-11-05

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.6.7
Fix Version/s: None

Type: Bug Priority: Major
Reporter: William Wong Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Red Hat x86-64 on VMware


Attachments: File galera-garbd-wfsesub_esptdb1.cnf     Text File galera-garbd-wfsesub_esptdb1.log     Text File p1vdbs-wfsesub-esptdb1-1-01.log     File p1vdbs-wfsesub-esptdb1-1-01.mariadb.cnf     Text File p2vdbs-wfsesub-esptdb1-2-01.log     File p2vdbs-wfsesub-esptdb1-2-01.mariadb.cnf    

 Description   

Our Galera cluster has a 3-node configuration (2 DB nodes + 1 arbitrator). Two days ago, one DB node went down due to a hardware issue. The remaining DB node and the arbitrator then ended up in a split-brain state, and the DB service went down.
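For context, the expectation that the two surviving members should have stayed in service follows from a simple majority rule. The sketch below assumes all three members (including the arbitrator) carry an equal weight of 1 and that none left gracefully. The function name `has_quorum` is hypothetical and is not part of Galera; it only illustrates the arithmetic, not Galera's actual implementation.

```python
def has_quorum(reachable_weight: int, previous_weight: int) -> bool:
    """Strict-majority check: a partition keeps quorum only if it holds
    more than half of the previous membership's total weight."""
    return 2 * reachable_weight > previous_weight


# Assumption: two DB nodes plus the arbitrator, each with weight 1.
previous = 3

# Surviving DB node and arbitrator can see each other: 2 of 3.
print(has_quorum(2, previous))  # True

# Each survivor sees only itself: 1 of 3.
print(has_quorum(1, previous))  # False
```

Under these assumptions, losing a single DB node should leave a 2-of-3 majority as long as the surviving DB node and the arbitrator can still reach each other. A loss of quorum would suggest that, at the moment membership was re-evaluated, each survivor saw only itself.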

According to the logs, the remaining nodes logged no messages about each other until the dead node was confirmed down, a window of about 10 seconds. We don't know why the healthy nodes did not declare each other stable during those 10 seconds.

Please advise on the direction we should take to troubleshoot this problem.

Only two Galera timeouts are set explicitly; all other timeout settings remain at their default values.

gmcast.peer_timeout=PT10S;
evs.suspect_timeout=PT12S;
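The two values above are ISO 8601 durations. As a minimal sketch for comparing them against timestamps in the attached logs, the following converts such an option string into seconds. The helper names `parse_provider_options` and `duration_seconds` are hypothetical, and the parser only handles the simple `PT<n>S` form used here.

```python
import re


def parse_provider_options(options: str) -> dict:
    """Split 'key=value;key=value;' into a dict, ignoring empty segments."""
    result = {}
    for item in options.split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition("=")
            result[key.strip()] = value.strip()
    return result


def duration_seconds(value: str) -> float:
    """Convert a simple ISO 8601 duration of the form PT<n>S to seconds."""
    match = re.fullmatch(r"PT(\d+(?:\.\d+)?)S", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1))


opts = parse_provider_options(
    "gmcast.peer_timeout=PT10S; evs.suspect_timeout=PT12S;"
)
timeouts = {name: duration_seconds(value) for name, value in opts.items()}
print(timeouts)  # {'gmcast.peer_timeout': 10.0, 'evs.suspect_timeout': 12.0}
```

The roughly 10-second window seen in the logs is numerically equal to the `gmcast.peer_timeout` value, but that alone does not establish which timeout governed the behavior.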

The DB configuration file and error log of each node are attached.

