[MXS-4779] Maxscale monitor suddenly loses entire cluster status (galeramon) Created: 2023-09-27 Updated: 2023-10-25 Resolved: 2023-10-10 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | galeramon |
| Affects Version/s: | 6.4.10 |
| Fix Version/s: | 6.4.11 |
| Type: | Bug | Priority: | Major |
| Reporter: | Rick Pizzi | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | triage | ||
| Environment: |
3 MaxScale nodes behind an ALB, on AWS VMs |
||
| Description |
|
A 5-node Galera cluster loses two nodes (NODE03 and NODE04) within a couple of minutes due to OOM events.
|
| Comments |
| Comment by markus makela [ 2023-09-27 ] | |||||||||
|
One possible improvement would be for the monitor to store the last reason a node lost the Synced status and to report it in the state change messages. | |||||||||
| Comment by Rick Pizzi [ 2023-09-27 ] | |||||||||
|
Looking at the logs again, this happened twice that same afternoon.
| |||||||||
| Comment by markus makela [ 2023-09-27 ] | |||||||||
|
This could be related to how the cluster UUID is calculated (i.e. set_galera_cluster() and calculate_cluster()) and then used to check whether the nodes are in the same cluster. | |||||||||
| Comment by markus makela [ 2023-10-10 ] | |||||||||
|
The log message now includes the relevant Galera variables, which explain why the Synced status was lost. |
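
For illustration only, the idea of reporting why a node lost the Synced status can be sketched as below. This is not the galeramon implementation: the function name `unsynced_reasons`, the choice of checks, and the message format are hypothetical. The status variable names are standard Galera `wsrep_*` status variables, and the sketch assumes their values have already been read from the node (for example with `SHOW GLOBAL STATUS LIKE 'wsrep_%'`) into a dictionary of strings.

```python
from typing import Dict, List


def unsynced_reasons(status: Dict[str, str], expected_uuid: str) -> List[str]:
    """Return one message per Galera status variable that deviates from
    what a healthy, Synced node in the expected cluster would report.

    Hypothetical helper: the set of checks is an assumption made for this
    sketch, not a copy of what MaxScale actually tests.
    """
    expected = {
        "wsrep_ready": "ON",                       # node accepts queries
        "wsrep_connected": "ON",                   # node is connected to the cluster
        "wsrep_cluster_status": "Primary",         # node is in the primary component
        "wsrep_local_state": "4",                  # 4 means Synced
        "wsrep_cluster_state_uuid": expected_uuid, # node belongs to the same cluster
    }
    reasons = []
    for name, want in expected.items():
        got = status.get(name, "<missing>")
        if got.lower() != want.lower():
            reasons.append(f"{name}={got} (expected {want})")
    return reasons


# Example values are made up for the sketch.
healthy = {
    "wsrep_ready": "ON",
    "wsrep_connected": "ON",
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state": "4",
    "wsrep_cluster_state_uuid": "uuid-a",
}
donor = dict(healthy, wsrep_local_state="2")

print(unsynced_reasons(healthy, "uuid-a"))
print(unsynced_reasons(donor, "uuid-a"))
```

An empty list means none of the checked variables explains a loss of the Synced status. A non-empty list could be appended to a state change message so that the reason is visible in the log.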