Status: Closed
Resolution: Fixed
3 MaxScale nodes behind an ALB, running on AWS VMs.
A 5-node Galera cluster loses two nodes (NODE03 and NODE04) within a couple of minutes due to OOM events.
The cluster reconfigures and remains healthy with the remaining 3 nodes.
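Why the cluster itself stays up can be checked with Galera's weighted-quorum rule: a component remains Primary if its members' total weight is more than half of the last Primary Component's weight, minus the weight of nodes that left gracefully. The sketch below is a simplified model under stated assumptions. It assumes every node has the default `pc.weight` of 1 and that OOM-killed nodes do not count as graceful leaves. `has_quorum` is a hypothetical helper, not part of Galera or MaxScale.

```python
def has_quorum(last_primary, current, left_gracefully=(), weights=None):
    """Simplified Galera weighted-quorum check (assumes pc.weight=1 unless given).

    The current component stays Primary if:
        weight(current) > (weight(last_primary) - weight(left_gracefully)) / 2
    """
    weights = weights or {}

    def total(nodes):
        # Nodes without an explicit weight default to 1, like pc.weight.
        return sum(weights.get(n, 1) for n in nodes)

    return total(current) > (total(last_primary) - total(left_gracefully)) / 2


cluster = {"NODE01", "NODE02", "NODE03", "NODE04", "NODE05"}
after_node04 = cluster - {"NODE04"}
after_node03 = after_node04 - {"NODE03"}

print(has_quorum(cluster, after_node04))       # True  (4 > 5/2)
print(has_quorum(after_node04, after_node03))  # True  (3 > 4/2)
print(has_quorum(cluster, after_node03))       # True  (3 > 5/2)
```

Whether the two nodes are treated as lost one at a time or together, the 3 survivors keep a majority. This matches the report that the Galera cluster stayed healthy while MaxScale did not.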
However, MaxScale loses the status of ALL nodes, causing an outage.
2023-09-25 14:07:29.827 error : (mon_report_query_error): Failed to execute query on server 'NODE04' ([]:3306): Lost connection to server during query
2023-09-25 14:08:10.003 notice : (log_state_change): Server changed state: NODE04[]: slave_down. [Slave, Synced, Running] -> [Down]
2023-09-25 14:09:06.851 error : (985612) (NODE03); (socket_write): Write to Backend DCB in state DCB::State::POLLING failed: 104, Connection reset by peer
2023-09-25 14:09:30.579 error : [galeramon] (post_tick): There are no cluster members
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODE01[]: lost_master. [Master, Synced, Running] -> [Running]
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODE02[]: lost_slave. [Slave, Synced, Running] -> [Running]
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODE03[]: slave_down. [Slave, Synced, Running] -> [Down]
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODE05[]: lost_slave. [Slave, Synced, Running] -> [Running]
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODER02[]: lost_slave. [Slave, Running] -> [Running]
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODER03[]: lost_slave. [Slave, Running] -> [Running]
2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: NODER04[]: lost_slave. [Slave, Running] -> [Running]
2023-09-25 14:09:30.594 error : (987213) [readwritesplit] (rwsplit-service); (open_connections): Couldn't find suitable Master from 5 candidates.
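To see at a glance which servers are left with usable roles after a burst of transitions like the one above, the `log_state_change` notices can be replayed. The sketch below is a hypothetical helper written against the log line format shown in this report; it is not a MaxScale tool. Its master check is a simplified assumption (a server flagged both `Master` and `Running`), not readwritesplit's actual selection logic.

```python
import re

# Matches "log_state_change" notices in the format shown above, e.g.
# "... Server changed state: NODE01[]: lost_master. [Master, Synced, Running] -> [Running]"
STATE_CHANGE = re.compile(
    r"Server changed state: (?P<server>[^\[]+)\[[^\]]*\]: (?P<event>\w+)\. "
    r"\[(?P<old>[^\]]*)\] -> \[(?P<new>[^\]]*)\]"
)


def final_states(log_lines):
    """Return {server: set of status flags} after replaying every state change."""
    states = {}
    for line in log_lines:
        m = STATE_CHANGE.search(line)
        if m:
            flags = {f.strip() for f in m.group("new").split(",") if f.strip()}
            states[m.group("server")] = flags
    return states


def has_usable_master(states):
    # Simplified assumption: at least one server must be both Master and Running.
    return any({"Master", "Running"} <= flags for flags in states.values())


log = [
    "2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: "
    "NODE01[]: lost_master. [Master, Synced, Running] -> [Running]",
    "2023-09-25 14:09:30.579 notice : (log_state_change): Server changed state: "
    "NODE03[]: slave_down. [Slave, Synced, Running] -> [Down]",
]
states = final_states(log)
print(states)                     # {'NODE01': {'Running'}, 'NODE03': {'Down'}}
print(has_usable_master(states))  # False
```

Replaying the full excerpt gives the same picture. Every server ends up either `[Running]` with no role or `[Down]`, which is consistent with the readwritesplit error that no suitable Master was found.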