[MXS-1679] Maxscale does not detect failover executed by another Maxscale in 2 Maxscales + keepalived configuration Created: 2018-02-21  Updated: 2018-03-19  Resolved: 2018-03-19

Status: Closed
Project: MariaDB MaxScale
Component/s: failover, mariadbmon
Affects Version/s: None
Fix Version/s: 2.2.4

Type: Bug Priority: Major
Reporter: Timofey Turenko Assignee: Esa Korhonen
Resolution: Fixed Votes: 0
Labels: None

Attachments: File maxscale001.log    
Sprint: MXS-SPRINT-53

 Description   

Test:

two Maxscales + keepalived
1 Master, 3 slaves

1. Start both Maxscales, 000 active, 001 passive
2. kill Master
3. check Maxscale 000 log for correct failover messages, check servers status
4. stop Maxscale 000
5. check that Maxscale 001 is active now
6. kill new Master
7. check Maxscale 001 logs for failover messages, check servers status

Expected result: the third node is promoted to master
Actual result: two slaves remain, and the log shows {{warning: [mariadbmon] Failover of server 'server1' did not take place within 90 seconds, failover needs to be re-triggered}}
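For context, the 90-second window in the warning corresponds to the monitor's failover timeout. A minimal mariadbmon section for a setup like this might look roughly as follows (server names, credentials, and values are illustrative placeholders, not taken from the test environment):

```ini
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2,server3,server4
user=maxuser            ; placeholder monitor user
password=maxpwd         ; placeholder password
auto_failover=true      ; let the monitor promote a slave when the master fails
failover_timeout=90     ; seconds before the warning above is logged
```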



 Comments   
Comment by Esa Korhonen [ 2018-03-12 ]

Using latest 2.2, I cannot reproduce this. After step 6, the second MaxScale is active and notices the master [Down] and performs failover.

I also tried shutting down the master server and then MaxScale000 before it could perform the failover. In this case the failover takes longer, as MaxScale001 waits 90 seconds before deciding that it must perform the failover itself.

2018-03-12 15:02:22 notice : Updated 'passive' from 'true' to 'false'
2018-03-12 15:03:38 warning: [mariadbmon] Failover of server 'server1' did not take place within 90 seconds, failover needs to be re-triggered
2018-03-12 15:03:38 notice : [mariadbmon] Performing automatic failover to replace failed master 'server1'.
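The {{passive}} flip in the first log line is what keepalived is expected to trigger on the surviving MaxScale. A sketch of a keepalived notify script doing this via {{maxctrl}} (the script layout and the overridable {{MAXCTRL}} variable are my own illustration; keepalived passes the VRRP state as the third argument):

```shell
#!/bin/sh
# Hypothetical keepalived notify script: switch the local MaxScale between
# active and passive when the VRRP state changes. Assumes maxctrl is on PATH;
# MAXCTRL is overridable only to make the sketch easy to dry-run.
MAXCTRL="${MAXCTRL:-maxctrl}"

set_passive() {
    # $1 is the VRRP state keepalived reports: MASTER, BACKUP or FAULT
    case "$1" in
        MASTER)       $MAXCTRL alter maxscale passive false ;;
        BACKUP|FAULT) $MAXCTRL alter maxscale passive true ;;
    esac
}

# keepalived invokes the script as: notify.sh TYPE NAME STATE
set_passive "$3"
```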

Generated at Thu Feb 08 04:08:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.