Details
Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Sprint: MXS-SPRINT-127, MXS-SPRINT-128
Description
Hi Team,
Recently, a customer observed a scenario where the master went down for some reason and MaxScale logged "Master has failed. If master status does not change in 4 monitor passes, failover begins." Before failover could happen, however, the master came back up and MaxScale set it to "Slave, Running".
2020-10-17 01:14:19 error : Monitor was unable to connect to server node2[10.232.86.133:6603] : ''
2020-10-17 01:14:19 notice : Server changed state: node2[10.232.86.133:6603]: master_down. [Master, Running] -> [Down]
2020-10-17 01:14:19 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
2020-10-17 01:14:44 warning: [mariadbmon] The current master server 'node2' is no longer valid because it is in read-only mode, but there is no valid alternative to swap to.
2020-10-17 01:14:44 error : [mariadbmon] No Master can be determined. Last known was 10.232.86.133:6603
2020-10-17 01:14:44 notice : Server changed state: node2[10.232.86.133:6603]: slave_up. [Down] -> [Slave, Running]
As a result, all three nodes were in the "Slave, Running" state and MaxScale did not promote any of them to master. In the end, the customer had to restart the node2 server, and only then did failover happen.
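The stuck state described above (every node running as a slave, none as master) can be spotted from the "Server changed state" lines alone. The following is only a rough sketch of such a check, not anything MaxScale provides: the function names are made up for illustration, and the states of node1 and node3 are assumed from the description because the log excerpt does not show them.

```python
import re

# Matches lines such as:
# "... Server changed state: node2[10.232.86.133:6603]: slave_up. [Down] -> [Slave, Running]"
STATE_CHANGE = re.compile(
    r"Server changed state: (?P<name>[^\[\s]+)\[[^\]]*\]: "
    r"(?P<event>\w+)\. \[(?P<old>[^\]]*)\] -> \[(?P<new>[^\]]*)\]"
)

def latest_states(log_lines, initial=None):
    """Return {server: last reported state}, starting from optional known states."""
    states = dict(initial or {})
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match:
            states[match.group("name")] = match.group("new")
    return states

def has_master(states):
    """True if any server's state list includes the Master flag."""
    return any("Master" in state.split(", ") for state in states.values())

log = [
    "2020-10-17 01:14:19 notice : Server changed state: node2[10.232.86.133:6603]: master_down. [Master, Running] -> [Down]",
    "2020-10-17 01:14:44 notice : Server changed state: node2[10.232.86.133:6603]: slave_up. [Down] -> [Slave, Running]",
]
# Assumption: node1 and node3 stayed "Slave, Running" throughout, as the description says.
known = {"node1": "Slave, Running", "node3": "Slave, Running"}
states = latest_states(log, known)
if not has_master(states):
    print("No master among:", states)
```

A check like this could feed an external alert, so that a cluster left without a master is noticed before someone has to restart a server by hand.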
2020-10-17 01:54:01 error : Monitor was unable to connect to server node2[10.232.86.133:6603] : ''
2020-10-17 01:54:01 error : [mariadbmon] No Master can be determined. Last known was 10.232.86.133:6603
2020-10-17 01:54:01 notice : Server changed state: node2[10.232.86.133:6603]: slave_down. [Slave, Running] -> [Down]
2020-10-17 01:54:01 warning: [mariadbmon] Master has failed. If master status does not change in 4 monitor passes, failover begins.
...
2020-10-17 01:54:21 notice : [mariadbmon] Selecting a server to promote and replace 'node2'. Candidates are: 'node1', 'node3'.
2020-10-17 01:54:21 notice : [mariadbmon] Selected 'node1'.
2020-10-17 01:54:21 notice : [mariadbmon] Performing automatic failover to replace failed master 'node2'.
2020-10-17 01:54:21 notice : [mariadbmon] Redirecting 'node3' to replicate from 'node1' instead of 'node2'.
2020-10-17 01:54:21 notice : [mariadbmon] All redirects successful.
2020-10-17 01:54:22 notice : [mariadbmon] All redirected slaves successfully started replication from 'node1'.
2020-10-17 01:54:22 notice : [mariadbmon] Failover 'node2' -> 'node1' performed.
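Both log excerpts show the same countdown warning. In the first incident the server answered again before the countdown finished, while in the second it stayed down. The sketch below illustrates a countdown of this kind. It is a simplified model and not MaxScale's actual code: the name required_failed_passes is hypothetical, the value 4 is taken from the warning text, and the per-pass up/down sequences are invented for illustration.

```python
def passes_until_failover(master_up_per_pass, required_failed_passes=4):
    """Return the 1-based monitor pass on which failover would start, or None.

    master_up_per_pass: iterable of booleans, one per monitor pass, True when
    the monitor could reach the master on that pass.
    required_failed_passes: consecutive failed passes needed (assumed to be 4,
    based on the warning in the log).
    """
    consecutive_down = 0
    for pass_number, is_up in enumerate(master_up_per_pass, start=1):
        if is_up:
            consecutive_down = 0  # server answered again, so the countdown resets
            continue
        consecutive_down += 1
        if consecutive_down >= required_failed_passes:
            return pass_number
    return None

# Like the first incident: the server answers again before four failed passes.
print(passes_until_failover([False, False, True, True, True]))  # None
# Like the second incident: the server stays down long enough.
print(passes_until_failover([False, False, False, False]))      # 4
```

In this model a server that returns in read-only mode still counts as "up", which is why the countdown alone never leads to a promotion in the first incident.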
Can we add functionality to MaxScale that checks frequently whether a master server exists?
If there is no master, MaxScale could then compare GTIDs, promote the node with the latest GTID to master, and make the other nodes slaves.
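To make the request concrete, here is a minimal sketch of choosing a promotion candidate by GTID. It assumes MariaDB's GTID format of domain-server_id-sequence and a single replication domain. The positions are hypothetical, since the report does not include the real GTIDs, and the function names are invented for illustration rather than taken from MaxScale.

```python
def parse_gtid(gtid):
    """Split a MariaDB GTID 'domain-server_id-sequence' into three integers."""
    domain, server_id, sequence = (int(part) for part in gtid.split("-"))
    return domain, server_id, sequence

def pick_most_advanced(gtid_by_server, domain=0):
    """Return the server with the highest sequence number in the given domain."""
    best_name, best_seq = None, -1
    # Sorting by name makes the tie-break deterministic: the first name wins.
    for name, gtid in sorted(gtid_by_server.items()):
        gtid_domain, _, sequence = parse_gtid(gtid)
        if gtid_domain == domain and sequence > best_seq:
            best_name, best_seq = name, sequence
    return best_name

# Hypothetical positions for the three nodes in this report.
positions = {"node1": "0-2-1500", "node2": "0-2-1498", "node3": "0-2-1500"}
candidate = pick_most_advanced(positions)
print(candidate)  # node1 (ties go to the first name in sorted order)
```

A real implementation would also need to handle multiple GTID domains and confirm that the chosen server is actually writable before redirecting the others to it.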