Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 2.3.6
Fix Version/s: None
Description
When a master has two slaves on delayed replication and the master goes down, MaxScale cannot promote a slave until the delay has expired, because of the replication lag.
However, if the original master becomes available again before the slave delay expires, the master should automatically resume its previous role.
This doesn't seem to be a problem anymore:
2021-08-25 13:08:56 error : Monitor timed out when connecting to server server1[127.0.0.1:3000] : 'Lost connection to server at 'handshake: reading initial communication packet', system error: 110'
2021-08-25 13:08:56 notice : Server changed state: server1[127.0.0.1:3000]: master_down. [Master, Running] -> [Down]
2021-08-25 13:08:56 warning: [mariadbmon] Master has failed. If master does not return in 4 monitor tick(s), failover begins.
2021-08-25 13:09:14 notice : [mariadbmon] Selecting a server to promote and replace 'server1'. Candidates are: 'server2', 'server3', 'server4'.
2021-08-25 13:09:14 warning: [mariadbmon] Slave 'server2' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2021-08-25 13:09:14 warning: [mariadbmon] Slave 'server3' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2021-08-25 13:09:14 warning: [mariadbmon] Slave 'server4' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode
2021-08-25 13:09:14 notice : [mariadbmon] Selected 'server2'.
2021-08-25 13:09:14 warning: [mariadbmon] The relay log of 'server2' has 5 unprocessed events (Gtid_IO_Pos: 0-3000-34, Gtid_Current_Pos: 0-3000-29). To avoid data loss, failover is postponed until the log has been processed.
2021-08-25 13:09:14 warning: [mariadbmon] Not performing automatic failover. Will keep retrying with most error messages suppressed.
2021-08-25 13:09:47 notice : Server changed state: server1[127.0.0.1:3000]: master_up. [Down] -> [Master, Running]
This was tested with a replication delay of 3000 seconds to make sure the servers never catch up.
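The "5 unprocessed events" figure in the log above can be checked by hand from the two GTID positions it reports. The following is only a rough sketch of that arithmetic, not MaxScale's actual implementation: the function names are made up for illustration, and it assumes each position is a single domain-server_id-sequence triplet (real positions may list several domains separated by commas).

```python
from typing import Tuple


def parse_gtid(gtid: str) -> Tuple[int, int, int]:
    """Split one MariaDB GTID 'domain-server_id-sequence' into integers."""
    domain, server_id, sequence = (int(part) for part in gtid.split("-"))
    return domain, server_id, sequence


def unprocessed_events(gtid_io_pos: str, gtid_current_pos: str) -> int:
    """Estimate how many received events are not yet applied (hypothetical helper).

    Assumes both positions belong to the same replication domain.
    """
    io_domain, _, io_seq = parse_gtid(gtid_io_pos)
    cur_domain, _, cur_seq = parse_gtid(gtid_current_pos)
    if io_domain != cur_domain:
        raise ValueError("GTIDs belong to different replication domains")
    # Events fetched by the IO thread minus events already applied.
    return max(io_seq - cur_seq, 0)


# Values taken from the 'server2' warning in the log above.
print(unprocessed_events("0-3000-34", "0-3000-29"))  # prints 5
```

With a replication delay of 3000 seconds this difference stays above zero for the whole test, which is why failover keeps being postponed until server1 returns.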