[MXS-2488] Master Recovery With Delayed Slaves

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Cannot Reproduce
Affects Version/s: 2.3.6
Fix Version/s: N/A
Component/s: failover
Labels: None

Description

If a master has two slaves on delayed replication and the master goes down, MaxScale cannot promote a slave until the delay has expired, because of the replication lag.

However, if the original master becomes available again before the slave delay expires, the master should automatically resume its previous role.

Activity

markus makela added a comment - 2021-08-25 10:10 - edited

This doesn't seem to be a problem anymore:

2021-08-25 13:08:56   error  : Monitor timed out when connecting to server server1[127.0.0.1:3000] : 'Lost connection to server at 'handshake: reading initial communication packet', system error: 110'

2021-08-25 13:08:56   notice : Server changed state: server1[127.0.0.1:3000]: master_down. [Master, Running] -> [Down]

2021-08-25 13:08:56   warning: [mariadbmon] Master has failed. If master does not return in 4 monitor tick(s), failover begins.

2021-08-25 13:09:14   notice : [mariadbmon] Selecting a server to promote and replace 'server1'. Candidates are: 'server2', 'server3', 'server4'.

2021-08-25 13:09:14   warning: [mariadbmon] Slave 'server2' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode

2021-08-25 13:09:14   warning: [mariadbmon] Slave 'server3' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode

2021-08-25 13:09:14   warning: [mariadbmon] Slave 'server4' has gtid_strict_mode disabled. Enabling this setting is recommended. For more information, see https://mariadb.com/kb/en/library/gtid/#gtid_strict_mode

2021-08-25 13:09:14   notice : [mariadbmon] Selected 'server2'.

2021-08-25 13:09:14   warning: [mariadbmon] The relay log of 'server2' has 5 unprocessed events (Gtid_IO_Pos: 0-3000-34, Gtid_Current_Pos: 0-3000-29). To avoid data loss, failover is postponed until the log has been processed.

2021-08-25 13:09:14   warning: [mariadbmon] Not performing automatic failover. Will keep retrying with most error messages suppressed.

2021-08-25 13:09:47   notice : Server changed state: server1[127.0.0.1:3000]: master_up. [Down] -> [Master, Running]

This was tested with a replication delay of 3000 seconds to make sure the servers never catch up.
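
The "5 unprocessed events" figure in the log above is consistent with the difference between the sequence numbers of Gtid_IO_Pos (0-3000-34) and Gtid_Current_Pos (0-3000-29). A minimal sketch of that arithmetic follows, assuming single-domain GTIDs in the `domain-server_id-sequence` form; the names `Gtid`, `parse_gtid` and `unprocessed_events` are hypothetical and this is not necessarily how mariadbmon computes the value.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """One MariaDB GTID triplet: domain-server_id-sequence."""
    domain: int
    server_id: int
    sequence: int


def parse_gtid(text: str) -> Gtid:
    """Parse a single-domain GTID string such as '0-3000-34'."""
    domain, server_id, sequence = (int(part) for part in text.strip().split("-"))
    return Gtid(domain, server_id, sequence)


def unprocessed_events(io_pos: str, current_pos: str) -> int:
    """Estimate how many received events have not been applied yet.

    Assumption: both positions belong to the same replication domain,
    so the lag is just the difference of the sequence numbers.
    """
    received = parse_gtid(io_pos)
    applied = parse_gtid(current_pos)
    if received.domain != applied.domain:
        raise ValueError("positions are in different replication domains")
    return max(0, received.sequence - applied.sequence)


if __name__ == "__main__":
    # Values taken from the relay log warning for 'server2'.
    print(unprocessed_events("0-3000-34", "0-3000-29"))
```

With the values from the log this gives 5, matching the warning; multi-domain GTID lists would need a per-domain comparison.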

People

Assignee: Unassigned

Reporter: Todd Stoffel (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 2019-05-15 03:22

Updated: 2021-08-25 10:11

Resolved: 2021-08-25 10:11

MariaDB MaxScale