[MXS-2036] A slave with sql thread stopped causes wrong master after failover - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.13
Fix Version/s: 2.2.14
Component/s: mariadbmon
Labels:
None

Sprint:
MXS-SPRINT-65

Description

As described in the support ticket. In short, a slave with IO thread running but SQL thread stopped is in limbo, and causes wrong master to be selected after a failover unless the new master has other slaves.

This is again an effect of the way the 2.2 monitor works. The slave which is still connected or trying to connect to the master (IO thread is on or connecting) but not actually replicating (sql thread is off) is counted as a slave of that node, even if the master node is down. During switchover/failover, servers with a broken slave sql thread are not redirected (since they are not real slaves and cannot replicate from the new master anyway). This difference produces the weird result where the old master gets to be master even after failover. In 2.3 this doesn't happen because the monitor works differently.

Fixing this in 2.2 requires choosing between changing the master selection code or the failover/switchover code. I will try with the latter, since changing the former could affect various other places as well.

Attachments

Activity

People

Assignee:: Esa Korhonen

Reporter:: Esa Korhonen

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 2018-09-04 14:05

Updated:: 2024-07-07 17:28

Resolved:: 2018-09-11 09:46

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.