[MXS-3490] Xpand monitor should detect and handle group change explicitly

Details

Type: New Feature
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 22.08.0
Component/s: xpandmon
Labels: None
Sprint: MXS-SPRINT-153, MXS-SPRINT-154

Description

Currently the Xpand monitor treats a group change error like any other error. That is, the error causes the monitor to abandon the current "hub" (the Xpand node it uses for fetching cluster topology information) and connect to another node, which will also fail with a group change error. After that, the monitor will connect to each node at regular intervals, and every attempt will fail until the group change is over.

At the same time, the monitor pings the health check port of each node, but even for a node that is removed, the port continues to return OK. That is, as far as any routers are concerned, those nodes/servers appear to be ready to use. However, that is only an appearance, as any attempt to use them will end with a group change error.

This means that there will be a large amount of activity and error handling that simply cannot be resolved before the group change is over. Thus, the Xpand monitor:

- should detect whenever a monitor operation fails due to a group change, and in that case
  - stop the normal health check ping,
  - mark all servers (internally) as being down,
  - regularly connect in order to find out whether the group change has finished, and in that case
    - check the cluster configuration and remove/add servers, and
    - turn on the regular health check ping, which will cause the servers to be marked as being up.

That way a great deal of activity will stop for the duration of the group change. Until the group change is over, there is no point in doing anything other than checking whether it is over.
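
The behaviour requested above can be sketched as a small two-state loop. This is only an illustrative sketch in Python, not the actual xpandmon implementation (MaxScale itself is written in C++). The names `MonitorSketch`, `GroupChangeError`, `fetch_topology` and `ping` are all hypothetical and stand in for whatever the monitor uses to query the hub node and the health check port.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class GroupChangeError(Exception):
    """Hypothetical stand-in for the error a node returns during a group change."""


class State(Enum):
    NORMAL = auto()
    GROUP_CHANGE = auto()


@dataclass
class MonitorSketch:
    # Hypothetical callable: returns the current node names, or raises
    # GroupChangeError while a group change is in progress.
    fetch_topology: Callable[[], List[str]]
    # Hypothetical callable: True if the node's health check port answers OK.
    ping: Callable[[str], bool]
    state: State = State.NORMAL
    servers: Dict[str, bool] = field(default_factory=dict)  # name -> is up

    def tick(self) -> None:
        """One monitor interval."""
        try:
            nodes = self.fetch_topology()
        except GroupChangeError:
            # Group change detected: skip the health check ping and mark
            # every server as down. The next tick only probes again.
            self.state = State.GROUP_CHANGE
            for name in self.servers:
                self.servers[name] = False
            return

        # The topology could be read, so any group change is over:
        # add new servers and drop the ones that left the cluster.
        self.servers = {name: self.servers.get(name, False) for name in nodes}
        self.state = State.NORMAL

        # Regular health check ping, which marks the servers as up again.
        for name in self.servers:
            self.servers[name] = self.ping(name)
```

While `state` is `GROUP_CHANGE`, each call to `tick()` does nothing except probe whether the topology can be read again, which matches the point above that nothing else is worth doing until the group change is over.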

Issue Links

split from: MXS-3472 Transaction Replay: transactions not replayed after Xpand group change (Closed)

People

Assignee: Johan Wikman

Reporter: Johan Wikman

Votes: 1

Watchers: 7

Dates

Created: 2021-04-12 07:00

Updated: 2024-10-03 15:53

Resolved: 2022-04-08 11:25
