[MXS-3088] Support for replication lag monitoring/on-demand availability - Jira

Details

Type: New Feature
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Do
Affects Version/s: None
Fix Version/s: N/A
Component/s: readconnroute
Labels: None

Description

Hello folks,
We would like MaxScale to be able to control whether replicas are available or placed in a "standby" mode whenever either of
two specific thresholds is exceeded:

1. Replication lag greater than X seconds

Queries served by replicas that lag too far behind their primary server can return stale or wrong data. To prevent
wrong information from being sent back to the client, we would like to stop new queries from hitting any replica whose
replication lag is greater than a given threshold.

2. Number of active queries greater than Y queries

Clients would like the ability to prevent queries from hitting a server once a given number of active queries has been reached.
This can be for a variety of reasons, e.g. application design, ongoing backups causing locks, etc.

We would like MaxScale to prevent new query requests from being sent to a replica whenever either of the two thresholds above
is exceeded. A new state within MaxScale would mark the affected servers as "standby (throttled)" (or whatever label
you deem more appropriate), and a new column would show the lag, as in the example below:

┌───────────────┬────────────────┬──────┬─────────────┬─────────────────┬────────────────────────────┬─────────────────┐
│ Server        │ Address        │ Port │     Lag     │   Connections   │ State                      │ GTID            │
├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
│ dbServer1     │ 192.168.88.101 │ 3306 │      0      │     20          │ Master, Running            │ 0-8180-15692671 │
├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
│ dbServer2     │ 192.168.88.102 │ 3306 │      0      │     40          │ Slave, Running             │ 0-8180-15692671 │
├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
│ dbServer3     │ 192.168.88.103 │ 3306 │     500     │     40          │ Slave, Standby (throttled) │ 0-8180-15690132 │
└───────────────┴────────────────┴──────┴─────────────┴─────────────────┴────────────────────────────┴─────────────────┘

Both thresholds should be independent of each other, and both settings should be changeable at runtime with no restart required.
There could also be failsafe logic: if only one replica is available, both thresholds would be ignored.
Once a threshold is cleared (e.g. lag falls below it), the replica is automatically made available again and its state is updated.
Queries already in progress are not affected.

Scenario:

1. Replication lag is greater than X seconds, or
2. Number of active queries is greater than Y queries

If #1 or #2 is met, new queries are not sent to the replicas that exceed that threshold.
Once the value for #1 or #2 falls back below the configured threshold, new queries can be routed to the replica again.
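
The routing rule requested above can be sketched in a few lines of Python. This is only an illustration of the intended behaviour, not MaxScale code: the `Replica` class, the function names, and the parameters `max_lag_seconds` and `max_active_queries` are all hypothetical, and the failsafe is read here as "ignore the thresholds when the router knows of only one replica". The request does not say what should happen when several replicas exist and all of them are throttled, so the sketch simply returns an empty list in that case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    # Hypothetical view of a replica as seen by the router.
    name: str
    lag_seconds: int      # replication lag behind the primary
    active_queries: int   # queries currently running on this replica


def is_throttled(replica: Replica,
                 max_lag_seconds: Optional[int] = None,
                 max_active_queries: Optional[int] = None) -> bool:
    """Return True if either threshold is exceeded. None disables a threshold."""
    lag_exceeded = (max_lag_seconds is not None
                    and replica.lag_seconds > max_lag_seconds)
    load_exceeded = (max_active_queries is not None
                     and replica.active_queries > max_active_queries)
    # The two thresholds are independent: exceeding either one is enough.
    return lag_exceeded or load_exceeded


def routable_replicas(replicas: List[Replica],
                      max_lag_seconds: Optional[int] = None,
                      max_active_queries: Optional[int] = None) -> List[Replica]:
    """Return the replicas that may receive new queries."""
    # Failsafe (one possible reading): with a single replica, ignore thresholds.
    if len(replicas) <= 1:
        return list(replicas)
    return [r for r in replicas
            if not is_throttled(r, max_lag_seconds, max_active_queries)]


if __name__ == "__main__":
    # Values taken from the example table; the thresholds are made up.
    replicas = [Replica("dbServer2", 0, 40), Replica("dbServer3", 500, 40)]
    allowed = routable_replicas(replicas, max_lag_seconds=60,
                                max_active_queries=100)
    print([r.name for r in allowed])
```

Because the check is re-evaluated against current values on every call, a replica becomes routable again as soon as its lag or query count drops back to the threshold or below, and queries already running on it are never touched.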

People

Assignee: Todd Stoffel (Inactive)

Reporter: Daniel Almeida (Inactive)

Votes: 1

Watchers: 3

Dates

Created: 2020-07-22 13:42

Updated: 2024-10-03 15:53

Resolved: 2022-09-08 08:55
