  1. MariaDB MaxScale
  2. MXS-3088

Support for replication lag monitoring/on-demand availability

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: None
    • Fix Version/s: N/A
    • Component/s: readconnroute
    • Labels: None

    Description

      Hello folks,
      We would like MaxScale to be able to control when replicas are available or placed in a "standby" mode, whenever either of 2
      specific thresholds is reached:

      1. replication lag greater than X seconds

      Queries served by replicas lagging too far behind their primary server can return stale/wrong data. To prevent
      wrong information from being sent back to the client, we would like to prevent new queries from hitting a replica whose
      replication lag is greater than a given threshold.

      2. number of active queries is greater than Y queries

      Clients would like the ability to prevent queries from hitting a server after a given number of active queries has been reached.
      This can be for a variety of reasons, e.g. application design, ongoing backups causing locks, etc.

      We would like MaxScale to prevent new query requests from being sent to a replica whenever either of the 2 thresholds above
      is exceeded. A new state within MaxScale would show the servers affected by the above as
      "standby (throttled)" (or something else you deem more appropriate), along with a new column showing the lag. Example below:

      ┌───────────────┬────────────────┬──────┬─────────────┬─────────────────┬────────────────────────────┬─────────────────┐
      │ Server        │ Address        │ Port │    Lag      │   Connections   │    State                   │    GTID         │
      ├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
      │ dbServer1     │ 192.168.88.101 │ 3306 │      0      │     20          │ Master, Running            │ 0-8180-15692671 │
      ├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
      │ dbServer2     │ 192.168.88.102 │ 3306 │      0      │     40          │ Slave, Running             │ 0-8180-15692671 │
      ├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
      │ dbServer3     │ 192.168.88.103 │ 3306 │     500     │     40          │ Slave, Standby (throttled) │ 0-8180-15690132 │
      └───────────────┴────────────────┴──────┴─────────────┴─────────────────┴────────────────────────────┴─────────────────┘

      Both thresholds should be independent of each other, and these settings should be changeable at runtime, with no restart required.
      We could have a failsafe logic: if only 1 replica is available, these 2 thresholds would be ignored.
      Once a threshold is cleared (e.g. lag falls below it), the replica is automatically made available again and its state is updated.
      Queries already running are not affected.
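
      To make the requested behaviour concrete, here is a minimal Python sketch of the decision logic described above. It is not MaxScale code: the names `Replica`, `Thresholds`, `is_throttled`, `replica_states` and `routable` are hypothetical, the limits X=60 seconds and Y=100 queries are made-up example values, and the failsafe is read as "if there is only one replica, ignore both thresholds", which is one possible interpretation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    lag_seconds: int      # replication lag as seen by the monitor
    active_queries: int   # queries currently executing on this replica


@dataclass
class Thresholds:
    # None disables a limit, which keeps the two thresholds independent.
    max_lag_seconds: Optional[int] = None
    max_active_queries: Optional[int] = None


def is_throttled(replica: Replica, limits: Thresholds) -> bool:
    """Return True if the replica exceeds either configured threshold."""
    too_laggy = (limits.max_lag_seconds is not None
                 and replica.lag_seconds > limits.max_lag_seconds)
    too_busy = (limits.max_active_queries is not None
                and replica.active_queries > limits.max_active_queries)
    return too_laggy or too_busy


def replica_states(replicas: List[Replica], limits: Thresholds) -> Dict[str, str]:
    """Map each replica name to the state string it would display."""
    # Failsafe (assumed reading): with a single replica, skip both thresholds.
    failsafe = len(replicas) <= 1
    states = {}
    for r in replicas:
        if not failsafe and is_throttled(r, limits):
            states[r.name] = "Slave, Standby (throttled)"
        else:
            states[r.name] = "Slave, Running"
    return states


def routable(replicas: List[Replica], limits: Thresholds) -> List[Replica]:
    """Replicas that may receive NEW queries; running queries are untouched."""
    states = replica_states(replicas, limits)
    return [r for r in replicas if states[r.name] == "Slave, Running"]


if __name__ == "__main__":
    limits = Thresholds(max_lag_seconds=60, max_active_queries=100)
    replicas = [Replica("dbServer2", 0, 40), Replica("dbServer3", 500, 40)]
    print(replica_states(replicas, limits))
    # Re-evaluating after the lag drops makes the replica available again.
    replicas[1].lag_seconds = 5
    print(replica_states(replicas, limits))
```

      Because the state is recomputed from the current lag and query count on every evaluation, a replica returns to service as soon as it falls back under both limits, and changing a `Thresholds` value takes effect on the next evaluation without any restart.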

      Scenario:

      1. replication lag greater than X seconds or
      2. number of active queries is greater than Y queries

      If #1 or #2 is met, new queries are not sent to the replicas exceeding that threshold.
      Once the value for #1 or #2 falls back below the configured threshold, new queries can be routed to the replica again.

          People

            toddstoffel Todd Stoffel (Inactive)
            dalmeida Daniel Almeida (Inactive)
            Votes: 1
            Watchers: 3
