  1. MariaDB MaxScale
  2. MXS-3088

Support for replication lag monitoring/on-demand availability

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Icebox
    • Component/s: readconnroute
    • Labels:
      None

      Description

      Hello folks,
      We would like MaxScale to be able to control when replicas are available for queries, placing a replica in a "standby" mode whenever
      either of two specific thresholds is exceeded:

      1. Replication lag greater than X seconds

      Queries served by replicas that lag too far behind their primary server can return stale or incorrect data. To avoid sending
      incorrect results back to the client, we would like to stop new queries from reaching any replica whose replication lag exceeds
      a given threshold.

      2. Number of active queries greater than Y

      Clients would like to be able to stop queries from reaching a server once it is handling a given number of active queries.
      This can be for a variety of reasons, e.g. application design, ongoing backups causing locks, etc.

      We would like MaxScale to stop sending new queries to a replica whenever either of the two thresholds above
      is exceeded. A new state within MaxScale would mark the affected servers as
      "standby (throttled)" (or whatever name you deem more appropriate), and a new column would show the lag. Example below:

      ┌───────────────┬────────────────┬──────┬─────────────┬─────────────────┬────────────────────────────┬─────────────────┐
      │ Server        │ Address        │ Port │   Lag (s)   │   Connections   │    State                   │    GTID         │
      ├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
      │ dbServer1     │ 192.168.88.101 │ 3306 │      0      │     20          │ Master, Running            │ 0-8180-15692671 │
      ├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
      │ dbServer2     │ 192.168.88.102 │ 3306 │      0      │     40          │ Slave, Running             │ 0-8180-15692671 │
      ├───────────────┼────────────────┼──────┼─────────────┼─────────────────┼────────────────────────────┼─────────────────┤
      │ dbServer3     │ 192.168.88.103 │ 3306 │     500     │     40          │ Slave, Standby (throttled) │ 0-8180-15690132 │
      └───────────────┴────────────────┴──────┴─────────────┴─────────────────┴────────────────────────────┴─────────────────┘
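
To make the requested state concrete, here is a minimal Python sketch of how the state column could be derived from the two thresholds. It is an illustration only, not MaxScale code: the `Server` class, the `state_label` function, and the threshold values (60 seconds, 100 active queries) are all hypothetical, and the "Connections" column is used as a stand-in for the number of active queries.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    role: str            # "Master" or "Slave", as in the State column
    lag_seconds: int     # replication lag as reported by the monitor
    active_queries: int  # stand-in for the "Connections" column


def state_label(server: Server, max_lag: int, max_active: int) -> str:
    """Build the state string; only replicas can be throttled."""
    throttled = server.role == "Slave" and (
        server.lag_seconds > max_lag or server.active_queries > max_active
    )
    return f"{server.role}, " + ("Standby (throttled)" if throttled else "Running")


# Values taken from the example table; the thresholds are made up.
servers = [
    Server("dbServer1", "Master", 0, 20),
    Server("dbServer2", "Slave", 0, 40),
    Server("dbServer3", "Slave", 500, 40),
]
labels = {s.name: state_label(s, max_lag=60, max_active=100) for s in servers}
```

With these assumed thresholds, only dbServer3 (lag 500) ends up in the throttled state.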
      
      

      The two thresholds should be independent of each other, and both settings should be changeable at runtime, with no restart required.
      There could also be failsafe logic: if only one replica is available, both thresholds are ignored.
      Once a threshold is no longer exceeded (e.g. lag falls back below it), the replica is automatically made available again and its state is updated.
      Queries already running are not affected.
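
The selection rules above could be sketched roughly as follows. This is a hedged illustration in Python, not a proposal for the actual router code: `Replica`, `routable_replicas`, and the parameter names `max_lag` and `max_active` are hypothetical. It assumes "only one replica available" means the pool of running replicas has a single member, and it returns an empty list when several replicas are all over a threshold, since that case is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    lag_seconds: int
    active_queries: int


def routable_replicas(replicas, max_lag=None, max_active=None):
    """Return the replicas that may receive NEW queries.

    Each threshold is independent; passing None disables it. The function
    is evaluated per routing decision, so a changed threshold takes effect
    on the next call and queries already running are never touched.
    """
    if len(replicas) <= 1:
        return list(replicas)  # failsafe: a lone replica ignores both thresholds

    def within_limits(r):
        lag_ok = max_lag is None or r.lag_seconds <= max_lag
        load_ok = max_active is None or r.active_queries <= max_active
        return lag_ok and load_ok

    return [r for r in replicas if within_limits(r)]


# Example pool; thresholds are made-up values.
pool = [Replica("dbServer2", 0, 40), Replica("dbServer3", 500, 40)]
chosen = [r.name for r in routable_replicas(pool, max_lag=60, max_active=100)]
lone = [r.name for r in routable_replicas(pool[1:], max_lag=60, max_active=100)]
```

Here `chosen` contains only dbServer2, while `lone` still contains dbServer3 because the failsafe applies when it is the only replica in the pool.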

      Scenario:

      1. replication lag greater than X seconds, or
      2. number of active queries greater than Y

      If #1 or #2 is met, new queries are not sent to the replicas that exceed that threshold.
      Once the value falls back below the configured threshold, new queries can be routed to the replica again.

            People

            Assignee:
            toddstoffel Todd Stoffel
            Reporter:
            dalmeida Daniel Almeida
            Votes:
            1
            Watchers:
            3
