  1. MariaDB MaxScale
  2. MXS-3088

Support for replication lag monitoring/on-demand availability



    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Icebox
    • Component/s: readconnroute
    • Labels:


      Hello folks,
      We would like MaxScale to be able to control when replicas are available and when they are placed in a "standby" mode. A replica
      should go into standby whenever either of the following two thresholds is exceeded:

      1. Replication lag greater than X seconds

      Replicas lagging too far behind their primary server can return stale or wrong data. To avoid sending
      wrong information back to the client, we would like to prevent new queries from hitting a replica whose replication lag is greater than
      a given threshold.

      2. Number of active queries greater than Y

      Clients would like the ability to prevent queries from hitting a server once a given number of active queries has been reached.
      This can be for a variety of reasons, e.g. application design, ongoing backups causing locks, etc.

      We would like MaxScale to stop sending new queries to a replica whenever either of the two thresholds above
      is exceeded. A new state within MaxScale would show the affected servers as
      "Standby (throttled)" (or something else you deem more appropriate), and a new column would show the lag. Example below:

      │ Server    │ Address │ Port │ Lag │ Connections │ State                     │ GTID            │
      │ dbServer1 │         │ 3306 │   0 │          20 │ Master, Running           │ 0-8180-15692671 │
      │ dbServer2 │         │ 3306 │   0 │          40 │ Slave, Running            │ 0-8180-15692671 │
      │ dbServer3 │         │ 3306 │ 500 │          40 │ Slave, Standby(throttled) │ 0-8180-15690132 │
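
      To make the requested state column concrete, here is a minimal Python sketch of how a replica's label could be derived from the
      two thresholds. It is only an illustration of the request, not MaxScale code: the names `Replica`, `replica_state`, `max_lag` and
      `max_active` are hypothetical, the limits of 60 seconds and 100 queries are made-up values, and the 40 from the Connections column
      is reused as the active-query count purely for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    name: str
    lag_seconds: int     # replication lag, as it would appear in the new "Lag" column
    active_queries: int  # queries currently executing on the replica


def replica_state(replica: Replica,
                  max_lag: Optional[int] = None,
                  max_active: Optional[int] = None) -> str:
    """Return the state label for a replica. A threshold of None is disabled."""
    # The two thresholds are evaluated independently; exceeding either one throttles.
    lag_exceeded = max_lag is not None and replica.lag_seconds > max_lag
    load_exceeded = max_active is not None and replica.active_queries > max_active
    if lag_exceeded or load_exceeded:
        return "Slave, Standby(throttled)"
    return "Slave, Running"


# Values taken from the example table; the thresholds are assumptions.
replicas = [Replica("dbServer2", 0, 40), Replica("dbServer3", 500, 40)]
states = {r.name: replica_state(r, max_lag=60, max_active=100) for r in replicas}
```

      With these assumed limits, dbServer3 is the only replica labelled as throttled, because its lag of 500 seconds exceeds the limit.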

      The two thresholds should be independent of each other, and both settings should be changeable at runtime, with no restart required.
      There could be a failsafe: if only one replica is available, both thresholds would be ignored.
      Once a threshold is cleared (e.g. lag falls back below it), the replica is automatically made available again and its state is updated.
      Queries that are already running are not affected.
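
      The behaviour described above (independent thresholds, runtime changes, failsafe, automatic recovery) could be sketched roughly as
      follows. This is a Python illustration under stated assumptions, not a proposal for the actual implementation: `Thresholds`,
      `Replica` and `eligible_replicas` are hypothetical names, and the failsafe is read as "when only one replica exists, ignore both
      thresholds", which is one possible interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thresholds:
    # Hypothetical runtime-tunable settings; None disables a threshold.
    max_lag: Optional[int] = None
    max_active: Optional[int] = None


@dataclass
class Replica:
    name: str
    lag_seconds: int
    active_queries: int


def eligible_replicas(replicas: List[Replica], limits: Thresholds) -> List[Replica]:
    """Return the replicas that may receive NEW queries; running queries are untouched."""
    # Failsafe (assumed reading): a lone replica is never throttled.
    if len(replicas) <= 1:
        return list(replicas)

    def within_limits(r: Replica) -> bool:
        lag_ok = limits.max_lag is None or r.lag_seconds <= limits.max_lag
        load_ok = limits.max_active is None or r.active_queries <= limits.max_active
        return lag_ok and load_ok

    return [r for r in replicas if within_limits(r)]


limits = Thresholds(max_lag=60, max_active=100)  # assumed values
fast = Replica("dbServer2", 0, 40)
slow = Replica("dbServer3", 500, 40)

throttled = [r.name for r in eligible_replicas([fast, slow], limits)]

# Recovery: once the lag falls below the threshold, the replica is eligible again.
slow.lag_seconds = 10
recovered = [r.name for r in eligible_replicas([fast, slow], limits)]

# Dynamic change: the limits object is re-read on every call, so no restart is needed.
slow.lag_seconds = 500
limits.max_lag = None
after_change = [r.name for r in eligible_replicas([fast, slow], limits)]

# Failsafe: a single lagging replica still receives queries.
lone = [r.name for r in eligible_replicas([slow], Thresholds(max_lag=60))]
```

      Because eligibility is recomputed for each new query, nothing has to be done to existing queries when a replica enters or leaves
      standby. What should happen when several replicas exist and all of them exceed a threshold is left open in this sketch.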


      To summarize:

      1. replication lag greater than X seconds, or
      2. number of active queries greater than Y

      If #1 or #2 is met, new queries are not sent to the replicas exceeding that threshold.
      Once the value falls back below the configured threshold, new queries can be routed to the replica again.




            toddstoffel Todd Stoffel
            dalmeida Daniel Almeida