MariaDB Connector/J / CONJ-595

Create option to configure DONOR/DESYNCED Galera nodes to be unavailable for load-balancing

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.5, 1.7.4
    • Component/s: Failover
    • Labels: None

    Description

      When a node is in the DONOR/DESYNCED state, it doesn't participate in flow control, so its data can get stale. Galera states are explained here:

      http://galeracluster.com/documentation-webpages/nodestates.html#changes-in-the-node-state

      A node's state can be checked with wsrep_local_state:

      http://galeracluster.com/documentation-webpages/galerastatusvariables.html#wsrep-local-state
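
As an illustration, the state can be read over plain JDBC with `SHOW STATUS LIKE 'wsrep_local_state'` and mapped to the names used in the Galera documentation (1 = Joining, 2 = Donor/Desynced, 3 = Joined, 4 = Synced). The `GaleraLocalState` enum and its methods are hypothetical helpers for this sketch, not part of Connector/J.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper (not part of Connector/J): maps numeric wsrep_local_state
// values to the node states described in the Galera documentation.
enum GaleraLocalState {
    JOINING(1), DONOR_DESYNCED(2), JOINED(3), SYNCED(4);

    final int code;

    GaleraLocalState(int code) {
        this.code = code;
    }

    /** Returns the state for a wsrep_local_state code, or null if the code is unknown. */
    static GaleraLocalState fromCode(int code) {
        for (GaleraLocalState s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        return null;
    }

    /** Reads wsrep_local_state from the server behind the given connection. */
    static GaleraLocalState read(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW STATUS LIKE 'wsrep_local_state'")) {
            // SHOW STATUS returns two columns: Variable_name and Value.
            if (!rs.next()) {
                return null; // variable absent: the server is not a Galera node
            }
            return fromCode(Integer.parseInt(rs.getString(2).trim()));
        }
    }
}
```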

      MaxScale's Galera Monitor treats nodes in the DONOR/DESYNCED state as unavailable unless available_when_donor is configured. This ensures that MaxScale does not route queries to a node that has stale data.

      https://mariadb.com/kb/en/mariadb-enterprise/mariadb-maxscale-22-galera-monitor/#available_when_donor
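
The rule described above can be approximated as a small predicate over the numeric `wsrep_local_state`. This is a simplified sketch that models only the `available_when_donor` behavior, not MaxScale's full monitor logic; `DonorPolicy` and `isRoutable` are hypothetical names.

```java
// Hypothetical, simplified model of the available_when_donor rule:
// Synced nodes are always routable; Donor/Desynced nodes only when the flag is on.
final class DonorPolicy {
    static final int DONOR_DESYNCED = 2;
    static final int SYNCED = 4;

    static boolean isRoutable(int wsrepLocalState, boolean availableWhenDonor) {
        if (wsrepLocalState == SYNCED) {
            return true;
        }
        // Any other state (Joining, Joined, ...) is never routable in this sketch.
        return availableWhenDonor && wsrepLocalState == DONOR_DESYNCED;
    }
}
```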

      As far as I can tell, MariaDB Connector/J's load balancing implementation does not have a way to keep queries from being sent to desynced Galera nodes. Maybe we should add an option that would enable that kind of behavior?


        Activity

          Diego Dupin added a comment (edited):

          There is no monitor inside the connector, but there is still a specific implementation for Galera.

          Connection pools validate a connection before lending it out. This is done using Connection.isValid(timeout).
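
To make the borrow-time validation concrete, here is a deliberately minimal pool sketch. `ValidatingPool` is hypothetical and far simpler than a real pool; it only shows that `Connection.isValid(int)` is called on each idle connection before it is handed out, and that connections failing the check are discarded.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, minimal pool used only to illustrate validation on borrow.
final class ValidatingPool {
    private final Deque<Connection> idle = new ArrayDeque<>();
    private final int validationTimeoutSeconds;

    ValidatingPool(int validationTimeoutSeconds) {
        this.validationTimeoutSeconds = validationTimeoutSeconds;
    }

    /** Returns a connection to the pool. */
    void release(Connection conn) {
        idle.push(conn);
    }

    /** Hands out the most recently released connection that passes isValid(); null if none. */
    Connection borrow() {
        while (!idle.isEmpty()) {
            Connection conn = idle.pop();
            try {
                if (conn.isValid(validationTimeoutSeconds)) {
                    return conn;
                }
            } catch (SQLException e) {
                // isValid throws only for a negative timeout; treat it as a failed check.
            }
            closeQuietly(conn); // discard connections that fail validation
        }
        return null;
    }

    private static void closeQuietly(Connection conn) {
        try {
            conn.close();
        } catch (SQLException ignored) {
            // nothing useful to do if close fails
        }
    }
}
```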

          The standard implementation of Connection.isValid() sends a COM_PING. For Galera, the connector instead sends the query "show status like 'wsrep_cluster_status'" and checks that the status is "PRIMARY". This ensures not only that the socket is open, but also that the server is in the PRIMARY state. So the check uses 'wsrep_cluster_status' rather than 'wsrep_local_state', but the result will be the same.
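
A sketch of the check described in this comment, approximated with plain JDBC rather than Connector/J's internal protocol code. `ClusterStatusCheck` is hypothetical. Galera reports the value as `Primary` (other values include `Non-Primary` and `Disconnected`), so the comparison ignores case.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch (not Connector/J's actual code): instead of a bare ping,
// accept the connection only if the node is in the cluster's primary component.
final class ClusterStatusCheck {

    /** True if the wsrep_cluster_status value denotes the primary component. */
    static boolean isPrimary(String clusterStatus) {
        return clusterStatus != null && clusterStatus.trim().equalsIgnoreCase("PRIMARY");
    }

    static boolean isValid(Connection conn, int timeoutSeconds) {
        try (Statement stmt = conn.createStatement()) {
            stmt.setQueryTimeout(timeoutSeconds);
            try (ResultSet rs = stmt.executeQuery("SHOW STATUS LIKE 'wsrep_cluster_status'")) {
                // Column 2 of SHOW STATUS is the variable's value.
                return rs.next() && isPrimary(rs.getString(2));
            }
        } catch (SQLException e) {
            return false; // an unreachable or failing server is not valid
        }
    }
}
```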

          See CONJ-400.


          Geoff Montee (Inactive) added a comment:

          It's not really correct that checking wsrep_cluster_status for PRIMARY gives the same result as checking wsrep_local_state. The problem is that a node can be in the DONOR/DESYNCED state while still being in the cluster's primary component. A desynced node doesn't participate in flow control, so it can fall behind the other nodes in the primary component. This is similar to slave lag on a traditional replication slave. If the node has fallen behind, it might not make sense to use it for load balancing in some applications.

          Maybe Connector/J should have an option that would make Connection.isValid() check both wsrep_cluster_status and wsrep_local_state?
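
The stricter check proposed here can be sketched as a predicate over both status values. `StrictGaleraCheck` is a hypothetical name; the value 4 is the Synced state of `wsrep_local_state`.

```java
// Hypothetical sketch of the proposed check: a node is usable only if it is
// in the primary component AND synced (wsrep_local_state = 4).
final class StrictGaleraCheck {
    static final int SYNCED = 4;

    static boolean isUsable(String clusterStatus, int wsrepLocalState) {
        boolean primary = clusterStatus != null && clusterStatus.trim().equalsIgnoreCase("Primary");
        return primary && wsrepLocalState == SYNCED;
    }
}
```

The case `isUsable("Primary", 2)` returns false: it is exactly the donor node that is still in the primary component, which a `wsrep_cluster_status`-only check would accept.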

          Diego Dupin added a comment (edited):

          Right! As stated here: http://galeracluster.com/documentation-webpages/nodestates.html#node-state-changes
          Primary is not enough; the server must also be Synced.

          The implementation will change to rely on wsrep_local_state to check the "Synced" status.

          Diego Dupin added a comment:

          A new option, "galeraAllowedState", corrects the implementation.

          If the "galeraAllowedState" option is not set (the default), Connection.isValid() just sends an empty packet to the server, and the server responds with a small packet to confirm connectivity (COM_PING).

          When "galeraAllowedState" is set, the connector checks that the server's "wsrep_local_state" matches one of the allowed values (comma-separated).
          For example, "galeraAllowedState=4" ensures that the server is available and in the "Synced" state, not just in the primary component.
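
A sketch of how a comma-separated `galeraAllowedState` value can be interpreted, mirroring the behavior described above rather than Connector/J's source. `AllowedStates` is hypothetical, and so are the hosts and database in the example URL; the `loadbalance` HA mode in that URL is an assumption about how the option would typically be combined with load balancing.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of galeraAllowedState handling. Example URL (placeholders):
//   jdbc:mariadb:loadbalance://node1,node2,node3/mydb?galeraAllowedState=4
final class AllowedStates {
    private final Set<String> allowed;

    AllowedStates(String optionValue) {
        Set<String> values = new HashSet<>();
        if (optionValue != null) {
            for (String part : optionValue.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    values.add(trimmed);
                }
            }
        }
        this.allowed = Collections.unmodifiableSet(values);
    }

    /** False when the option is unset: validation falls back to a plain COM_PING. */
    boolean isEnabled() {
        return !allowed.isEmpty();
    }

    /** True if the server's wsrep_local_state value is one of the allowed values. */
    boolean accepts(String wsrepLocalState) {
        return wsrepLocalState != null && allowed.contains(wsrepLocalState.trim());
    }
}
```

With `galeraAllowedState=4`, a node reporting `wsrep_local_state = 2` (Donor/Desynced) fails validation even though it is still in the primary component.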


          People

            Assignee: Diego Dupin
            Reporter: Geoff Montee (Inactive)
            Votes: 0
            Watchers: 2

