  MariaDB MaxScale / MXS-3855

incorrect routing of selects with disable_master_failback

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 6.1.4
    • Fix Version/s: N/A
    • Component/s: galeramon, readwritesplit
    • Labels: None
    • Environment: centos7, kernel 3.10.0-1127.el7.x86_64
      maxscale 6.1.4
      mariadb 10.4.21
      galera cluster

    Description

      Description:

      With disable_master_failback=true set (on the Galera monitor) and a readwritesplit router, the selects are not distributed correctly after the master has gone down and come back up.

      How to repeat:

      galera cluster config:

      [mysqld]
      server-id=1
      log-error=/var/log/mysql/error.log
      log_bin=mysql-bin
      log_slave_updates=ON
      expire_logs_days=3
      general_log=ON
       
      [galera]
      wsrep_on=ON
      wsrep_provider='/usr/lib64/galera-4/libgalera_smm.so'
      wsrep_cluster_address='gcomm://192.168.0.201,192.168.0.202,192.168.0.203'
      binlog_format=row
      default_storage_engine=InnoDB
      innodb_autoinc_lock_mode=2
      wsrep_node_address='192.168.0.201'
      bind-address=0.0.0.0
      

      maxscale config:

      [maxscale]
      threads=auto
       
      [server1]
      type=server
      address=192.168.0.201
      port=3306
      protocol=MariaDBBackend
       
      [server2]
      type=server
      address=192.168.0.202
      port=3306
      protocol=MariaDBBackend
       
      [server3]
      type=server
      address=192.168.0.203
      port=3306
      protocol=MariaDBBackend
       
      [Galera-Monitor]
      type=monitor
      module=galeramon
      servers=server1,server2,server3
      user=monitor_user
      password=my_password
      monitor_interval=2000ms
      disable_master_failback=true
       
      [Read-Write-Service]
      type=service
      router=readwritesplit
      servers=server1,server2,server3
      user=maxscale
      password=maxscale_pw
       
      [Read-Write-Listener]
      type=listener
      service=Read-Write-Service
      protocol=MariaDBClient
      port=3307
      address=192.168.0.200
      

      1. Do a `tail -f general.log` on all servers.
      2. Repeatedly run `select whatever from a_table` via `mysql -h 192.168.0.200 --port=3307`. You can observe that the selects are evenly distributed over the slaves in a round-robin fashion. So far so good.
      3. `systemctl stop mariadb` on the master. Now you can see that the selects are only received by the one remaining slave. The master does not execute the selects. So far so good.
      4. `systemctl start mariadb` on the host that used to be the master. Since disable_master_failback is set, it does not become the master again. `maxctrl list servers` says that the state is "Slave, Synced, Running". The selects are only received/executed by one slave from now on.

      Expected behaviour:

      The select queries should be evenly distributed over all slaves in a round-robin fashion again after the host comes back up.

      Solution:

      Restarting the maxscale service on the current master fixes the problem. Restarting the service on the slaves does not.


        Activity

          markus makela added a comment -

          Are you running the queries over only one connection? If so, this is expected behavior: readwritesplit never truly round-robins the reads. Each query is routed according to the selection criteria, and in the case of a tie the request goes to the server that has processed the smallest amount of traffic. This just happens to look like round-robin behavior when MaxScale is started with a static set of servers and there is only one client issuing requests.
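
          As a rough sketch of what that selection amounts to (simplified, with invented names such as Backend and pick_backend; not the actual readwritesplit code), picking the lowest-scoring candidate already explains why a single idle client sticks to one server:

            #include <cstdio>
            #include <string>
            #include <vector>

            // Invented stand-in for a candidate backend as the router might see it.
            struct Backend
            {
                std::string name;
                double      score;   // e.g. ongoing queries or traffic processed so far
            };

            // Route each query to the candidate with the lowest score. There is no
            // explicit round-robin rotation between the slaves.
            const Backend* pick_backend(const std::vector<Backend>& candidates)
            {
                const Backend* best = nullptr;
                for (const Backend& b : candidates)
                {
                    if (best == nullptr || b.score < best->score)
                    {
                        best = &b;
                    }
                }
                return best;
            }

            int main()
            {
                // With one idle client both scores stay at 0, so the same
                // candidate wins every time.
                std::vector<Backend> candidates = {{"server2", 0.0}, {"server3", 0.0}};
                std::printf("picked %s\n", pick_backend(candidates)->name.c_str());
                return 0;
            }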

          There are also a couple of internal factors that affect the load-balancing decisions. The score that readwritesplit calculates for each candidate server contains some hard-coded factors meant to avoid opening extra connections when valid but sub-optimal candidates are available. This is based on the assumption that using an already opened TCP connection with a slightly worse score is preferable to opening a new TCP connection to the "perfect" candidate. At the time of writing this factor is:

                  if (!b->in_use())
                  {
                      // To prefer servers that we are connected to, inflate the score of unconnected servers
                      score = (score + 5.0) * 1.5;
                  }
          

          This means that with two candidates, one with a score of 95 (by default the score is the number of ongoing queries) to which we are already connected, and another to which we are not, the latter candidate must have a score of roughly 58 or lower before sessions start favoring it over the already connected one: (58 + 5) * 1.5 = 94.5, which is just below 95.
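
          A small worked example of that break-even point, assuming the bias formula above and a connected-server score of 95 (biased_score is a made-up helper for illustration, not a MaxScale function):

            #include <cstdio>

            // Bias from the snippet above, applied to servers we are not connected to.
            double biased_score(double raw_score)
            {
                return (raw_score + 5.0) * 1.5;
            }

            int main()
            {
                const double connected_score = 95.0;  // already connected, used as-is

                // Walk down from 95 to find the highest raw score at which the
                // unconnected server still beats the connected one.
                for (int raw = 95; raw >= 0; --raw)
                {
                    if (biased_score(raw) < connected_score)
                    {
                        // Prints 58: (58 + 5) * 1.5 = 94.5 < 95.
                        std::printf("unconnected server preferred once its score <= %d\n", raw);
                        break;
                    }
                }
                return 0;
            }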

          Having measured and tested this with about 100 concurrent clients executing SELECT SLEEP(1) against two candidates, the result is roughly a 60/40 split favoring the already connected server. This suggests the factor could be lowered, as the already connected server ends up doing more of the work. Even if this bias value were changed, though, it would not alter the behavior with only one client connection, since the already open connection would always be preferred.

          markus makela added a comment - edited

          tbenkert One more question: did you reconnect the client after you restarted the server?

          markus makela added a comment -

          I'll close this as Not a Bug since it looks like expected behavior.


          People

            Assignee: markus makela
            Reporter: Thomas Benkert (tbenkert)
            Votes: 0
            Watchers: 2

