MariaDB MaxScale / MXS-2489

ReadWriteSplit service redirects some queries to a lagging slave


    Details

      Description

      I'm using MaxScale 2.3 with three MariaDB servers, one master and two slaves, and I set max_slave_replication_lag to 5 seconds:

      [Read-Write-Service]
      type=service
      router=readwritesplit
      servers=server1,server3,server2
      max_slave_replication_lag=5
      master_failure_mode=fail_on_write
      

      I then stopped the slaves for a period of time while keeping the master running, and all queries were routed to the master. When the slaves came back online they were hours behind the master, and when the monitor checked slave status, SHOW SLAVE STATUS sometimes returned:
      Slave_IO_Running Preparing
      Seconds_Behind_Master NULL

      The monitor then treats the slave as up to date, and some queries are routed to that slave even though it is hours behind the master!

      I looked over the source code, and IMHO this block of code is the cause:

      static inline bool rpl_lag_is_ok(SRWBackend& backend, int max_rlag)
      {
         return max_rlag == MXS_RLAG_UNDEFINED || backend->server()->rlag <= max_rlag;
      }
      

      Maybe removing max_rlag == MXS_RLAG_UNDEFINED from the condition would help in this case, but it could do harm when the slave really is up to date. Another option might be to compare the GTID positions of the master and the slave to determine whether they are really at the same point in the transaction stream.

      Master and slaves are running MariaDB 10.1.40

              People

              Assignee:
              Unassigned
              Reporter:
              omegaes Abdul Rahman Babil
              Votes:
              0
              Watchers:
              2