MariaDB MaxScale / MXS-2036

A slave with sql thread stopped causes wrong master after failover

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.13
    • Fix Version/s: 2.2.14
    • Component/s: mariadbmon
    • Labels:
      None
    • Sprint:
      MXS-SPRINT-65

      Description

      As described in the support ticket: a slave whose IO thread is running but whose SQL thread is stopped is left in limbo. After a failover, it causes the wrong master to be selected unless the new master has other slaves.

      This is again a side effect of how the 2.2 monitor works. A slave that is still connected or trying to connect to the master (IO thread on or connecting) but not actually replicating (SQL thread off) is counted as a slave of that node, even if the master node is down. During switchover/failover, however, servers with a broken slave SQL thread are not redirected, since they are not real slaves and cannot replicate from the new master anyway. This mismatch produces the odd result where the old master is still selected as master after failover. In 2.3 this does not happen because the monitor works differently.
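
The mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the mariadbmon code: the `Server` class, the functions `counted_as_slave`, `redirectable`, `failover`, `pick_master` and `demo`, and the selection rule (prefer a running server among those that have at least one counted slave) are all hypothetical simplifications. The field comments refer to the `Slave_IO_Running` and `Slave_SQL_Running` columns of `SHOW SLAVE STATUS`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    running: bool = True
    master: Optional[str] = None  # name of the server this one replicates from
    io_state: str = "No"          # Slave_IO_Running: "Yes", "Connecting" or "No"
    sql_running: bool = False     # Slave_SQL_Running


def counted_as_slave(s: Server) -> bool:
    # Monitor's view (assumed): IO thread on or connecting is enough.
    return s.master is not None and s.io_state in ("Yes", "Connecting")


def redirectable(s: Server) -> bool:
    # Failover's view (assumed): only a slave with a working SQL thread is redirected.
    return counted_as_slave(s) and s.sql_running


def failover(servers: List[Server], old: str, new: str) -> None:
    for s in servers:
        if s.name == new:
            # The promoted server stops replicating.
            s.master, s.io_state, s.sql_running = None, "No", False
        elif s.running and s.master == old and redirectable(s):
            # Healthy slaves are pointed at the new master.
            s.master, s.io_state = new, "Yes"


def pick_master(servers: List[Server]) -> str:
    # Hypothetical selection rule: count slaves per server, then prefer a
    # running server among those that have at least one counted slave.
    counts = {s.name: 0 for s in servers}
    for s in servers:
        if s.running and counted_as_slave(s):
            counts[s.master] += 1
    candidates = [s for s in servers if counts[s.name] > 0]
    best = max(candidates, key=lambda s: (s.running, counts[s.name]))
    return best.name


def demo(extra_healthy_slave: bool) -> str:
    servers = [
        Server("A", running=False),  # old master, now down
        Server("B", master="A", io_state="Yes", sql_running=True),
        Server("C", master="A", io_state="Connecting", sql_running=False),
    ]
    if extra_healthy_slave:
        servers.append(Server("D", master="A", io_state="Yes", sql_running=True))
    failover(servers, old="A", new="B")
    return pick_master(servers)


print(demo(False))  # A: C still counts as a slave of the old master
print(demo(True))   # B: the new master has a slave of its own
```

In this model the broken slave C is never redirected, so after failover it is the only counted slave in the cluster and it still points at A. Once the new master has another slave (D), the selection moves to B, which matches the "unless the new master has other slaves" condition in the description.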

      Fixing this in 2.2 means changing either the master selection code or the failover/switchover code. I will try the latter, since changing the former could affect various other places as well.

            People

            Assignee:
            esa.korhonen Esa Korhonen
            Reporter:
            esa.korhonen Esa Korhonen