MariaDB MaxScale
MXS-2010

MaxScale Failover Not Working as Expected

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 2.2.13
    • Fix Version: 2.3.0
    • Component: mariadbmon
    • Labels: None
    • Sprints: MXS-SPRINT-64, MXS-SPRINT-65

    Description

      We are testing failover scenarios with MaxScale 2.2.13 and noticed that one scenario fails. We repeated the test enough times to confirm that the same behaviour is observed every time.

      Scenario:

      node1 ==> Master
      node2 ==> Slave
      node3 ==> Slave

      ===================================

      node1 ==> brought down, later rejoined as a slave
      node2 ==> promoted to Master
      node3 ==> Slave, no change

      Success

      ===================================

      When node1 and node2 are brought down at the same time, node3 is promoted to Master successfully:

      node1 ==> down
      node2 ==> down
      node3 ==> promoted to Master

      Success

      ====================================
      ====================================
      When bringing both nodes back up at the same time (node2 followed by node1):

      node1 ==> Running
      node2 ==> Running
      node3 ==> already Master

      Output:

      node1 ==> Slave, Running
      node2 ==> Master, Running
      node3 ==> Running (out of the cluster), including data loss

      Failure

      In the above scenario we started both nodes at the same time (node2 followed by node1), and the current Master was demoted to a plain Running state.

      [maxscale@x18tcldgpapp06 ~]$ maxctrl list servers
      ┌────────┬──────────────┬──────┬─────────────┬─────────────────┬────────────┐
      │ Server │ Address      │ Port │ Connections │ State           │ GTID       │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node1  │ 10.1.1.96    │ 6603 │ 0           │ Down            │ 1-3-341679 │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node2  │ 10.1.1.81    │ 6603 │ 0           │ Down            │ 1-3-341679 │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node3  │ 10.1.1.82    │ 6603 │ 0           │ Master, Running │ 1-3-341679 │
      └────────┴──────────────┴──────┴─────────────┴─────────────────┴────────────┘
      [maxscale@x18tcldgpapp06 ~]$ maxctrl list servers
      ┌────────┬──────────────┬──────┬─────────────┬─────────────────┬────────────┐
      │ Server │ Address      │ Port │ Connections │ State           │ GTID       │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node1  │ 10.1.1.96    │ 6603 │ 0           │ Slave, Running  │ 1-3-341679 │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node2  │ 10.1.1.81    │ 6603 │ 0           │ Master, Running │ 1-3-341679 │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node3  │ 10.1.1.82    │ 6603 │ 0           │ Running         │ 1-1-342807 │
      └────────┴──────────────┴──────┴─────────────┴─────────────────┴────────────┘
      [maxscale@x18tcldgpapp06 ~]$ maxctrl list servers
      ┌────────┬──────────────┬──────┬─────────────┬─────────────────┬────────────┐
      │ Server │ Address      │ Port │ Connections │ State           │ GTID       │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node1  │ 10.1.1.96    │ 6603 │ 0           │ Slave, Running  │ 1-3-341679 │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node2  │ 10.1.1.81    │ 6603 │ 0           │ Master, Running │ 1-3-341679 │
      ├────────┼──────────────┼──────┼─────────────┼─────────────────┼────────────┤
      │ node3  │ 10.1.1.82    │ 6603 │ 0           │ Running         │ 1-1-343390 │
      └────────┴──────────────┴──────┴─────────────┴─────────────────┴────────────┘
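      The divergence is visible in the GTID column of the maxctrl output above: node1 and node2 are at 1-3-341679 (domain 1, events originating from server_id 3), while node3 has moved on to 1-1-343390 (apparently its own events, written as server_id 1), so its binlog can no longer be reconciled with the new master without data loss. As a small illustration (not part of MaxScale; the helper names here are made up), a check for this kind of divergence from single-domain MariaDB GTID positions could look like:

```python
# Hypothetical helper, not MaxScale code: compare MariaDB GTID positions of
# the form "domain-server_id-sequence" to flag a node whose recent events
# were written by a different origin server than the reference master's.

def parse_gtid(gtid: str):
    """Split a single-domain MariaDB GTID like '1-3-341679' into ints."""
    domain, server_id, seq = (int(p) for p in gtid.split("-"))
    return domain, server_id, seq

def diverged(reference: str, candidate: str) -> bool:
    """True when, within the same domain, the candidate's latest events
    came from a different server_id than the reference position."""
    d_ref, s_ref, _ = parse_gtid(reference)
    d_cand, s_cand, _ = parse_gtid(candidate)
    return d_ref == d_cand and s_ref != s_cand

# node1/node2 agree on origin server 3; node3 wrote its own events as server 1.
print(diverged("1-3-341679", "1-3-341679"))  # False
print(diverged("1-3-341679", "1-1-343390"))  # True
```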
      

      ====================================
      ====================================

      When failing over nodes individually, promotion works perfectly. But when both nodes come back at the same time, it fails with the same error every time.

      We have uploaded the failover maxctrl output for your reference.
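      For context, a mariadbmon monitor section of the kind exercised by this test might look as follows. This is a minimal sketch with placeholder server names and credentials, not the reporter's actual configuration; auto_failover and auto_rejoin are the settings that drive the promotion and rejoin behaviour described above.

```ini
# maxscale.cnf fragment (illustrative; names and credentials are placeholders)
[Replication-Monitor]
type=monitor
module=mariadbmon
servers=node1,node2,node3
user=maxmon
password=maxmon_pw
monitor_interval=2000
# Promote a slave automatically when the master goes down.
auto_failover=true
# Redirect a returning former master to replicate from the current master.
auto_rejoin=true
```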

      Attachments

        Activity

          ccalender Chris Calender (Inactive) added a comment -

          Also, it has now been tested multiple times with read_only=1 in my.cnf, still with the same results.

          The previous master is not rejoining as a slave; it is being promoted as master.

          I will upload the latest logs for your reference.
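          For reference, the my.cnf setting the comment describes would look like the fragment below (the file path is illustrative). Note that in MariaDB, read_only does not restrict accounts holding the SUPER privilege, which typically includes the replication and administration accounts MaxScale uses, so it cannot by itself prevent the promotion behaviour reported here.

```ini
# /etc/my.cnf.d/server.cnf (illustrative path)
[mysqld]
# Refuse writes from ordinary clients on former-master nodes.
# Accounts with the SUPER privilege bypass read_only.
read_only = 1
```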
          esa.korhonen Esa Korhonen added a comment -

          This is difficult to fix for 2.2, as the monitoring logic does not remember the previous master. In 2.3 the logic is different, so these kinds of issues should not be a problem, or are at least easier to fix.

          People

            Assignee: esa.korhonen Esa Korhonen
            Reporter: ccalender Chris Calender (Inactive)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
