MariaDB MaxScale / MXS-3971

Failover after switchover fails if no transaction is run between them


Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Cannot Reproduce
    • Versions: 6.2.1, 6.4.2
    • Component/s: mariadbmon
    • Labels: None
    • Sprint: MXS-SPRINT-165, MXS-SPRINT-166

    Description

      I recently stumbled onto an intriguing MaxScale phenomenon. It is perhaps not exactly a bug, but there seems to be a way to improve MaxScale so that what I experienced is avoided. My setup and flow were as follows:

      • Set up one master and two slave nodes with one MaxScale in front of them. Run a few transactions through the master and confirm they are replicated. Ensure MaxScale sees the topology properly (maxctrl list servers). Note the GTID values on each of the three nodes (show global variables like '%gtid%').
      • Conduct a switchover via MaxScale.
      • Do not run any transactions through the new master yet. Observe that the two slaves are connected to the newly promoted master and have no errors. Confirm MaxScale sees the altered topology properly. Take another reading of the GTID values on each of the three nodes.
      • Turn off the newly promoted master.
      • One might expect MaxScale to promote one of the two remaining nodes to master. Instead, MaxScale reports an error about missing GTID values (even quoting the name of the master that has been switched off, which is likely a separate, misleading error, since that node is no longer accessible) and declares that there is no suitable node to promote.
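      Comparing the GTID readings from the steps above by eye is error-prone, so they can be diffed with a small script. The following is a minimal Python sketch, not part of MaxScale and not how MaxScale selects a promotion candidate. It assumes each reading is the value of one MariaDB GTID position variable from the show global variables output (for example gtid_current_pos), i.e. a comma-separated list of domain_id-server_id-seq_no triplets. The node names and sample values are hypothetical and made up for illustration.

```python
from typing import Dict, Optional, Tuple

# A GTID position maps replication domain id -> (server id, sequence number).
GtidPos = Dict[int, Tuple[int, int]]
# A change is (value before, value after); None means the domain was absent.
Change = Tuple[Optional[Tuple[int, int]], Optional[Tuple[int, int]]]


def parse_gtid_list(value: str) -> GtidPos:
    """Parse a GTID list such as '0-1-42,1-2-7' (empty string -> {})."""
    pos: GtidPos = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # empty variable: no GTID recorded yet
        domain, server, seq = (int(part) for part in item.split("-"))
        pos[domain] = (server, seq)
    return pos


def diff_gtid(before: str, after: str) -> Dict[int, Change]:
    """Return only the domains whose GTID differs between two readings."""
    old, new = parse_gtid_list(before), parse_gtid_list(after)
    return {
        domain: (old.get(domain), new.get(domain))
        for domain in sorted(old.keys() | new.keys())
        if old.get(domain) != new.get(domain)
    }


def diff_snapshots(before: Dict[str, str],
                   after: Dict[str, str]) -> Dict[str, Dict[int, Change]]:
    """Per-node GTID changes; a node missing from a snapshot counts as empty."""
    result: Dict[str, Dict[int, Change]] = {}
    for node in sorted(before.keys() | after.keys()):
        changes = diff_gtid(before.get(node, ""), after.get(node, ""))
        if changes:
            result[node] = changes
    return result


if __name__ == "__main__":
    # Hypothetical readings of gtid_current_pos taken before and after a
    # switchover with no transaction in between (made-up values).
    before = {"server1": "0-1-10", "server2": "0-1-10", "server3": "0-1-10"}
    after = {"server1": "0-1-10", "server2": "0-1-10", "server3": "0-1-10"}
    print(diff_snapshots(before, after))  # prints {} (readings identical)
```

      An empty result means no node's GTID position moved between the two readings. Once a transaction has gone through the new master and replicated, the affected domain would show up in the diff with the new master's server_id as the middle component.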

      Digging into this, I found that if at least one transaction is run through the newly promoted master after the switchover, then all GTID values are updated on all nodes, and turning off the promoted master results in MaxScale successfully choosing one of the remaining slaves and promoting it to master.

      If, however, no transaction is run through the newly promoted master between the switchover and the shutdown of that master, MaxScale is unable to perform a failover.

      The same effect might also occur with two consecutive failovers or two consecutive switchovers; I have not tested this.

      If the error is indeed caused by some GTID variables being updated only after at least one transaction passes through the newly promoted master, then perhaps the easiest mitigation is for MaxScale to run an artificial transaction after each failover or switchover as part of the process (and then another one to negate the first).


          People

            Assignee: Esa Korhonen
            Reporter: Assen Totin (Inactive)
            Votes: 0
            Watchers: 2
