MariaDB MaxScale / MXS-1893

CHANGE MASTER lost connection on auto_rejoin should retry

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.5
    • Fix Version/s: 2.2.10
    • Component/s: mariadbmon
    • Labels: None

    Description

      When a node/container comes up and attempts to auto_rejoin as a slave, if the CHANGE MASTER loses its connection (but actually succeeds), the node ends up pointing to the master, but the slave is never started and auto_rejoin is disabled (which is the worst part).

      If the monitor had just gone ahead and issued the START SLAVE anyway, this would have worked. Alternatively, the auto_rejoin should be retried before it is disabled and declared a failure. I really don't know why the connection was lost, but this was a transient event.
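
      To make the suggested behaviour concrete, here is a minimal Python sketch of "check, then retry" logic. It is not MaxScale's code or its eventual fix. The `conn` object with `query()` and `reconnect()`, the `LostConnection` exception, the `FakeConn` stand-in and the `max_attempts` parameter are all hypothetical names introduced for illustration, and the CHANGE MASTER statement is simplified (credentials and GTID options are left out). Only the SQL statements and the `Master_Host` / `Master_Port` columns of SHOW SLAVE STATUS are real.

```python
import re
from dataclasses import dataclass


class LostConnection(Exception):
    """Stand-in for 'Lost connection to MySQL server during query'."""


@dataclass
class RejoinResult:
    joined: bool
    attempts: int
    detail: str


def points_at(conn, host, port):
    """True if SHOW SLAVE STATUS already reports the wanted master."""
    rows = conn.query("SHOW SLAVE STATUS")
    return any(
        row.get("Master_Host") == host and int(row.get("Master_Port", 0)) == port
        for row in rows
    )


def rejoin_with_retry(conn, host, port, max_attempts=3):
    """Redirect a slave, retrying on lost connections before giving up.

    `conn` is a hypothetical object with query(sql) -> list of dicts and
    reconnect(); it is not a MaxScale or connector API.
    """
    # Illustrative statement only: credentials and GTID options are omitted.
    change_master = f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}"
    detail = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        try:
            # A previous attempt may have succeeded even though its reply was
            # lost, so check the slave's state before redirecting it again.
            if not points_at(conn, host, port):
                conn.query("STOP SLAVE")
                conn.query(change_master)
            conn.query("START SLAVE")
            return RejoinResult(True, attempt, "slave started")
        except LostConnection as exc:
            detail = str(exc)
            conn.reconnect()
    # Only now would the caller treat the rejoin as failed.
    return RejoinResult(False, max_attempts, detail)


class FakeConn:
    """In-memory stand-in whose first CHANGE MASTER applies but loses its reply."""

    def __init__(self):
        self.master = None
        self.slave_running = False
        self.drop_reply = True
        self.log = []

    def reconnect(self):
        self.log.append("RECONNECT")

    def query(self, sql):
        self.log.append(sql)
        if sql == "SHOW SLAVE STATUS":
            if self.master is None:
                return []
            return [{"Master_Host": self.master[0], "Master_Port": self.master[1]}]
        if sql.startswith("CHANGE MASTER TO"):
            match = re.search(r"MASTER_HOST='([^']+)', MASTER_PORT=(\d+)", sql)
            self.master = (match.group(1), int(match.group(2)))
            if self.drop_reply:
                self.drop_reply = False
                raise LostConnection("Lost connection to MySQL server during query")
        elif sql == "START SLAVE":
            self.slave_running = True
        elif sql == "STOP SLAVE":
            self.slave_running = False
        return []
```

      Running `rejoin_with_retry(FakeConn(), "192.168.1.78", 3306)` reproduces the situation in this report: the first CHANGE MASTER takes effect but raises, the second attempt sees that the slave already points at the master and only issues START SLAVE. A failure would be reported, and auto_rejoin disabled, only after every attempt has been used up.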

      The following is an excerpt from maxscale.log (since I couldn't attach it).
      Here are the times:
      14:13:12: mariadb-0 comes back up and attempts to rejoin the master; however, CHANGE MASTER TO... appears to lose its connection and fails. The CHANGE MASTER actually worked, and the slave already had the new master set up.
      14:18:19: I went into the mariadb-0 node, saw that the CHANGE MASTER had worked, and manually ran START SLAVE. The node was then successfully replicating from the local master.
      ------------------------------------
      2018-05-31 14:13:10 notice : Server changed state: mdb-dc1-mariadb-0[192.168.1.218:3306]: server_up. [Down] -> [Running]
      2018-05-31 14:13:11 notice : Executed monitor script '/usr/lib/maxscale/maxscale_notify.py --initiator=[192.168.1.218]:3306 --event=server_up --servers=[192.168.1.78]:3306,[192.168.1.218]:3306,[192.168.1.128]:3306 --masters=[192.168.1.78]:3306 --slaves=[192.168.1.128]:3306' on event 'server_up'
      2018-05-31 14:13:11 notice : [mariadbmon] Server 'mdb-dc1-mariadb-0' is replicating from a server other than 'mdb-dc1-mariadb-1', redirecting it to 'mdb-dc1-mariadb-1'.
      2018-05-31 14:13:12 warning: [mariadbmon] Slave 'mdb-dc1-mariadb-0' redirection failed: 'Lost connection to MySQL server during query'. Query: 'CHANGE MASTER TO ...'.
      2018-05-31 14:13:12 error : [mariadbmon] A cluster join operation failed, disabling automatic rejoining. To re-enable, manually set 'auto_rejoin' to 'true' for monitor 'MariaDB-Monitor' via MaxAdmin or the REST API.
      2018-05-31 14:18:19 notice : Server changed state: mdb-dc1-mariadb-0[192.168.1.218:3306]: new_slave. [Running] -> [Slave, Running]
      2018-05-31 14:18:19 notice : Executed monitor script '/usr/lib/maxscale/maxscale_notify.py --initiator=[192.168.1.218]:3306 --event=new_slave --servers=[192.168.1.78]:3306,[192.168.1.218]:3306,[192.168.1.128]:3306 --masters=[192.168.1.78]:3306 --slaves=[192.168.1.218]:3306,[192.168.1.128]:3306' on event 'new_slave'

          People

            Assignee: Esa Korhonen (esa.korhonen)
            Reporter: Richard Lane (rvlane)