[MXS-1893] CHANGE MASTER lost connection on auto_rejoin should retry

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.5
Fix Version/s: 2.2.10
Component/s: mariadbmon
Labels: None

Description

When a node/container comes up and attempts to auto_rejoin as a slave, if the CHANGE MASTER loses its connection (but actually succeeds), you end up with the node pointing to the master, but the slave is never started and auto_rejoin is disabled (which is the worst part).

If the monitor had just gone ahead and issued the START SLAVE anyway, this would have worked. Alternatively, the auto_rejoin should be retried before it is disabled and the operation is called a failure. I really don't know why the connection was lost, but it was a transient event.
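The behavior requested above can be sketched roughly as follows. This is a minimal illustration in Python, not MaxScale's actual code or the fix that was eventually shipped. The names `LostConnection`, `rejoin`, `FakeReplica`, `run_query` and `get_master_host` are all hypothetical and exist only for this example. The idea is that a lost connection during CHANGE MASTER is treated as ambiguous: the sketch first checks whether the new master was in fact configured, and only gives up after a bounded number of attempts.

```python
class LostConnection(Exception):
    """Hypothetical stand-in for 'Lost connection to MySQL server during query'."""


def rejoin(run_query, get_master_host, target_host, max_attempts=3):
    """Point a replica at target_host and start replication.

    run_query and get_master_host are hypothetical callables supplied by the
    caller. Returns True on success, False once max_attempts are used up.
    """
    for _ in range(max_attempts):
        try:
            run_query(f"CHANGE MASTER TO MASTER_HOST='{target_host}'")
        except LostConnection:
            # The statement may have been applied before the connection
            # dropped, so verify the result instead of failing immediately.
            try:
                if get_master_host() != target_host:
                    continue  # not applied: retry the whole operation
            except LostConnection:
                continue
        try:
            run_query("START SLAVE")
            return True
        except LostConnection:
            continue
    return False


class FakeReplica:
    """Hypothetical in-memory replica that reproduces the reported symptom:
    the first CHANGE MASTER is applied, but the client sees a lost connection."""

    def __init__(self):
        self.master_host = None
        self.slave_running = False
        self.fail_next_change = True

    def run_query(self, sql):
        if sql.startswith("CHANGE MASTER"):
            self.master_host = sql.split("'")[1]
            if self.fail_next_change:
                self.fail_next_change = False
                raise LostConnection(sql)
        elif sql == "START SLAVE":
            self.slave_running = True

    def get_master_host(self):
        return self.master_host


replica = FakeReplica()
ok = rejoin(replica.run_query, replica.get_master_host, "192.168.1.78")
```

In this sketch the caller would disable automatic rejoining only when `rejoin` returns False, that is, after every attempt has failed, rather than on the first transient error.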

The following is an excerpt from maxscale.log (since I couldn't attach it).
Here are the times:
14:13:12: mariadb-0 comes back up and attempts to rejoin the master; however, CHANGE MASTER TO... seems to lose its connection and fails. The CHANGE MASTER actually worked, and the slave already had the new master set up.
14:18:19: I went into the mariadb-0 node, saw that the CHANGE MASTER had worked, and manually ran START SLAVE. The node was then successfully replicating from the local master.
------------------------------------
2018-05-31 14:13:10 notice : Server changed state: mdb-dc1-mariadb-0[192.168.1.218:3306]: server_up. [Down] -> [Running]
2018-05-31 14:13:11 notice : Executed monitor script '/usr/lib/maxscale/maxscale_notify.py --initiator=[192.168.1.218]:3306 --event=server_up --servers=[192.168.1.78]:3306,[192.168.1.218]:3306,[192.168.1.128]:3306 --masters=[192.168.1.78]:3306 --slaves=[192.168.1.128]:3306' on event 'server_up'
2018-05-31 14:13:11 notice : [mariadbmon] Server 'mdb-dc1-mariadb-0' is replicating from a server other than 'mdb-dc1-mariadb-1', redirecting it to 'mdb-dc1-mariadb-1'.
2018-05-31 14:13:12 warning: [mariadbmon] Slave 'mdb-dc1-mariadb-0' redirection failed: 'Lost connection to MySQL server during query'. Query: 'CHANGE MASTER TO ...'.
2018-05-31 14:13:12 error : [mariadbmon] A cluster join operation failed, disabling automatic rejoining. To re-enable, manually set 'auto_rejoin' to 'true' for monitor 'MariaDB-Monitor' via MaxAdmin or the REST API.
2018-05-31 14:18:19 notice : Server changed state: mdb-dc1-mariadb-0[192.168.1.218:3306]: new_slave. [Running] -> [Slave, Running]
2018-05-31 14:18:19 notice : Executed monitor script '/usr/lib/maxscale/maxscale_notify.py --initiator=[192.168.1.218]:3306 --event=new_slave --servers=[192.168.1.78]:3306,[192.168.1.218]:3306,[192.168.1.128]:3306 --masters=[192.168.1.78]:3306 --slaves=[192.168.1.218]:3306,[192.168.1.128]:3306' on event 'new_slave'

Attachments

maxscale.log
14 kB
2018-06-01 19:01

People

Assignee: Esa Korhonen

Reporter: Richard Lane

Votes: 0

Watchers: 3

Dates

Created: 2018-06-01 19:01

Updated: 2018-08-17 09:59

Resolved: 2018-08-17 09:59
