[MXS-1893] CHANGE MASTER lost connection on auto_rejoin should retry Created: 2018-06-01 Updated: 2018-08-17 Resolved: 2018-08-17 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | mariadbmon |
| Affects Version/s: | 2.2.5 |
| Fix Version/s: | 2.2.10 |
| Type: | Bug | Priority: | Major |
| Reporter: | Richard Lane | Assignee: | Esa Korhonen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
When a node/container comes up and attempts to auto_rejoin as a slave, if the CHANGE MASTER loses its connection (but actually succeeds), you end up with the node pointing to the master, but the slave is never started and auto_rejoin is disabled (which is the worst part). If MaxScale had just gone ahead and issued the START SLAVE anyway, this would have worked; alternatively, there should be a retry of the auto_rejoin before disabling it and calling it a failure. I really don't know why the connection was lost, but this was a transient event. Following is the excerpt from the maxscale.log (since I couldn't attach it). |
| Comments |
| Comment by markus makela [ 2018-06-02 ] |
|
As the auto-rejoin operation is "non-destructive", it should be perfectly OK to keep on trying to rejoin servers even if a rejoin fails. |
| Comment by Richard Lane [ 2018-06-08 ] |
|
I am actually requesting that MaxScale have an option to retry the rejoin operation if any of the STOP SLAVE, RESET SLAVE or CHANGE MASTER TO queries fails. |
| Comment by markus makela [ 2018-06-12 ] |
|
As a temporary workaround, adding query_retries=2 and query_retry_timeout=10 under the [maxscale] section should allow automated retrying of these queries. |
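
For reference, the workaround would look roughly like this in the MaxScale configuration file. Only the two parameters and their values come from the comment above; any other settings already present under [maxscale] are assumed to stay unchanged.

```ini
[maxscale]
# Retry a failed internal query up to 2 times
query_retries=2
# Overall time limit for the retries, in seconds
query_retry_timeout=10
```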
| Comment by Esa Korhonen [ 2018-08-17 ] |
|
As of 2.2.10, auto_rejoin is no longer turned off if it fails. This may lead to a situation where a failing rejoin is attempted on every monitor loop, but that is quite unlikely. With this change and the options mentioned above, the rejoin seems quite error-tolerant. |
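
The behaviour described above can be pictured with a small sketch. This is illustrative Python, not MaxScale's implementation: `TransientError`, `rejoin`, `monitor_loop` and the `run_query` callback are hypothetical names, and the query sequence is the one listed earlier in this thread. The point is only that a failed rejoin leaves auto-rejoin enabled, so the next loop tries again.

```python
class TransientError(Exception):
    """Hypothetical stand-in for a lost connection during a rejoin query."""


def rejoin(run_query, master_host):
    """Run the rejoin sequence; any query may raise TransientError."""
    run_query("STOP SLAVE")
    run_query("RESET SLAVE")
    run_query(f"CHANGE MASTER TO MASTER_HOST='{master_host}'")
    run_query("START SLAVE")


def monitor_loop(run_query, master_host, max_ticks):
    """Attempt a rejoin once per tick; a failure does not disable later attempts.

    Returns the tick on which the rejoin succeeded, or None if every
    attempt within max_ticks failed.
    """
    for tick in range(1, max_ticks + 1):
        try:
            rejoin(run_query, master_host)
            return tick
        except TransientError:
            # Leave auto-rejoin enabled and try again on the next tick.
            continue
    return None
```

With a `run_query` that fails once on CHANGE MASTER, the first tick aborts before START SLAVE and the second tick repeats the whole sequence and completes it.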