[MXS-3983] Add switchover-force command Created: 2022-02-03 Updated: 2023-09-14 Resolved: 2023-06-19 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | maxctrl |
| Affects Version/s: | 2.5.14 |
| Fix Version/s: | 23.08.0 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Rick Pizzi | Assignee: | Esa Korhonen |
| Resolution: | Fixed | Votes: | 4 |
| Labels: | None | ||
| Sprint: | MXS-SPRINT-184 | ||||||||
| Description |
|
If the master becomes unresponsive (e.g. InnoDB clogged by contention issues), we want to be able to promote a replica and then kill the unrecoverable master. This is currently not possible:
Proposal: either proceed with the switchover even if read-only cannot be set on the old master after some time, or provide a switchover option that forces this when needed. Thank you. |
| Comments |
| Comment by markus makela [ 2022-02-24 ] | ||||||||||||||||||
|
Some sort of a --force flag for switchover might make sense. | ||||||||||||||||||
| Comment by Rick Pizzi [ 2022-03-23 ] | ||||||||||||||||||
|
It looks like this also happens during automatic failovers, which is severe.
I think in the above situation, the replica should be promoted anyway. | ||||||||||||||||||
| Comment by markus makela [ 2022-03-23 ] | ||||||||||||||||||
|
That happened because the server was deleted, which caused the monitor to discard the old information. Not deleting it would have prevented that. | ||||||||||||||||||
| Comment by Rick Pizzi [ 2022-03-23 ] | ||||||||||||||||||
|
What would cause the server to be deleted from the MaxScale monitor's view? Automation? | ||||||||||||||||||
| Comment by Rick Pizzi [ 2022-03-23 ] | ||||||||||||||||||
|
The question is why the remaining slave is not considered a valid master. | ||||||||||||||||||
| Comment by markus makela [ 2022-03-23 ] | ||||||||||||||||||
|
Based on that output I'd say someone did a maxctrl delete server followed by a maxctrl create server. Given that it happened right after the failure, it could be some script that did it. | ||||||||||||||||||
| Comment by markus makela [ 2022-03-23 ] | ||||||||||||||||||
|
There might be something strange going on as the logs above were generated with enforce_simple_topology enabled. Further investigation might be needed. | ||||||||||||||||||
| Comment by Rick Pizzi [ 2022-03-23 ] | ||||||||||||||||||
|
This is SkySQL, so it may be part of the operator logic; I don't know. But the part where MaxScale refuses to do a failover because it is unable to set read-only on the current master is the real problem. Take for example the (not uncommon) case where the master runs out of available connections ("too many connections"): failover will never happen, as MaxScale will never be able to set the current master to read-only. Thanks | ||||||||||||||||||
| Comment by markus makela [ 2022-03-23 ] | ||||||||||||||||||
|
I agree, even if the switchover fails, automatic failover should eventually be able to promote one of the remaining servers. As to why this didn't happen, we don't know at this point and we'll need to investigate. It's possible that the action of deleting the server is what triggered this but we need to be able to reproduce this to be sure. | ||||||||||||||||||
| Comment by markus makela [ 2023-02-15 ] | ||||||||||||||||||
|
I changed this to a New Feature as the current behavior is the expected behavior. esa.korhonen and rpizzi, if you think this is wrong, let me know and we can work out the correct classification for this issue. |