[MXS-4759] Force failover flag Created: 2023-09-13 Updated: 2023-10-16 Resolved: 2023-10-16 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | mariadbmon |
| Affects Version/s: | None |
| Fix Version/s: | 23.08.2 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Bryan Bancroft (Inactive) | Assignee: | Esa Korhonen |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Sprint: | MXS-SPRINT-192 | ||||||||
| Description |
|
This is a request for either a force-failover or a cluster-refresh option: a way to make a server master regardless of risk. It was spurred by a situation where an outage was extended because we had to remove a bad server from the cnf and restart MaxScale to promote a known up-to-date slave. The need here is for a situation where the admin knows what needs to be done but the technology is blocking the action. Example command
Below is the status of a problem cluster caused by GTID replication being off
|
| Comments |
| Comment by markus makela [ 2023-09-13 ] |
|
The existing reset-replication command seems to do most of what is required here. It prepares the cluster for use with automatic failover even at the risk of potential data loss: https://mariadb.com/kb/en/mariadb-maxscale-2308-mariadb-monitor/#operation-details One possible improvement is to wait for the relay log to be consumed and to auto-select the best candidate based on the existing GTID position; the documentation doesn't seem to mention whether the monitor already does this. |
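As a rough illustration of what "auto-select the best candidate based on the existing GTID position" could mean, here is a minimal Python sketch. It is not how mariadbmon works internally; it assumes a single replication domain, assumes every replica has already consumed its relay log, and uses hypothetical server names. The only thing taken from MariaDB itself is the GTID text format `domain-server_id-sequence`, comma-separated when several domains are present.

```python
from typing import Dict, Optional


def gtid_sequence(gtid_list: str, domain: int = 0) -> int:
    """Return the sequence number for `domain` in a MariaDB GTID list.

    A GTID list looks like "0-1-142" or "0-1-142,1-2-7".
    Returns -1 when the domain does not appear in the list.
    """
    for triplet in gtid_list.split(","):
        triplet = triplet.strip()
        if not triplet:
            continue
        dom, _server_id, seq = triplet.split("-")
        if int(dom) == domain:
            return int(seq)
    return -1


def pick_candidate(positions: Dict[str, str], domain: int = 0) -> Optional[str]:
    """Pick the server whose GTID position is furthest ahead in `domain`.

    `positions` maps a (hypothetical) server name to its GTID position.
    Ties are broken by server name so the result is deterministic.
    """
    best_name: Optional[str] = None
    best_seq = -1
    for name in sorted(positions):
        seq = gtid_sequence(positions[name], domain)
        if seq > best_seq:
            best_name, best_seq = name, seq
    return best_name
```

For example, `pick_candidate({"server2": "0-1-100", "server3": "0-1-142"})` selects `server3`, the replica that has applied more events in domain 0.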
| Comment by markus makela [ 2023-09-15 ] |
|
There's an optional argument to reset-replication that allows the caller to pick which server to promote. The fact that the replication was modified even though the command claims to have failed should be filed as a separate bug. It's possible that the reason it didn't pick a primary server is that the servers have read_only enabled. If enforce_simple_topology had been enabled along with enforce_writable_master and enforce_read_only_slaves, it might've fixed the problem on its own. The switchover-force is identical to the normal switchover except that it does this:
If the server that was labeled as Master is down and you forcefully promote another node, by definition you're not switching over to something, you're failing over to something. As such, |
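For reference, a small Python sketch of how reset-replication with the optional server argument might be invoked through maxctrl. The monitor name `MyMonitor` and server name `server2` are placeholders, not names from this issue; the helper only builds the argument list, so nothing is sent to MaxScale until the caller runs it.

```python
from typing import List, Optional


def reset_replication_argv(monitor: str, new_primary: Optional[str] = None) -> List[str]:
    """Build the maxctrl argument list for the mariadbmon reset-replication command.

    `monitor` is the monitor name from the MaxScale configuration.
    `new_primary` is the optional server to promote; when omitted,
    the choice of primary is left to the monitor.
    """
    argv = ["maxctrl", "call", "command", "mariadbmon", "reset-replication", monitor]
    if new_primary:
        argv.append(new_primary)
    return argv


if __name__ == "__main__":
    # Placeholder names; print the command instead of executing it.
    print(" ".join(reset_replication_argv("MyMonitor", "server2")))
```

To actually run it, the list can be passed to `subprocess.run(argv, check=True)` on a host where maxctrl is installed and configured. Since the command risks data loss, as noted above, printing it first for review is the safer default.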
| Comment by Esa Korhonen [ 2023-10-16 ] |
|
Unclear if anything is needed still. Closing for now. |