[MXS-2744] Switchover improvements Created: 2019-10-29 Updated: 2023-06-19 Resolved: 2023-06-19 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | mariadbmon |
| Affects Version/s: | None |
| Fix Version/s: | 23.08.0 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Wagner Bianchi (Inactive) | Assignee: | Esa Korhonen |
| Resolution: | Fixed | Votes: | 7 |
| Labels: | None |
| Sprint: | MXS-SPRINT-183, MXS-SPRINT-184 |
| Description |
|
Rogue transactions executed by old connections during a switchover prevent the old master from rejoining the cluster. To prevent this, active transactions should be killed during failover. Transparent switching of the master can be achieved by combining two mechanisms: waiting for transactions to commit, and using transaction_replay to migrate them to the new master.

Original description:

As per the chat in our meeting about MaxScale and switchover issues, this documents what we discussed. Our customers run native GTID-based replication (required by the MaxScale automatic features) and use switchover to perform rolling upgrades on servers. Most of the time, when we execute the below command...
...replication breaks because the former master, now a slave, has GTIDs at a more advanced position than the newly promoted master. As a result, neither replication nor gtid_strict_mode works at this point. The conversation then turned to the following question: how do we deal with a long transaction, not yet committed, that is running during the switchover?
We have these options available to explore. |
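The divergence described above (the former master holding GTIDs ahead of the promoted master) can be illustrated with a small check. This is only a sketch, not MaxScale's own logic: it assumes the positions are MariaDB GTID strings of the form domain-server-sequence (for example, as read from `SELECT @@gtid_binlog_pos` on each server), and it compares sequence numbers per domain, which is a simplification. The function names `parse_gtid_pos` and `domains_ahead` are hypothetical.

```python
from typing import Dict, Tuple


def parse_gtid_pos(pos: str) -> Dict[int, Tuple[int, int]]:
    """Parse a MariaDB GTID position such as '0-1-100,1-2-7'.

    Returns {domain_id: (server_id, sequence)}.
    """
    result: Dict[int, Tuple[int, int]] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain, server, seq = (int(part) for part in item.split("-"))
        result[domain] = (server, seq)
    return result


def domains_ahead(old_master_pos: str, new_master_pos: str) -> Dict[int, int]:
    """Return {domain_id: sequence_gap} for every domain in which the
    old master's sequence number is higher than the new master's.

    An empty result means the old master is not ahead in any domain.
    """
    old = parse_gtid_pos(old_master_pos)
    new = parse_gtid_pos(new_master_pos)
    ahead: Dict[int, int] = {}
    for domain, (_, old_seq) in old.items():
        # A domain missing on the new master is treated as sequence 0.
        new_seq = new.get(domain, (0, 0))[1]
        if old_seq > new_seq:
            ahead[domain] = old_seq - new_seq
    return ahead


if __name__ == "__main__":
    # Hypothetical positions: the old master committed extra events
    # in domain 0 after the new master was promoted.
    print(domains_ahead("0-1-105,1-1-7", "0-1-100,1-1-7"))
```

A non-empty result corresponds to the situation in this report: the old master has events the new master never saw, so it cannot simply start replicating from it.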
| Comments |
| Comment by Massimo [ 2019-10-31 ] |
|
When a switch happens in a write-intensive environment, there are a few things to keep in mind |
| Comment by Rick Pizzi [ 2019-10-31 ] |
|
When performing a switchover, we could wait for the transaction to complete, but at the same time we should block new ones from starting. If the long transaction takes a while, we are effectively freezing the master. This is what backups commonly do when FTWRL (FLUSH TABLES WITH READ LOCK) is issued to freeze non-transactional tables while the backup position is taken. Rick |
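The trade-off in this comment (wait for open transactions, refuse new ones, and avoid freezing the master indefinitely) can be modelled with a bounded wait. The following is a toy in-process model, not how MaxScale or the server implements it: the class `SwitchoverGate`, its methods, and the transaction ids are all hypothetical names chosen for illustration.

```python
import threading
from typing import Set


class SwitchoverGate:
    """Toy model: refuse new transactions, wait for open ones to finish."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active: Set[str] = set()
        self._draining = False

    def begin(self, trx_id: str) -> bool:
        """Register a transaction; returns False once draining has started."""
        with self._cond:
            if self._draining:
                return False
            self._active.add(trx_id)
            return True

    def commit(self, trx_id: str) -> None:
        """Mark a transaction as finished and wake up any waiter."""
        with self._cond:
            self._active.discard(trx_id)
            self._cond.notify_all()

    def drain(self, timeout: float) -> Set[str]:
        """Stop admitting transactions and wait up to `timeout` seconds.

        Returns the ids still open at the deadline, i.e. the ones a
        switchover would have to kill or replay elsewhere.
        """
        with self._cond:
            self._draining = True
            self._cond.wait_for(lambda: not self._active, timeout=timeout)
            return set(self._active)


if __name__ == "__main__":
    gate = SwitchoverGate()
    gate.begin("short-trx")
    gate.begin("long-trx")
    gate.commit("short-trx")
    # The long transaction never commits, so it is reported as left over.
    print(gate.drain(timeout=0.1))
```

The timeout is the design choice that matters here: without it, one long transaction holds the master frozen for as long as it runs; with it, the freeze is bounded and the leftover transactions are handled explicitly.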