Rogue transactions executed by old connections during a switchover prevent the old master from rejoining the cluster. To prevent this, active transactions should be killed during failover. The transparent switching of the master can be achieved with a combination of waiting for transactions to commit and using transaction_replay to migrate them to the new master.
This documents what we discussed in our meeting about MaxScale and switchover. Our customers run native GTID-based replication (required by the MaxScale automatic features) and use switchover to perform rolling upgrades on their servers.
Most of the time, when we execute the below command...
...replication breaks because the former master, now a slave, has GTIDs at a more advanced position than the newly promoted master. At that point neither replication nor gtid_strict_mode works. The conversation led to the following question:
How do we deal with a long transaction, not committed yet, running during the switchover?
- Wait for the long transaction to finish?
- Enable transaction replay (2.3+) and kill the master to force the failover?
- Enable transaction replay (2.3+), set maintenance --force, and force the failover?
All of these options are available to us, so we can try each of them.
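To check for the divergence described above (the former master holding GTIDs ahead of the promoted master), the GTID positions of the two servers can be compared per replication domain. The following is a minimal Python sketch, not something taken from MaxScale itself. It assumes the positions are strings in the MariaDB `domain-server-sequence` format, for example as read from `gtid_binlog_pos` on each server, and it assumes that a higher sequence number within a domain means a later position. The function names are hypothetical.

```python
def parse_gtid_pos(pos):
    """Parse a MariaDB GTID position such as '0-1-105,1-2-40'.

    Returns a dict mapping domain id -> (server id, sequence number).
    """
    result = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate an empty position string
        domain, server, seq = (int(part) for part in item.split("-"))
        result[domain] = (server, seq)
    return result


def domains_ahead(old_master_pos, new_master_pos):
    """Return the domains in which the old master is ahead of the new master.

    Assumption: within one domain, a larger sequence number means a later
    position. A domain missing on the new master counts as 'ahead'.
    """
    old = parse_gtid_pos(old_master_pos)
    new = parse_gtid_pos(new_master_pos)
    ahead = []
    for domain in sorted(old):
        old_seq = old[domain][1]
        new_seq = new[domain][1] if domain in new else -1
        if old_seq > new_seq:
            ahead.append(domain)
    return ahead


if __name__ == "__main__":
    # Hypothetical positions: the old master (server 1) committed two more
    # transactions in domain 0 than the promoted master (server 2) has.
    print(domains_ahead("0-1-105", "0-2-103"))
```

If the returned list is non-empty, the old master has transactions the new master never received, which is the situation in which it cannot rejoin as a slave.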