Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
The switchover process clears the Master status bit, which breaks any transactions that are still in progress. Without transaction_replay=true in readwritesplit, every partially executed transaction fails. Even with it, the transaction has to be replayed, which is wasted work and a potential point of failure.
One improvement would be to signal the intention to demote a Master server by flagging it as being drained of transactions. readwritesplit could read this flag and then allow only already open transactions to commit while blocking new ones. The monitor could then wait for those transactions to finish before proceeding with the switchover.
This would also avoid another problem: transactions from users that have the SUPER or READ_ONLY ADMIN grant. Writes by such users ignore the value of read_only, which interferes with the switchover process. If readwritesplit stopped routing transactions when the switchover starts, this problem would be mitigated.
A simple implementation of the wait would be to pause for a fixed amount of time after flagging the server as being drained of transactions, and only then continue with the switchover. This would give transactions some time to finish and replication some time to catch up. The flag could be set very early on to speed up the whole switchover process.
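The fixed wait described above could be sketched roughly as follows. This is only an illustration in Python, not MaxScale code: the `Server` class, its `draining` and `is_master` fields, and the `switchover_with_drain` function are hypothetical names, and the real monitor would also have to handle replication and read_only.

```python
import time
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for a monitored server and its status bits."""
    name: str
    is_master: bool = False
    draining: bool = False


def switchover_with_drain(old, new, drain_seconds, sleep=time.sleep):
    """Flag the old primary as draining, wait a fixed time, then switch."""
    # Set the flag first so routers stop starting new transactions.
    old.draining = True
    # Fixed wait: open transactions get time to commit and replication
    # gets time to catch up. No attempt is made to detect completion.
    sleep(drain_seconds)
    # Only now is the Master status moved to the new server.
    old.is_master = False
    old.draining = False
    new.is_master = True
```

Passing `sleep` as a parameter is only there so the wait can be replaced in tests; a smarter variant could poll for open transactions instead of sleeping for a fixed time.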
In readwritesplit, routing should fail if the current primary is marked as being drained of transactions and no transaction is open, which includes a query that is about to start a new transaction. If a transaction is already open, routing proceeds normally. If delayed_retry is enabled, the query waits for up to delayed_retry_timeout before failing.
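The routing rule above can be written as a small decision function. This is a sketch under assumptions, not the readwritesplit implementation: `Decision` and `route_on_primary` are hypothetical names, and `trx_open` means a transaction was already open before the current query arrived.

```python
from enum import Enum


class Decision(Enum):
    ROUTE = "route"  # send the query to the primary as usual
    RETRY = "retry"  # hold the query until delayed_retry_timeout, then fail
    FAIL = "fail"    # reject the query immediately


def route_on_primary(primary_draining, trx_open, delayed_retry):
    """Decide what to do with a query that targets the primary."""
    # A primary that is not draining, or a transaction that is already
    # open, is routed normally so that the transaction can commit.
    if not primary_draining or trx_open:
        return Decision.ROUTE
    # Draining and no open transaction: this covers a query that would
    # start a new transaction. Wait if delayed_retry is on, else fail.
    return Decision.RETRY if delayed_retry else Decision.FAIL
```

For example, a `BEGIN` that arrives while the primary is draining has `trx_open` set to false, so it is either held (with delayed_retry) or rejected, while a `COMMIT` inside an existing transaction is still routed.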