Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.5.16
-
None
-
MXS-SPRINT-144
Description
When a read-only command is executed inside a read-only transaction with causal_reads enabled, the causal read can time out, in which case the query is retried on the master. Because the retry is implemented as a routing hint, it causes the query to fail with an error stating that the master server has changed.
Unfortunately, it's not possible to handle this correctly without replaying the whole transaction. When causal_reads is enabled together with transaction_replay, the whole transaction could be retried on the current master server, where we know the data is up to date. However, if transaction_replay is not enabled, there is no way to read data that is guaranteed to be up to date, so a decision has to be made: either report an error to the client that a causal read was not possible, or return a result that may be stale.
The simple solution is to return an error to the client if the causal read fails inside a read-only transaction. This does not fully fix the original problem, as an error is still returned, but at least the whole MaxScale connection is not closed. An alternative that is more complex to implement would be to re-execute the query, this time without the causal read SQL attached to it. Since readwritesplit does not currently express this information, a new mechanism would need to be added.
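The options discussed above can be summarised as a small decision function. This is only an illustrative sketch in Python, not MaxScale code: the names `Action` and `on_causal_read_timeout` are hypothetical, and the branch for queries outside a read-only transaction assumes that the plain retry on the master works there.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical outcomes for a causal read that timed out."""
    RETRY_ON_MASTER = auto()               # plain retry via routing hint
    REPLAY_TRANSACTION_ON_MASTER = auto()  # needs transaction_replay
    RETURN_ERROR_TO_CLIENT = auto()        # the "simple solution"


def on_causal_read_timeout(in_read_only_trx: bool, transaction_replay: bool) -> Action:
    """Pick an action after a causal read times out.

    Models the reasoning in the description only; the real router
    logic is not shown in this issue.
    """
    if not in_read_only_trx:
        # Assumption: outside a read-only transaction the query can
        # simply be retried on the master.
        return Action.RETRY_ON_MASTER
    if transaction_replay:
        # The whole transaction can be replayed on the current master,
        # where the data is known to be up to date.
        return Action.REPLAY_TRANSACTION_ON_MASTER
    # No way to guarantee fresh data: report the failure instead of
    # returning a possibly stale result or closing the connection.
    return Action.RETURN_ERROR_TO_CLIENT


if __name__ == "__main__":
    print(on_causal_read_timeout(in_read_only_trx=True, transaction_replay=False))
```

The alternative mentioned above, re-executing the query without the causal read SQL, would add a fourth outcome that trades freshness for availability.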