[MXS-4776] Sescmd target selection is sub-optimal with lazy_connect Created: 2023-09-23  Updated: 2023-09-26  Resolved: 2023-09-26

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 6.4.10
Fix Version/s: 23.08.2

Type: Bug Priority: Major
Reporter: markus makela Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MXS-4390 Deferred execution of session commands Open

 Description   

The behavior of lazy_connect is effectively hardwired to assume that it is used for writes. As described in MXS-4769, this is very bad for read-only workloads, as it forces work onto the master node of the cluster when all of it could be offloaded to a replica. This makes lazy_connect very hard to use for workloads where communication with the master is slow, e.g. edge computing with local replicas and a remote master server.

The documentation for lazy_connect is also ambiguous as it does not cover what happens when a session command is executed:

If the client executes only read queries, no connection to the primary is made. If only write queries are made, only the primary connection is used.

A solution to this would be to check in create_one_connection_for_sescmd() whether a master connection is truly needed. The existing need_master_for_sescmd() function already exposes this information, but it is not checked inside create_one_connection_for_sescmd(). For normal workloads, a master connection should still be preferred if available, but when lazy_connect is enabled, the server with the lowest "score" should be preferred, as is done for normal queries. This does, however, cause an extra connection to be created for workloads that start with session commands but then always perform writes. Behaving correctly in all cases would require readwritesplit to know what the first command to be executed is. An acceptable compromise might be to check whether master_accept_reads and lazy_connect are both enabled and, if so, prefer the master server for the first session command.
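
The selection rule proposed above can be sketched roughly as follows. This is a simplified Python model and not MaxScale code: the Server class, the pick_sescmd_target() function and its parameters are hypothetical names used only for illustration, need_master stands in for the result of need_master_for_sescmd(), and falling back to the replicas when the master is not preferred is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a backend server."""
    name: str
    is_master: bool
    score: float  # lower is better, as for normal query routing


def pick_sescmd_target(servers: Sequence[Server],
                       need_master: bool,
                       lazy_connect: bool,
                       master_accept_reads: bool) -> Optional[Server]:
    """Pick the server on which to open the one connection for a session command."""
    master = next((s for s in servers if s.is_master), None)

    # The session command genuinely requires the master.
    if need_master:
        return master

    # Normal workloads still prefer the master. With lazy_connect the master is
    # preferred only if master_accept_reads is also enabled (the compromise).
    prefer_master = (not lazy_connect) or master_accept_reads
    if prefer_master and master is not None:
        return master

    # Otherwise pick the lowest score. Assumption: replicas are tried first and
    # the full server list is used only if there are no replicas.
    replicas = [s for s in servers if not s.is_master]
    candidates = replicas or list(servers)
    return min(candidates, key=lambda s: s.score, default=None)
```

With a master and two replicas, a read-only session that has lazy_connect enabled and master_accept_reads disabled would open its first connection on the replica with the lowest score instead of on the master.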

The most reliable way to get the correct behavior would be to introduce a new parameter that controls whether lazy_connect should prefer a master connection for the first session command. However, this would introduce an extra parameter that is very specific to a single use-case, and even then the behavior would not be ideal. The deferred session command execution described in MXS-4390 would solve the problem without needing to configure anything in MaxScale apart from possible resource limits.

As such, the change proposed here would allow the existing server selection code to be used for connection creation in the context of session commands. It also fixes the problem where session commands that would end up picking a replica were not load balanced but instead always chose the first valid server.
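
The difference between the two replica selection behaviors can be illustrated with a small Python sketch. The function names and the (name, score, usable) tuples are hypothetical, and "score" stands for whatever metric the normal server selection uses; this is not the readwritesplit implementation.

```python
from typing import List, Optional, Tuple

Replica = Tuple[str, float, bool]  # (name, score, usable)


def first_valid(replicas: List[Replica]) -> Optional[str]:
    """Old behavior: take the first usable server, ignoring its score."""
    return next((name for name, _score, usable in replicas if usable), None)


def lowest_score(replicas: List[Replica]) -> Optional[str]:
    """Proposed behavior: take the usable server with the lowest score."""
    usable = [(score, name) for name, score, usable in replicas if usable]
    return min(usable)[1] if usable else None


replicas = [("r1", 8.0, True), ("r2", 3.0, True), ("r3", 1.0, False)]
print(first_valid(replicas))   # r1
print(lowest_score(replicas))  # r2
```

As long as the list order stays the same, the first function sends every new session to r1, whereas the second follows the scores and so spreads sessions out as the scores change.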

