Details
Bug
Status: Closed
Major
Resolution: Cannot Reproduce
2.5.9
google cloud
MXS-SPRINT-130
Description
A user reported that they cannot use the `readwritesplit` router for read-only services after upgrading to MaxScale 2.5.9. Before the upgrade they ran MaxScale 2.5.6.
Mar 16 18:39:36 maxscale-staging-1 maxscale[3565997]: (931) [mariadbclient] Routing the query failed. Session will be closed.
Mar 16 18:39:54 maxscale-staging-1 maxscale[3565997]: (932) [readwritesplit] (mariadb-reporting-service) [mariadb-reporting-service] Write query received from tocktix7@::ffff:10.8.11.4. Could not find a valid master connection. Closing client connection.
Mar 16 18:39:54 maxscale-staging-1 maxscale[3565997]: (932) [readwritesplit] (mariadb-reporting-service) Could not find valid server for target type TARGET_MASTER (COM_QUERY: /* Username: mieko@tockhq.com, Task ID: 581faba5-0c08-4661-9b55-48f2e4b91991, Scheduled: False, Query Hash: 0e1a2394e409cfa1542e82c8e57f4ce8, Queue: queries, Query ID: adhoc */ select business.name, business.id, business_details.address_lat, business_details.address_lng, business.is_deleted, business.is_authorized_by_tock, business.created_at
from business_details
join business on business.business_details_id = business_details.id
where business_details.address_lat is null
;), closing connection.
name: [node_secondary-0] status: [Slave, Running] state: [IN_USE] last opened at: [Tue Mar 16 18:39:54 2021] last closed at: [not closed] last close reason: [] num sescmd: [0]
name: [node_secondary-1] status: [Slave, Running] state: [IN_USE|WAITING_RESULT] last opened at: [Tue Mar 16 18:39:54 2021] last closed at: [not closed] last close reason: [] num sescmd: [1]
Attached config and log file.
The query classification for those queries seems correct: most of the failing queries are classified as reads, which means that something else causes the target type to be set to TARGET_MASTER. One possible explanation is a stored procedure call, but that would not work with MaxScale 2.5.6 either.
The remaining possible cause is that causal_reads=fast somehow makes the server selection fail because some of the servers don't have the required GTID. However, this should only be possible if the connection itself caused a transaction with a GTID to be generated, which the server then returns in the @@last_gtid variable. Otherwise the code should ignore the current GTID value.
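To make the suspected failure mode concrete, here is a rough Python model of the selection rule described above. It is not MaxScale code: the `Gtid`, `Server` and `pick_read_target` names are hypothetical, the GTID comparison is simplified to a single replication domain, and the fallback to the master is an assumption about how the `fast` mode is expected to behave. In a service that contains only slaves, the fallback finds nothing, which would match the TARGET_MASTER error in the log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gtid:
    """MariaDB-style GTID: domain-server_id-sequence (hypothetical helper)."""
    domain: int
    server_id: int
    sequence: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        domain, server_id, sequence = (int(part) for part in text.split("-"))
        return cls(domain, server_id, sequence)


@dataclass
class Server:
    name: str
    is_master: bool
    gtid: Optional[Gtid]  # last known GTID position, None if unknown


def has_replicated(server: Server, required: Gtid) -> bool:
    """True if the server is known to have reached the required GTID."""
    return (
        server.gtid is not None
        and server.gtid.domain == required.domain
        and server.gtid.sequence >= required.sequence
    )


def pick_read_target(servers: List[Server],
                     session_gtid: Optional[Gtid]) -> Optional[Server]:
    """Pick a server for a read in this toy model of causal_reads=fast."""
    slaves = [s for s in servers if not s.is_master]
    if session_gtid is None:
        # The session generated no GTID, so GTID positions are ignored.
        candidates = slaves
    else:
        # Only slaves known to have replicated the session's write qualify.
        candidates = [s for s in slaves if has_replicated(s, session_gtid)]
    if candidates:
        return candidates[0]
    # Assumed fallback: use the master. With no master this returns None,
    # which corresponds to "could not find valid server" in this model.
    return next((s for s in servers if s.is_master), None)


# Read-only service with two slaves and no master, as in the report.
# The GTID values are made up for illustration.
servers = [
    Server("node_secondary-0", False, Gtid.parse("0-1-10")),
    Server("node_secondary-1", False, Gtid.parse("0-1-12")),
]
no_gtid_target = pick_read_target(servers, None)
stale_target = pick_read_target(servers, Gtid.parse("0-1-20"))
```

In this model `no_gtid_target` is a slave, while `stale_target` is `None` because the session GTID is ahead of every slave and there is no master to fall back to. That only happens when the session carries a GTID, which is why an unexpectedly set GTID value would be the thing to look for.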