Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
21.06.16, 22.08.12, 23.02.9
-
None
Description
The fixes to MXS-4979 and MXS-5010 resulted in a regression where an early mismatching response to a session command does not correctly trigger the closing of the connection. This results in the connection being left open with a diverged session state.
In practice this seems to happen with COM_STMT_PREPARE where warnings for unknown PS IDs are logged. Here is the output from the test case that reproduces the problem:
2025-03-03 09:54:20 info : (6) > Autocommit: [enabled], trx is [not open], cmd: (0x16) COM_STMT_PREPARE, plen: 30, type: QUERY_TYPE_READ|QUERY_TYPE_PREPARE_STMT, stmt: SELECT id, a FROM test.t1
|
2025-03-03 09:54:20 info : (6) (RW-Split-Router); > Autocommit: [enabled], trx is [not open], cmd: (0x16) COM_STMT_PREPARE, plen: 30, type: QUERY_TYPE_READ|QUERY_TYPE_PREPARE_STMT, stmt: SELECT id, a FROM test.t1
|
2025-03-03 09:54:20 info : (6) [readwritesplit] (RW-Split-Router); Session write, routing to all servers.
|
2025-03-03 09:54:20 info : (6) [readwritesplit] (RW-Split-Router); Route query to master: server1
|
2025-03-03 09:54:20 info : (6) [readwritesplit] (RW-Split-Router); Route query to slave: server2
|
2025-03-03 09:54:20 info : (6) [readwritesplit] (RW-Split-Router); Will return response from 'server1' to the client
|
2025-03-03 09:54:20 info : (6) [readwritesplit] (RW-Split-Router); Reply complete from 'server2', discarding it: Error: 1054, 42S22 Unknown column 'a' in 'SELECT'
|
2025-03-03 09:54:22 info : (6) PS internal ID 3 maps to external ID 206 on server 'server1'
|
2025-03-03 09:54:22 info : (6) [readwritesplit] (RW-Split-Router); Reply complete from 'server1' (OK: 0 warnings)
|
2025-03-03 09:54:22 info : (6) [MariaDBProtocol] (RW-Split-Router); Added COM_STMT_PREPARE to history with ID 3: SELECT id, a FROM test.t1 (result: OK)
|
2025-03-03 09:54:22 info : (6) > Autocommit: [enabled], trx is [not open], cmd: (0x17) COM_STMT_EXECUTE, plen: 14, type: QUERY_TYPE_READ, stmt: ID: 3
|
2025-03-03 09:54:22 info : (6) (RW-Split-Router); > Autocommit: [enabled], trx is [not open], cmd: (0x17) COM_STMT_EXECUTE, plen: 14, type: QUERY_TYPE_READ, stmt: ID: 3
|
2025-03-03 09:54:22 info : (6) [readwritesplit] (RW-Split-Router); Route query to slave: server2 <
|
2025-03-03 09:54:22 info : (6) [readwritesplit] (RW-Split-Router); COM_STMT_EXECUTE on server2
|
2025-03-03 09:54:22 warning: (6) (server2); MaxScale has encountered an unexpected situation: Unknown prepared statement handler (3) given to MaxScale for COM_STMT_EXECUTE by 'skysql'@'::ffff:192.168.122.1'. Add 'session_trace=200' under the [maxscale] section to enable session level tracing to make the debugging of this problem easier.
|
2025-03-03 09:54:22 warning: (6) (server2); Unknown prepared statement handler (3) given to MaxScale for COM_STMT_EXECUTE by 'skysql'@'::ffff:192.168.122.1'
|
This simulates the case where the replication of an ALTER TABLE ... ADD COLUMN command was too slow and the added field didn't exist on server2 which results in the COM_STMT_PREPARE failing. The correct action would've been to close the connections that returned mismatching responses.
The cause of this bug is an inverted boolean condition in the no_longer_in_history() function which treats the latest command as never being in the history when it should treat it as always being in the history.