Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
2.5.23, 6.4.4, 22.08.3
-
None
-
MXS-SPRINT-174
Description
The following transaction demonstrates the problem:
BEGIN; |
SELECT id FROM db1.t1 FOR UPDATE; |
UPDATE db1.t1 SET id = id + 1 WHERE id = 1; |
COMMIT; |
The BEGIN and COMMIT are routed according to the queries without explicit databases case mentioned in the limitations. The result of this is that the queries are not routed to the same server where the SQL content of the transaction is sent.
Here is a log excerpt from MaxScale that demonstrates the problem:
2023-01-03 12:14:24 info : (3) Found matching user 'maxuser'@'127.0.0.1' for client 'maxuser'@'::ffff:127.0.0.1' with sufficient privileges.
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Connected Master, Running in 'server1'
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Connected Slave, Running in 'server2'
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Connected Master, Running in 'server3'
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Connected Running in 'server4'
|
2023-01-03 12:14:24 info : (3) Started SchemaRouter-Router client session [3] for 'maxuser' from ::ffff:127.0.0.1
|
2023-01-03 12:14:24 info : (3) Connected to 'server1' with thread id 19
|
2023-01-03 12:14:24 info : (3) Connected to 'server2' with thread id 19
|
2023-01-03 12:14:24 info : (3) Connected to 'server3' with thread id 16621
|
2023-01-03 12:14:24 info : (3) Connected to 'server4' with thread id 764
|
2023-01-03 12:14:24 info : (3) Authentication to 'server4' succeeded.
|
2023-01-03 12:14:24 info : (3) Authentication to 'server1' succeeded.
|
2023-01-03 12:14:24 info : (3) Authentication to 'server2' succeeded.
|
2023-01-03 12:14:24 info : (3) Authentication to 'server3' succeeded.
|
2023-01-03 12:14:24 info : (3) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 10, type: QUERY_TYPE_BEGIN_TRX, stmt: begin
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); > Command: COM_QUERY, stmt: begin
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to server1 <
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Reply complete from 'server1'
|
2023-01-03 12:14:24 info : (3) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 37, type: QUERY_TYPE_READ|QUERY_TYPE_WRITE, stmt: select id from db1.t1 for update
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); > Command: COM_QUERY, stmt: select id from db1.t1 for update
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Query targets table on server 'server4'
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to server4 <
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Reply complete from 'server4'
|
2023-01-03 12:14:24 info : (3) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 34, type: QUERY_TYPE_WRITE, stmt: update db1.t1 set id = id + 1
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); > Command: COM_QUERY, stmt: update db1.t1 set id = id + 1
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Query targets table on server 'server4'
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to server4 <
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Reply complete from 'server4'
|
2023-01-03 12:14:24 info : (3) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 11, type: QUERY_TYPE_COMMIT, stmt: commit
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); > Command: COM_QUERY, stmt: commit
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to server1 <
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Reply complete from 'server1'
|
2023-01-03 12:14:24 info : (3) > Autocommit: [enabled], trx is [not open], cmd: (0x01) COM_QUIT, plen: 5, type: QUERY_TYPE_SESSION_WRITE, stmt:
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); > Command: COM_QUIT, stmt:
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to master: server1
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to slave: server2
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to master: server3
|
2023-01-03 12:14:24 info : (3) [schemarouter] (SchemaRouter-Router); Route query to slave: server4
|
2023-01-03 12:14:24 info : (3) Stopped SchemaRouter-Router client session [3]
|
This also affects transactions that use autocommit=0 but commit it using the COMMIT SQL statement. Transactions that commit the transaction with SET autocommit=1 are not affected by this.
The ideal thing to do would be to delay the routing of the explicit transaction start until the server on which it is started is known. A less ideal solution that would still solve this is to treat all BEGIN and COMMIT commands as session commands and route them to all backends.
The case for ROLLBACK and COMMIT is simple: assuming the transaction is not empty and has one target, the logical destination for the command is the same target where the previous command was routed. This will result in correct behavior when the transaction is located completely on a single target. However, transactions that touch more than one shard will result in unwanted behavior where only one shard receives the COMMIT.
Another approach would be to route all BEGIN, COMMIT and ROLLBACK statements to all nodes. This would cause the transaction to be committed on all nodes but it would result in an increase of traffic on all shards. Neither of these approaches provide cross-shard consistency for transactions but the latter would cause BEGIN to also start a transaction on all nodes, same as SET autocommit=0.