[MXS-4330] Xpand parallel replication does not work with maxscale Created: 2022-09-29  Updated: 2023-02-08  Resolved: 2023-02-08

Status: Closed
Project: MariaDB MaxScale
Component/s: Core, readwritesplit
Affects Version/s: 2.5.21, 6.3.1, 6.4.2, 22.08.1
Fix Version/s: 23.02.0

Type: New Feature Priority: Major
Reporter: Hristiyan Nikolov (Inactive) Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
Sprint: MXS-SPRINT-176

 Description   

I tried to setup xpand parallel replication and I this error. This is on the xpand master.

2022-09-29 10:54:11.949066 UTC nid 1 hnn-xpand1-mxp-0.hnn-xpand1-mxp clxnode: INFO mysql/replication/master.c:534 start_replication_master(): {conn_id=214017 closing=0 lock count=1}: Replication request from IPv4(10.156.15.221:56124) for xpand-bin: cid 0x6335778e10085806 pos { proto: clx current_commit_id: 0x6335778e10085806 current_event_id: 2 flags: WANT_ROWS SLOW_START batch_size: 21474836480 low_bound: 15372286728091293014 high_bound: 18446744073709551615}
2022-09-29 10:54:11.951066 UTC nid 1 hnn-xpand1-mxp-0.hnn-xpand1-mxp clxnode: INFO mysql/replication/master.c:534 start_replication_master(): {conn_id=215041 closing=0 lock count=1}: Replication request from IPv4(10.156.15.221:56132) for xpand-bin: cid 0x6335778e10085806 pos { proto: clx current_commit_id: 0x6335778e10085806 current_event_id: 2 flags: WANT_STATEMENTS SLOW_START batch_size: 21474836480 low_bound: 0 high_bound: 0}
2022-09-29 10:54:12.392068 UTC nid 1 hnn-xpand1-mxp-0.hnn-xpand1-mxp clxnode: WARNING mysql/replication/master.c:96 mysql_master_close(): Replication master error: slave IPv4(10.156.15.221:56124): Session killed due to closed client socket: 
2022-09-29 10:54:12.393068 UTC nid 1 hnn-xpand1-mxp-0.hnn-xpand1-mxp clxnode: WARNING mysql/replication/master.c:96 mysql_master_close(): Replication master error: slave IPv4(10.156.15.221:56132): Session killed due to closed client socket: 

Maxscale logs on the master:

2022-09-29 12:09:10   error  : (54) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:10   error  : (54) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:10   error  : (54) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:10   error  : (54) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2 (subsequent similar messages suppressed for 10000 milliseconds)
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 334 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 197 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 120 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2
2022-09-29 12:09:29   error  : (60) Unexpected result state. cmd: 0x00, len: 28 server: @@Xpand-Monitor:node-2 (subsequent similar messages suppressed for 10000 milliseconds)

I have added one database with 1 table with 2 rows in the master xpand.
Only the database and the table were replicated very slowly, but the two rows in the table never got replicated.
On the same xpands I changed the two maxscales images to version 6.2.3 then I setup the replication again and all the data was replicated.



 Comments   
Comment by markus makela [ 2022-09-29 ]

Please fill in affected version, attach the configuration you used and explain a bit more how you configured the replication.

Comment by markus makela [ 2022-09-30 ]

OK, managed to reproduce this and it is what I suspected: Xpand sends an unknown command 66 (0x42) which is not how standard replication is started. The only difference between those two versions is that MaxScale now prevents crashes by closing the connections immediately when an unexpected internal state is encountered. This was never something that was expected to work given that Xpand replication is started with a non-standard command.

Using an older version is not a safe solution as it is prone to crash at any moment since the old behavior was essentially undefined behavior.

Comment by Todd Stoffel (Inactive) [ 2022-09-30 ]

I'm moving this to a new feature request since this only worked in old versions by accident. This was never a designed feature of MaxScale to begin with. This feature predates the GA implementation of parallel replication in Xpand. Additionally Xpand replication is a different beast than MariaDB replication.

Comment by Manjinder Nijjar [ 2022-09-30 ]

I did a bisect on maxscale versions and it seems this broke in maxscale version 6.3.1. This used to work fine until 6.3.0.

Comment by Manjinder Nijjar [ 2022-10-01 ]

I have filled XPT-501 (http://bugs.colo.sproutsys.com/show_bug.cgi?id=36252) for Xpand engineering. This issue is not observed with serial replication so we are safe there.

Comment by Tim Deeb-Swihart (Inactive) [ 2022-10-06 ]

Event 66 (0x42) is the command sent from the slave to initiate parallel replication. It's not a bug but part of our parallel replication protocol. Parallel replication is not standard mysql replication but an extension of the protocol

Comment by Tim Deeb-Swihart (Inactive) [ 2022-10-06 ]

Xpand extended the wire protocol with three additional commands: 0x24, 0x42, and 0x43

Comment by markus makela [ 2023-02-02 ]

timods can you clarify whether the 0x24 and 0x43 are commands or whether they are replication event types?

Comment by Tim Deeb-Swihart (Inactive) [ 2023-02-02 ]

They are events

Generated at Thu Feb 08 04:27:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.