[MXS-4678] Maxscale 3-node cluster with KafkaCDC sends payload all tagged as `"server_id":1,`. Created: 2023-07-21 Updated: 2023-08-04 Resolved: 2023-08-04 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | kafkacdc |
| Affects Version/s: | 22.08.0 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Presnickety | Assignee: | markus makela |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | replication | ||
| Environment: | RHEL 8 |
| Description |
|
Hi,

We're seeing a potential issue with KafkaCDC in a three-instance MaxScale setup: all data arrives at the Kafka topic tagged as server_id=1, i.e. we never see server_id=2 or server_id=3 in the payload. We're unsure whether this is related to Kafka partition keys or to our use of Galera/galeramon, but three copies of each change appear on the topic, all tagged as server_id=1. Example topic output below:

{"domain":1,"server_id":1,"sequence":11075901205,"event_number":1,"timestamp":1689904677,"event_type":"update_before","table_schema":"Merged_Datastore","table_name":"session",.........}
{"domain":1,"server_id":1,"sequence":11075901205,"event_number":2,"timestamp":1689904677,"event_type":"update_after","table_schema":"Merged_Datastore","table_name":"session",.........}

Setup details: Data is sent to the Kafka topic mds-cpvnf-process-notifier, set up on a container as per below;

The my.cnf, maxscale.cnf and galera.cnf files from the three nodes are attached.

Thanks. |
| Comments |
| Comment by markus makela [ 2023-07-21 ] |
|
The reason you get three events for each change is that cooperative_replication does not currently support Galera clusters:

Galera does not officially support GET_LOCK() which prevents the mariadbmon approach from being used. Most likely an invasive approach of using an actual table where the MaxScale instances would write data would be needed. This would of course pollute the data stream with events internal to MaxScale which isn't very nice.

You could work around this limitation in a few different ways.

The simplest way is to put the KafkaCDC service into a separate MaxScale configuration and run only one copy of that. This trades the high-availability aspect of the setup for simplicity. If you are doing this with some sort of container orchestration system (e.g. Kubernetes), the downtime after a failure of a single non-critical container should be low enough to make this a usable approach.

The other option is to have a normal MariaDB server that replicates from the Galera cluster by first going through MaxScale. You can point the MariaDB server to either the readconnroute or readwritesplit service and it will route the replication traffic to a valid Galera node. The point of this arrangement is that you can then use mariadbmon, which does support cooperative monitoring, with that server. |
| Comment by Presnickety [ 2023-07-24 ] |
|
Hi,

Just to confirm: if we switched to mariadbmon, would each server's payload appear on its own partition in the Kafka topic? We chose Galera for (1) the multi-master feature, which incidentally kept failing due to certification errors, so we stayed with a single write master, and (2) the cluster's self-recovery after replication failures, though this doesn't always work.

Thanks. |
| Comment by markus makela [ 2023-07-24 ] |
|
No, switching to mariadbmon would only get rid of the triplicate events that get sent to Kafka; the events would all still have server_id: 1.

All events have server_id: 1 because you're writing to only one node, and Galera assigns the server_id component of the GTID based on where the event originated.

The domain, server_id and sequence fields are the broken-down version of the GTID and, combined with the event_number field, allow you to uniquely identify an event. These four values are also used as the key when the message is published to the broker and can be used to perform deduplication on the consumer side.

The message key has the format domain-server_id-sequence:event_number. For example, a GTID for domain 0 and server_id 123 at sequence 4 with two events inside the transaction would produce the key 0-123-4:1 for the first event in the transaction and 0-123-4:2 for the second. You can also use this to reconstruct the order in which the events arrived even if they end up being consumed out of order. |
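The key format described above lends itself to consumer-side deduplication. The following is a minimal sketch, not MaxScale or Kafka client code: it assumes the consumer has already obtained each message as a (key, value) pair of strings, and the names EventKey, parse_key, deduplicate and in_event_order are hypothetical helpers introduced only for illustration. It also assumes all events belong to a single GTID domain when sorting, and it keeps every seen key in memory, which a long-running consumer would need to bound.

```python
import json
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class EventKey(NamedTuple):
    """Hypothetical container for the four values that identify an event."""
    domain: int
    server_id: int
    sequence: int
    event_number: int


def parse_key(key: str) -> EventKey:
    """Parse a key of the form 'domain-server_id-sequence:event_number'."""
    gtid, event_number = key.split(":")
    domain, server_id, sequence = gtid.split("-")
    return EventKey(int(domain), int(server_id), int(sequence), int(event_number))


def deduplicate(messages: Iterable[Tuple[str, str]]) -> Iterator[Dict]:
    """Yield each decoded payload once, skipping keys that were already seen."""
    seen = set()  # assumption: unbounded in-memory set, fine for a sketch only
    for key, value in messages:
        event_key = parse_key(key)
        if event_key in seen:
            continue  # duplicate, e.g. published by another MaxScale instance
        seen.add(event_key)
        yield json.loads(value)


def in_event_order(events: Iterable[Dict]) -> List[Dict]:
    """Sort payloads by (sequence, event_number), assuming a single domain."""
    return sorted(events, key=lambda e: (e["sequence"], e["event_number"]))


if __name__ == "__main__":
    # Three messages arriving out of order, one of them a duplicate.
    incoming = [
        ("0-123-4:2", '{"domain":0,"server_id":123,"sequence":4,"event_number":2}'),
        ("0-123-4:1", '{"domain":0,"server_id":123,"sequence":4,"event_number":1}'),
        ("0-123-4:2", '{"domain":0,"server_id":123,"sequence":4,"event_number":2}'),
    ]
    for event in in_event_order(deduplicate(incoming)):
        print(event["sequence"], event["event_number"])
```

With the three-instance Galera setup from this issue, each change would reach the consumer three times under the same key, so a filter of this kind would pass only the first copy through.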
| Comment by markus makela [ 2023-08-04 ] |
|
I'll close this as Not a Bug since this is currently expected behavior with Galera. |