Details
Bug
Status: Closed
Minor
Resolution: Not a Bug
22.08.0
RHEL 8
Description
Hi,
We're seeing a potential issue with KafkaCDC in a three-instance MaxScale setup: all data arriving at the Kafka topic is tagged as server_id=1, i.e. we never see server_id=2 or server_id=3 in the payload. We're unsure whether this is related to Kafka partition keys or to our use of Galera/galeramon; however, 3 copies of each change appear on the topic, all tagged as server_id=1.
Example topic output below:
{"domain":1,"server_id":1,"sequence":11075901205,"event_number":1,"timestamp":1689904677,"event_type":"update_before","table_schema":"Merged_Datastore","table_name":"session",.........}
{"domain":1,"server_id":1,"sequence":11075901205,"event_number":2,"timestamp":1689904677,"event_type":"update_after","table_schema":"Merged_Datastore","table_name":"session",.........}
Setup details:
The Apache Kafka topic has been set up with 3 partitions and a replication factor of 3 across a 3-node cluster. We have a 3-node MariaDB cluster with Galera replication, utilising binary logging and GTIDs. For redundancy we also have an instance of MaxScale on each MariaDB node.
Data is sent to the Kafka topic mds-cpvnf-process-notifier, which was set up in a container as shown below:
$ podman exec -ti kafka-sidecar-ap /opt/kafka/bin/kafka-topics.sh --zookeeper 10.195.241.85:2181 --topic mds-cpvnf-process-notifier --create --partitions 3 --replication-factor 3 --config retention.ms=3600000
The my.cnf, maxscale.cnf and galera.cnf files from the 3 nodes are attached.
Thanks.
Issue Links
- is blocked by MDEV-30473 Do not allow GET_LOCK() / RELEASE_LOCK() in cluster (Closed)
You get three events for each change because cooperative_replication does not currently support Galera clusters:
Galera does not officially support GET_LOCK() which prevents the mariadbmon approach from being used. Most likely an invasive approach of using an actual table where the MaxScale instances would write data would be needed. This would of course pollute the data stream with events internal to MaxScale which isn't very nice.
You could work around this limitation in a few different ways. The simplest way is to put the KafkaCDC service into a separate MaxScale configuration and run only one copy of that. This trades the high-availability aspect of the setup for simplicity. If you are doing this with some sort of container orchestration system (e.g. Kubernetes), the downtime of a single non-critical container should be low enough to make this a usable approach.
The other option is to have a normal MariaDB server that replicates from the Galera cluster through MaxScale. You can point the MariaDB server at either the readconnroute or the readwritesplit service, and it'll route the replication traffic to a valid Galera node. The point of doing this is that the server can then be monitored with mariadbmon, which does support cooperative monitoring.
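To check whether either setup has actually removed the duplicates, one could group the records on the topic by the GTID fields shown in the example output and look for keys that occur more than once. The following is only a rough sketch in Python: it assumes the records have already been read from the topic as JSON strings, the function name count_duplicates is made up for illustration, and it assumes that domain, server_id, sequence and event_number together identify one event.

```python
import json
from collections import Counter


def count_duplicates(lines):
    """Return {(domain, server_id, sequence, event_number): count} for every
    key that appears more than once in the given JSON records.

    Hypothetical helper: the key fields are taken from the example payload
    in this report, not from any official schema.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Assumption: records that lack the GTID fields are not row events
        # and are ignored.
        try:
            key = (
                event["domain"],
                event["server_id"],
                event["sequence"],
                event["event_number"],
            )
        except KeyError:
            continue
        counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Synthetic sample: one event delivered three times, one delivered once.
    sample = [
        '{"domain":1,"server_id":1,"sequence":5,"event_number":1}',
        '{"domain":1,"server_id":1,"sequence":5,"event_number":1}',
        '{"domain":1,"server_id":1,"sequence":5,"event_number":1}',
        '{"domain":1,"server_id":1,"sequence":5,"event_number":2}',
    ]
    print(count_duplicates(sample))  # {(1, 1, 5, 1): 3}
```

An empty result would suggest that only one MaxScale instance is publishing each event. With the three-instance setup described in this report, every key would be expected to show a count of 3.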