[MXS-3854] Automatic data purging option for KafkaCDC avro files(avsc data files) Created: 2021-11-04  Updated: 2022-03-30  Resolved: 2022-03-30

Status: Closed
Project: MariaDB MaxScale
Component/s: avrorouter, cdc
Affects Version/s: None
Fix Version/s: N/A

Type: New Feature
Priority: Major
Reporter: Naresh Chandra
Assignee: Todd Stoffel (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MXS-4010 provide a method to purge avro files Closed

 Description   

Add a new feature to the KafkaCDC router that purges data from avro files automatically, like a data retention policy. For example, if we want to keep data for only one week, older data should be purged automatically so that only the last 7 days of data remain. An option or button for this in the MaxScale GUI would also be useful.



 Comments   
Comment by markus makela [ 2021-12-08 ]

I'm assuming this is for the avrorouter as the kafkacdc module doesn't use avro files.

Comment by Naresh Chandra [ 2021-12-08 ]

Hi Markus,

The KafkaCDC router internally stores its data as avro files under the /var/lib/maxscale/Kafka-CDC folder. We need an auto-purge option for the avro data files in the KafkaCDC router section itself, not in an avrorouter section, because we are not using avrorouter; we are using the KafkaCDC router. So please implement the purge option for the KafkaCDC router section only.

Please find below the avro files stored under the KafkaCDC router directory.
[root@test404 maxscale]# pwd
/var/lib/maxscale
[root@test404 maxscale]#
[root@test404 maxscale]# cd Kafka-CDC
[root@test404 Kafka-CDC]#
[root@test404 Kafka-CDC]# ls -lrth
total 76K
-rw-r--r-- 1 maxscale maxscale 893 Dec 7 23:49 miquartz.MITIME_QRTZ_SCHEDULER_STATE.000001.avsc
-rw-r--r-- 1 maxscale maxscale 1.1K Dec 7 23:49 g2quartz.QRTZ_SIMPLE_TRIGGERS.000001.avsc
-rw-r--r-- 1 maxscale maxscale 2.0K Dec 7 23:49 g2quartz.QRTZ_TRIGGERS.000001.avsc
-rw-r--r-- 1 maxscale maxscale 886 Dec 7 23:49 g2quartz.QRTZ_SCHEDULER_STATE.000001.avsc
-rw-r--r-- 1 maxscale maxscale 654 Dec 7 23:50 bnsbatch.BATCH_JOB_SEQ.000001.avsc
-rw-r--r-- 1 maxscale maxscale 664 Dec 7 23:50 bnsbatch.BATCH_JOB_EXECUTION_SEQ.000001.avsc
-rw-r--r-- 1 maxscale maxscale 860 Dec 7 23:50 bnsbatch.BATCH_JOB_INSTANCE.000001.avsc
-rw-r--r-- 1 maxscale maxscale 1.6K Dec 7 23:50 bnsbatch.BATCH_JOB_EXECUTION.000001.avsc
-rw-r--r-- 1 maxscale maxscale 1.3K Dec 7 23:50 bnsbatch.BATCH_JOB_EXECUTION_PARAMS.000001.avsc
-rw-r--r-- 1 maxscale maxscale 790 Dec 7 23:50 bnsbatch.BATCH_JOB_EXECUTION_CONTEXT.000001.avsc
-rw-r--r-- 1 maxscale maxscale 665 Dec 7 23:50 bnsbatch.BATCH_STEP_EXECUTION_SEQ.000001.avsc
-rw-r--r-- 1 maxscale maxscale 2.2K Dec 7 23:50 bnsbatch.BATCH_STEP_EXECUTION.000001.avsc
-rw-r--r-- 1 maxscale maxscale 792 Dec 7 23:50 bnsbatch.BATCH_STEP_EXECUTION_CONTEXT.000001.avsc
-rw-r--r-- 1 maxscale maxscale 1.4K Dec 7 23:50 g2auth.country_default_groups.000001.avsc
-rw-r--r-- 1 maxscale maxscale 2.8K Dec 7 23:53 portus.users.000001.avsc
-rw-r--r-- 1 maxscale maxscale 1.5K Dec 7 23:53 proquote_gbm.odometer_bin_buckets.000001.avsc
-rw-r--r-- 1 maxscale maxscale 1.1K Dec 7 23:54 miquartz.MITIME_QRTZ_SIMPLE_TRIGGERS.000001.avsc
-rw-r--r-- 1 maxscale maxscale 2.0K Dec 7 23:54 miquartz.MITIME_QRTZ_TRIGGERS.000001.avsc
-rw-r--r-- 1 maxscale maxscale 71 Dec 7 23:54 current_gtid.txt
[root@test404 Kafka-CDC]#

Please find the KafkaCDC router config details.

[server4]
type=server
address=10.142.1.143
port=3306
protocol=MariaDBClient
authenticator = mariadbauth

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server4
user=db_test
password=test123
monitor_interval=10s

[Kafka-CDC]
type=service
router=kafkacdc
servers=server4
user=db_test
password=test123
bootstrap_servers=test-kafka4.dev.com:9091
topic=testdev_db
gtid=0-2-41135150,1-1-156510926,110-1-1489,120-2-2278,130-3-23,140-4-468287
kafka_ssl=ON
kafka_sasl_user=test-dev-user
kafka_sasl_password="XNDi03422kil"
kafka_sasl_mechanism=SCRAM-SHA-256

Comment by markus makela [ 2021-12-08 ]

Oh, those are the schema files that are needed by the kafkacdc. They contain only metadata about the tables and don't contain the data itself which means they'll never grow in size.
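To illustrate the point that an .avsc file describes a table rather than holding its rows, here is a minimal sketch that lists the fields of an Avro record schema. It assumes the files are ordinary Avro schemas in JSON form; the sample schema text, its field names, and the describe_schema helper are hypothetical and are not taken from the files listed above.

```python
import json

# Hypothetical schema text standing in for the contents of a file such as
# bnsbatch.BATCH_JOB_SEQ.000001.avsc; the real files may look different.
SAMPLE_AVSC = """
{
  "type": "record",
  "name": "ChangeRecord",
  "fields": [
    {"name": "ID", "type": ["null", "long"]},
    {"name": "UNIQUE_KEY", "type": ["null", "string"]}
  ]
}
"""


def describe_schema(avsc_text):
    """Return (field name, field type) pairs from an Avro record schema."""
    schema = json.loads(avsc_text)
    return [(field["name"], field["type"]) for field in schema.get("fields", [])]


fields = describe_schema(SAMPLE_AVSC)
for name, field_type in fields:
    # Only column names and types appear here; no row data is present.
    print(name, field_type)
```

A schema like this changes only when the table definition changes, which is consistent with the file sizes of a few kilobytes in the listing above.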

Comment by Naresh Chandra [ 2021-12-08 ]

Oh okay, so the source data streams directly from MariaDB to the Kafka broker without being stored locally in MaxScale, right?

Comment by markus makela [ 2021-12-08 ]

Yeah, the data is never stored on disk.

Comment by Naresh Chandra [ 2021-12-08 ]

Okay, thanks for the update. If needed, we can change this ticket to target avrorouter, so that avro files can be purged.
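The one-week retention idea from the description could be sketched outside MaxScale roughly as follows. This is only an illustration under stated assumptions: the find_expired and purge helpers, the ".avro" suffix for data files, and the use of file modification time as the age criterion are all hypothetical, and the sketch does not address whether it is safe to remove files that MaxScale is still writing or reading.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired(directory, max_age_days=7, suffix=".avro", now=None):
    """Return files in `directory` ending in `suffix` that were last modified
    more than `max_age_days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(directory).glob("*" + suffix)
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def purge(directory, max_age_days=7, suffix=".avro", dry_run=True):
    """Delete expired files and return them; with dry_run=True (the default)
    nothing is deleted and the candidates are only reported."""
    expired = find_expired(directory, max_age_days, suffix)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Because the pattern matches only the assumed ".avro" suffix, schema files ending in ".avsc" and files such as current_gtid.txt are left alone. Running with dry_run=True first shows which files would be removed before anything is deleted.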

Comment by markus makela [ 2022-03-30 ]

I'll close this as a duplicate of MXS-4010 even if we usually keep the oldest issue. This is because for MXS-4010 already exists that refers to that ticket.
