[MXS-4404] Maxscale: KafkaCDC writes to current_gtid.txt causes high disk utilisation. Created: 2022-11-17  Updated: 2022-11-22  Resolved: 2022-11-22

Status: Closed
Project: MariaDB MaxScale
Component/s: kafkacdc
Affects Version/s: 22.08.0
Fix Version/s: 2.5.24, 6.4.4, 22.08.3

Type: Bug Priority: Major
Reporter: Presnickety Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: Maxscale, performance
Environment:

OS: RHEL 8.4
VM: vCenter 7.0.3, build 20150588, ESXi 6.7.
Hardware: Cisco UCS C220 M4 Small Form Factor (SFF) 1RU server.
Database: MariaDB v10.7
Load balancer: Maxscale v22.08
Three database & three load balancer clusters hosted on three VMs.
Database cluster utilises Galera replication. Load balancers use the read-write, read-only & kafka-cdc routers, and galeramon for monitoring.


Attachments: PNG File KafkaCDC-01.PNG     File maxscale.cnf     File my.cnf    

 Description   

Hi, the KafkaCDC router truncates, then rewrites, the current_gtid.txt file for each GTID it processes. The file lives in the Maxscale data directory. We've observed that this causes very high disk utilisation (almost 100%) and doubles the normal system IOWait. Disk utilisation was effectively 0% before KafkaCDC was enabled. Data does appear in the Kafka topic that KafkaCDC writes to, but KafkaCDC cannot keep up with the database binary logs: they are purged before it has read all of them. The Kafka topic has only one partition. The Kafka broker is hosted on a three-host cluster. The database has only three tables, two of which KafkaCDC excludes. Note that, to minimise database contention, the transaction binary logs, Galera cache file and database logfile reside on a different virtual disk from the one the database resides on.

Can you provide an option to write the GTID value to memory, instead of/as well as to file?

  # tail -f /data10/maxscale/Kafka-CDC/current_gtid.txt
    1-1-6459191831
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191834
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191835
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191836
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191839
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191840
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191841
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191842
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191843
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191844
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191845
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191846
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191847
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
    1-1-6459191848
    tail: /data10/maxscale/Kafka-CDC/current_gtid.txt: file truncated
  • Database & Load balancer configs attached.
  • Netdata disk utilisation graph attached.

Thanks.



 Comments   
Comment by markus makela [ 2022-11-17 ]

The directory the file is located in is built from the value of the datadir parameter (defaults to /var/lib/maxscale/) and the name of the service. As a temporary workaround, you could try mounting a tmpfs filesystem at /data10/maxscale/Kafka-CDC/.
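
To illustrate the comment above, here is a minimal Python sketch (not MaxScale code) of how the file path follows from datadir plus the service name, and of one way to check which filesystem type a path sits on by parsing /proc/mounts on Linux. The function names and the sample mount table are hypothetical; escaped characters in mount points (such as \040 for spaces) are not handled.

```python
import os
from typing import Optional


def gtid_file_path(datadir: str, service: str) -> str:
    """Join the datadir, the service name and the file name seen in the report."""
    return os.path.join(datadir, service, "current_gtid.txt")


def filesystem_type(path: str, mounts_text: str) -> Optional[str]:
    """Return the fs type of the longest mount point containing `path`.

    `mounts_text` is the content of /proc/mounts (Linux only). Returns
    None when no listed mount point contains the path.
    """
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip malformed lines
        mount_point = fields[1].rstrip("/") or "/"
        fs_type = fields[2]
        prefix = mount_point if mount_point == "/" else mount_point + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point win over an earlier one
            if len(mount_point) >= best_len:
                best_len, best_type = len(mount_point), fs_type
    return best_type


if __name__ == "__main__":
    # Values taken from the report: datadir /data10/maxscale, service Kafka-CDC
    path = gtid_file_path("/data10/maxscale", "Kafka-CDC")
    with open("/proc/mounts") as f:
        print(path, "is on", filesystem_type(path, f.read()))
```

If the workaround is in place, the reported type for the file should be tmpfs. Since tmpfs lives in memory, its contents do not survive a reboot or unmount.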

Comment by Presnickety [ 2022-11-22 ]

Hi, confirming the suggested workaround resolved the issue. Thanks.

Comment by markus makela [ 2022-11-22 ]

The file is now written using the same file handle and without truncating the old data. This should improve the performance of the GTID tracking for busy systems.
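
The difference between the two write patterns can be sketched in Python as follows. This is only an illustration of the idea described in the comment, not the actual MaxScale change: the class name, the fixed record width and the space padding are assumptions made here so that a shorter GTID fully overwrites a longer one without truncating the file.

```python
import os
import tempfile


def store_truncating(path: str, gtid: str) -> None:
    """Old pattern as reported: reopen and truncate the file for every GTID."""
    with open(path, "w") as f:  # mode "w" truncates the file on open
        f.write(gtid + "\n")


class GtidFile:
    """Keeps one descriptor open and overwrites the record in place."""

    WIDTH = 64  # hypothetical fixed record width, in bytes

    def __init__(self, path: str) -> None:
        # No O_TRUNC: existing data is left alone when the file is opened
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

    def store(self, gtid: str) -> None:
        # Pad to a fixed width so leftovers of a longer value are overwritten
        record = gtid.ljust(self.WIDTH - 1) + "\n"
        os.pwrite(self.fd, record.encode("ascii"), 0)

    def load(self) -> str:
        return os.pread(self.fd, self.WIDTH, 0).decode("ascii").strip()

    def close(self) -> None:
        os.close(self.fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "current_gtid.txt")
    gtids = GtidFile(path)
    for value in ("1-1-6459191831", "1-1-6459191834"):
        gtids.store(value)
    print(gtids.load())
    gtids.close()
```

With the second pattern the file keeps a constant size and each update is a single positioned write on an already open descriptor, instead of an open, truncate, write and close cycle per GTID.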

Generated at Thu Feb 08 04:28:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.