[MXS-4404] Maxscale: KafkaCDC writes to current_gtid.txt causes high disk utilisation. Created: 2022-11-17 Updated: 2022-11-22 Resolved: 2022-11-22 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | kafkacdc |
| Affects Version/s: | 22.08.0 |
| Fix Version/s: | 2.5.24, 6.4.4, 22.08.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Presnickety | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | Maxscale, performance | ||
| Environment: |
OS: RHEL 8.4 |
||
| Attachments: |
|
| Description |
|
Hi, KafkaCDC truncates, then rewrites, the current_gtid.txt file for each GTID it processes. The file lives in the MaxScale data directory. We've observed that this causes very high disk utilisation (almost 100%) and doubles the normal system IOWait. Disk utilisation was literally 0% prior to KafkaCDC. Data appears in the Kafka topic that KafkaCDC writes to, but KafkaCDC cannot keep up with the database binary logs: they are purged before it has read all of them. The Kafka topic has only one partition. The Kafka broker is hosted on a three-host cluster. The database has only three tables, two of which KafkaCDC excludes. Note that, to minimise database contention, the transaction binary logs, Galera cache file and database logfile reside on a different virtual disk from the one the database resides on. Can you provide an option to write the GTID value to memory, instead of or as well as to file?
Thanks. |
| Comments |
| Comment by markus makela [ 2022-11-17 ] |
|
The directory the file is located in is created from the value of the datadir parameter (defaults to /var/lib/maxscale/) and the name of the service. As a temporary solution, you could try mounting a tmpfs filesystem at /data10/maxscale/Kafka-CDC/. |
| Comment by Presnickety [ 2022-11-22 ] |
|
Hi, confirming the suggested workaround resolved the issue. Thanks. |
| Comment by markus makela [ 2022-11-22 ] |
|
The file is now written using the same file handle and without truncating the old data. This should improve the performance of the GTID tracking for busy systems. |
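For illustration, the difference between the old and new write patterns can be sketched as follows. This is a minimal sketch in Python, not the MaxScale implementation (which is C++): the `GtidFile` class, its method names and the fixed `GTID_FIELD_WIDTH` padding are all hypothetical, and padding to a fixed width is just one assumed way of overwriting in place so that a shorter GTID never leaves stale bytes from a longer one behind.

```python
import os

# Hypothetical fixed record size; padding keeps an in-place overwrite from
# leaving stale bytes behind when a new GTID is shorter than the previous one.
GTID_FIELD_WIDTH = 64


class GtidFile:
    """Keeps one file descriptor open and overwrites the GTID in place."""

    def __init__(self, path: str) -> None:
        # Opened once, without O_TRUNC: no per-GTID open/truncate/close cycle.
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)

    def store(self, gtid: str) -> None:
        # Pad with spaces and end with a newline so every record has the same size.
        data = (gtid.ljust(GTID_FIELD_WIDTH - 1) + "\n").encode("ascii")
        if len(data) != GTID_FIELD_WIDTH:
            raise ValueError("GTID does not fit in the fixed-width record")
        # Positional write at offset 0 through the already-open descriptor.
        os.pwrite(self._fd, data, 0)

    def load(self) -> str:
        # Read the record back and drop the padding.
        return os.pread(self._fd, GTID_FIELD_WIDTH, 0).decode("ascii").strip()

    def close(self) -> None:
        os.close(self._fd)
```

With this pattern each processed GTID costs a single small write to an existing file, rather than a truncate followed by a write. Whether the data is also flushed to disk on every update is a separate durability decision that this sketch does not address.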