[MXS-1271] cdc.py consuming 100% of CPU and never sending to kafka Created: 2017-05-22 Updated: 2017-06-20 Resolved: 2017-06-20 |
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | avrorouter |
| Affects Version/s: | 2.0.5 |
| Fix Version/s: | 2.0.6, 2.1.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Josh Becker | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: | Ubuntu 14.04 |
| Attachments: | last_data.bin |
| Description |
When using the cdc.py tool to pipe data to Kafka, it fails for a table with a lot of changes. If I use a MariaDB table with far fewer changes (let's say, our users table), it works properly (though it still pegs the CPU at 100%). But when I switch to another table (items) with more data/changes, it pegs the CPU at 100% and nothing is ever sent to the Kafka topic. The server running the script is on AWS (m4.large).
| Comments |
| Comment by markus makela [ 2017-05-23 ] |
This is possibly fixed by commit 7d7d8a0560023f84bcc292d0b74b98a00fb5c910. I would suggest trying out this version of the cdc_kafka script: https://github.com/mariadb-corporation/MaxScale/tree/2.0/server/modules/protocol/examples
| Comment by Josh Becker [ 2017-05-23 ] |
That is the version I am using: https://gist.github.com/Geesu/4b3a9316afe082c49b57e1b8d4a5a376 I'm using the 2.0.5 build you created for me in another issue.
| Comment by markus makela [ 2017-05-23 ] |
Ah, then this is a new bug. Does the cdc.py script work correctly without the Kafka producer part?
| Comment by Josh Becker [ 2017-05-23 ] |
Yes, and it works correctly if I specify a table that doesn't have as many changes.
| Comment by Josh Becker [ 2017-05-23 ] |
I'm actually quite surprised this hasn't come up before; it feels like this script could easily be a bottleneck for other clients trying to push to Kafka as well.
| Comment by Dipti Joshi (Inactive) [ 2017-05-25 ] |
This script is an example that shows how the CDC API can be used inside a Kafka producer. Users are expected to write their own Kafka producers that use the CDC API and manage the scalability side of the Kafka producers themselves.
| Comment by Josh Becker [ 2017-05-25 ] |
That's definitely not the impression from your docs/marketing.
| Comment by markus makela [ 2017-05-26 ] |
geesu, I think adding some debug output to the script should help narrow down the issue. If you wrap the string-to-JSON conversion in an exception handler that prints the failure, you should start receiving errors in the standard output for invalid JSON. When it hangs, you should receive the same error over and over again. If this happens, please post the error (without any sensitive data). I suspect that the string-to-JSON conversion might be hitting some problem that causes it to constantly try to convert the same string into valid JSON.

Due to changes in the cdc.py script and how the avrorouter sends the data, I think the cdc_kafka_producer can be simplified even more. Please try this simplified version of the cdc_kafka_producer.py script and see if it solves the problems: https://gist.github.com/markus456/1e3f77693c7211df803e551729a2d417
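The kind of debug wrapper being suggested might look like this (a sketch only; the function and variable names are illustrative, and it assumes the script converts each received line to JSON with json.loads):

```python
import json
import sys

def parse_line(line):
    """Convert one CDC text line to JSON, printing failures instead of silently retrying."""
    try:
        return json.loads(line)
    except ValueError as err:  # json.JSONDecodeError is a subclass of ValueError
        print("Failed to parse JSON: {}: {!r}".format(err, line), file=sys.stderr)
        return None

# A valid line parses; a broken one prints the error and returns None.
parse_line('{"id": 1, "name": "example"}')
parse_line('{"id": 1, truncated')
```

With this in place, a hang would show the same "Failed to parse JSON" message repeating on standard error, which is the symptom markus asks to look for.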
| Comment by Josh Becker [ 2017-06-06 ] |
I was originally using your script that doesn't do any JSON parsing (the one added to 2.0.5), so adding those exception handlers wouldn't help. I just tried out the new script and it still doesn't work, though it does work properly for a smaller table with fewer changes.
| Comment by markus makela [ 2017-06-07 ] |
OK, then we can rule out the JSON parsing and deduce that something else is indeed going on with tables that have large amounts of changes. By a large amount of changes, do you mean that the rate of changes is high or that the amount of data in each change is large? If it's possible for you to give an example definition of a large table, we could try to reproduce it on our side.

Does the cdc.py script work alone, without the Kafka part, even on the table with the large amount of changes? If you replace the Kafka producer calls (lines 42 and 43 in the new script) with a call to print(data), does the script work? If the script works without the actual Kafka part, I think the problem might be either in how we use the kafka-python library or in the library itself.
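The substitution described above could be sketched like this (the producer.send/producer.flush pairing follows the kafka-python API, but the exact lines being replaced in the script are an assumption):

```python
DEBUG = True  # set to False to restore the Kafka producer path

def forward(data, producer=None, topic="mytopic"):
    """Forward one CDC event: print it in debug mode, otherwise send it to Kafka."""
    if DEBUG or producer is None:
        print(data)  # stand-in for the producer.send()/producer.flush() pair
    else:
        producer.send(topic, data.encode("utf-8"))
        producer.flush()

forward('{"id": 1, "name": "example"}')
```

If the script keeps up with the change stream in this mode but stalls with the producer enabled, the bottleneck is in the Kafka-facing half.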
| Comment by Josh Becker [ 2017-06-14 ] |
Thanks - I think I have it figured out. Here is the exception it is throwing:

'ascii' codec can't decode byte 0xc3 in position 153: ordinal not in range(128)

I've attached the binary data it was trying to decode: last_data.bin. And here is the code that produced it: https://gist.github.com/Geesu/8cc7158dfdd135ee701d7893bfc15dfe
| Comment by Josh Becker [ 2017-06-14 ] |
If I remove the encode() it works properly. Is that ok?
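For context on why removing encode() can fix this: assuming the script runs under Python 2, calling .encode('utf-8') on a str that already holds UTF-8 bytes first performs an implicit decode with the ascii codec, which produces exactly the error reported above. A small Python 3 sketch reproducing that hidden step explicitly:

```python
# UTF-8 bytes for the non-ASCII character "a-umlaut"; 0xc3 is the lead byte
# of any two-byte UTF-8 sequence, matching the byte named in the traceback.
raw = "\u00e4".encode("utf-8")  # b'\xc3\xa4'

try:
    # In Python 2, raw.encode('utf-8') implicitly runs raw.decode('ascii')
    # first; this line performs that hidden step directly.
    raw.decode("ascii")
except UnicodeDecodeError as err:
    print(err)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```

Since the data arriving from the avrorouter is already a UTF-8 byte stream, passing it through unchanged avoids the implicit decode entirely.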
| Comment by markus makela [ 2017-06-16 ] |
Yes, that should be OK. I don't think the encode call is necessary, and it could probably be removed, as the trailing newline can be stripped from the string directly.
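A minimal way to strip the trailing newline without any encode() round-trip, assuming data holds the decoded text of one event (the variable name is illustrative):

```python
data = '{"id": 1, "name": "example"}\n'
data = data.rstrip("\n")  # drop the trailing newline; no encode() needed
print(data)
```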
| Comment by markus makela [ 2017-06-20 ] |
Closing as fixed since a solution was found. 2.1.4 will be the first release with this fix.