[MXS-1271] cdc.py consuming 100% of CPU and never sending to kafka Created: 2017-05-22  Updated: 2017-06-20  Resolved: 2017-06-20

Status: Closed
Project: MariaDB MaxScale
Component/s: avrorouter
Affects Version/s: 2.0.5
Fix Version/s: 2.0.6, 2.1.4

Type: Bug Priority: Critical
Reporter: Josh Becker Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

Ubuntu 14.04


Attachments: File last_data.bin     PNG File screen_shot_2017-05-25_at_10.30.40_am.png    

Description

When using the cdc.py tool to pipe data to Kafka, it unfortunately fails for a table with a lot of changes.

If I use a MariaDB table which has far fewer changes (let's say, our users table):
/usr/bin/cdc.py -u cdc_user --p123456 -h 127.0.0.1 -P 4001 production.users | /usr/bin/cdc_kafka_producer.py --kafka-broker=172.30.0.190:9092 --kafka-topic=users

It works properly (and still pegs the CPU at 100%).

But when I switch to another table (items) with more data/changes, it pegs the CPU at 100% and nothing is ever sent to the Kafka topic.

The server running the script is on AWS (m4.large).



Comments
Comment by markus makela [ 2017-05-23 ]

This is possibly fixed by commit 7d7d8a0560023f84bcc292d0b74b98a00fb5c910.

I would suggest trying out this version of the cdc_kafka script: https://github.com/mariadb-corporation/MaxScale/tree/2.0/server/modules/protocol/examples

Comment by Josh Becker [ 2017-05-23 ]

That is the version I am using. https://gist.github.com/Geesu/4b3a9316afe082c49b57e1b8d4a5a376

I'm using the 2.0.5 build you created for me in another issue (MXS-1191).

Comment by markus makela [ 2017-05-23 ]

Ah, then this is a new bug. Does the cdc.py script work correctly without the Kafka producer part?

Comment by Josh Becker [ 2017-05-23 ]

Yes, and it works correctly if I specify a table that doesn't have as many changes.

Comment by Josh Becker [ 2017-05-23 ]

I'm actually quite surprised this hasn't come up before; it feels like this script could easily be a bottleneck for other clients trying to push to Kafka as well.

Comment by Dipti Joshi (Inactive) [ 2017-05-25 ]

This script is an example to show how the CDC API can be used inside a Kafka producer. Users are expected to write their own Kafka producers that use the CDC API and to manage the scalability side of those producers themselves.

Comment by Josh Becker [ 2017-05-25 ]

That's definitely not the impression given by your docs/marketing.

Comment by markus makela [ 2017-05-26 ]

geesu I think adding some debug output to the script should help pinpoint the issue. If you can alter the code from this:

   # JSONDecoder will raise a ValueError if a partial JSON object is read
   except ValueError as err:
      pass

to this:

   # JSONDecoder will raise a ValueError if a partial JSON object is read
   except ValueError as err:
      print(err)

you should start seeing errors on standard output for invalid JSON. When it hangs, you should see the same error over and over again. If this happens, please post the error (without any sensitive data). I suspect that the string-to-JSON conversion might be hitting a problem that causes it to constantly try to convert the same string into valid JSON.
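A minimal sketch of the kind of buffered-decode loop involved (feed() and this buffering scheme are hypothetical illustrations, not the script's actual code): JSONDecoder raises ValueError on a partial object, so partial data stays buffered until more input arrives; if the buffer never becomes valid JSON, the except branch fires on every read, which would look exactly like a busy loop pegging the CPU.

```python
import json

decoder = json.JSONDecoder()
buf = ''

def feed(chunk):
    """Accumulate chunks and yield each complete JSON object.

    A partial object raises ValueError inside raw_decode and is
    left in the buffer for the next call; a buffer that can never
    become valid JSON makes the except branch fire on every read.
    """
    global buf
    buf += chunk
    while buf:
        try:
            obj, end = decoder.raw_decode(buf)
        except ValueError:
            break  # partial JSON: wait for the next chunk
        buf = buf[end:].lstrip()
        yield obj
```

Feeding an object in two halves yields nothing after the first half and the complete object after the second.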

Due to changes in the cdc.py script and how the avrorouter sends the data, I think the cdc_kafka_producer can be simplified even more. Please try this simplified version of the cdc_kafka_producer.py script and see if it solves the problems: https://gist.github.com/markus456/1e3f77693c7211df803e551729a2d417

Comment by Josh Becker [ 2017-06-06 ]

I was originally using your script that doesn't do any JSON parsing (the one added to 2.0.5), so adding those exception handlers wouldn't help.

I just tried out the new script and it still wouldn't work, but it did work properly for a smaller table with fewer changes.

Comment by markus makela [ 2017-06-07 ]

OK, then we can rule out the JSON parsing and deduce that something else is indeed going on with tables with large amounts of changes.

By a large amount of changes do you mean that the rate of changes is high or the amount of data in each change is large? If it's possible for you to give an example definition of a large table, we could try and see if we can reproduce it on our side.

Does the cdc.py script work alone without the kafka part even on the table with the large amount of changes?

If you replace the following lines (42 and 43 in the new script) with a call to print(data), does the script work?

      producer.send(topic=opts.kafka_topic, value=data)
      producer.flush()

If the script works without the actual Kafka part, I think the problem might be either in how we use the kafka python library or in the library itself.
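The suggested swap can also be kept switchable with a small shim (deliver() is a hypothetical helper, not part of either script): when no producer is passed, the payload is just echoed, so the cdc.py read loop can be exercised without a Kafka broker.

```python
def deliver(data, producer=None, topic=None):
    """Send data to Kafka, or just print it when no producer is given.

    Hypothetical diagnostic shim: running with producer=None isolates
    the cdc.py read side from the kafka-python send path.
    """
    if producer is None:
        print(data)
    else:
        producer.send(topic=topic, value=data)
        producer.flush()
```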

Comment by Josh Becker [ 2017-06-14 ]

Thanks - I think I have it figured out. Here is the exception it is throwing:

'ascii' codec can't decode byte 0xc3 in position 153: ordinal not in range(128)

I've attached the binary data it was trying to decode. last_data.bin

And the code that produced it: https://gist.github.com/Geesu/8cc7158dfdd135ee701d7893bfc15dfe

Comment by Josh Becker [ 2017-06-14 ]

If I remove the encode() it works properly. Is that ok?

Comment by markus makela [ 2017-06-16 ]

Yes, that should be OK. I don't think the encode call is necessary, and it can probably be dropped since the trailing newline can be stripped with the following code:

data = buf[:-1]

Comment by markus makela [ 2017-06-20 ]

Closing as fixed since a solution was found. 2.1.4 will be the first release with this fix.

Generated at Thu Feb 08 04:05:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.