[MXS-738] CPU Usage increases over time Created: 2016-05-26  Updated: 2016-06-07  Resolved: 2016-06-07

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: 1.4.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Onno Steenbergen Assignee: Timofey Turenko
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Debian GNU/Linux 8.4 (jessie)
MySQL 5.7


Attachments: File maxscale.cnf     File server-101.log     Text File server-101.txt     File server-201.log     Text File server-201.txt    

 Description   

Running MaxScale for a few days results in high CPU usage. It starts out at 1-3% CPU, after a day it is at 15-20%, and after a week it runs at 90% CPU.

The same configuration is on two servers: one is the master and receives some testing traffic, while the other only executes internal database tasks. Both servers show this behaviour. The servers are Xen VMs with 4 CPUs and 6 GB memory, and show no signs of high disk IO or running out of memory. The configuration currently uses one thread, but the behaviour is also visible with 4 threads or more.

I restarted server-201 while keeping server-101 running at 90% CPU. Via maxadmin I extracted as much information as I could; it is attached in the text files.

Logs for server-201 contain nothing interesting since its last reboot (currently running at 20% CPU).

Server-101 logged some errors about a master going down (repeated 30 times a second). Since then there have been no new log entries, but CPU usage has climbed from a few percent to its current 90%, so I don't expect the errors to be related.

I restarted server-201 with debug logging, which I can include if needed.

The application using MaxScale has problems and slows down when CPU usage is too high.



 Comments   
Comment by markus makela [ 2016-06-01 ]

What kind of connector does the client use? Does the client do any connection pooling? Does traffic increase or is it at a constant level?

Comment by Onno Steenbergen [ 2016-06-01 ]

Client uses connection pooling, traffic is constant.

I did a little digging in the client code: it uses SQLAlchemy with pool_size set to 5 and pool_recycle=-1. I also tried setting the pool_recycle value to something lower than the timeout specified by MaxScale, but that didn't resolve the issue.
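For reference, a pooling setup like the one described above looks roughly as follows in SQLAlchemy. The URL is a placeholder (the real application would point at MaxScale), and pool_recycle is shown with a finite value as tried above; the reporter's original setting was -1 (never recycle):

```python
# Sketch of the client-side pooling described in this report, assuming SQLAlchemy.
# The connection URL and recycle value are placeholders, not the actual settings.
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

engine = create_engine(
    "sqlite://",          # placeholder URL; the real app would use a MySQL URL via MaxScale
    poolclass=QueuePool,  # explicit queue-based pool
    pool_size=5,          # five pooled connections, as in the report
    pool_recycle=3600,    # recycle connections after 1h; -1 means never recycle
)
```

Because pooled connections are long-lived (especially with pool_recycle=-1), any per-connection state that MaxScale accumulates can grow for the lifetime of the pool.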

Comment by markus makela [ 2016-06-01 ]

I'm thinking that this might be a problem related to the storing of the command history. Pooled connections should usually have this feature disabled in the readwritesplit module. You can do this by appending disable_sescmd_history=true to the router_options parameter. If this is the cause, the slowdown is likely due to the excessively long list of session commands being executed when a replacement connection is created.

For more information about this option and why it is needed, please refer to the Readwritesplit Documentation on the Knowledge Base.
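A minimal sketch of where this option goes in maxscale.cnf (the service name, server list, and credentials below are placeholders; only the router_options line is the suggested change):

```ini
[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2
user=maxuser
password=maxpwd
router_options=disable_sescmd_history=true
```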

Comment by Onno Steenbergen [ 2016-06-01 ]

Enabled the setting for one server while the other still uses the old config. Will report back if I see a difference in CPU usage.

Comment by Johan Wikman [ 2016-06-06 ]

Tentatively setting fix version to 2.0.1. This needs more investigation.

Comment by Onno Steenbergen [ 2016-06-07 ]

After a few days the server without the change was at 100% CPU while the other is running at 1%. Thanks for the help!

Comment by Johan Wikman [ 2016-06-07 ]

osteenbergen many thanks for reporting this.

Comment by Johan Wikman [ 2016-06-07 ]

Caused by the maintenance of the session command history, so not directly a bug.

However, this situation should be prevented (e.g. by not storing the session history, but instead querying the state from the master and updating the new slave with it) and/or detected, so that an informative log message can be generated.
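As a toy illustration (not MaxScale code) of why an ever-growing session command history hurts long-lived pooled connections: if every session command is appended to a per-session history that must be replayed whenever a replacement connection is created, the cost of each replay grows with session age, and the cumulative work grows quadratically.

```python
# Toy model of an unbounded per-session command history, as an illustration only.
history = []

def record_session_command(cmd):
    # Every session command is kept for the lifetime of the (pooled) session.
    history.append(cmd)

def replay_on_new_connection():
    # Bringing a replacement connection up to date costs O(len(history)).
    return list(history)

# A long-lived pooled session that never closes keeps growing its history.
for i in range(1000):
    record_session_command(f"SET @v={i}")

replayed = replay_on_new_connection()
```

Disabling the history (or replacing it with a "fetch current state from the master" approach, as suggested above) caps this cost.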

Generated at Thu Feb 08 04:01:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.