[MXS-48] Performance issue on Sysbench 0.5 OLTP Created: 2015-03-18  Updated: 2016-01-12  Due: 2015-03-20  Resolved: 2016-01-12

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 1.0.5
Fix Version/s: 1.2.0

Type: Task Priority: Major
Reporter: Dipti Joshi (Inactive) Assignee: Massimiliano Pinto (Inactive)
Resolution: Not a Bug Votes: 1
Labels: None

Attachments: File MaxScaleEA.cnf    

 Description   

A performance issue has been reported when running the Sysbench 0.5 OLTP benchmark. Please see the comments for details.



 Comments   
Comment by markus makela [ 2015-03-24 ]

I would suggest a significant drop in the number of threads: anything above the number of hardware threads is a performance loss. Taking an optimistic guess of 32 hardware threads leaves 224 extra threads doing pointless work.
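For illustration, the thread count is set in the [maxscale] section of the configuration file. A sketch sized for a 16-hardware-thread host might look like this (the value shown is an assumption to be tuned, not a measured optimum):

[maxscale]
threads=16

The general rule suggested above is to keep this at or below the number of hardware threads on the host.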

Comment by Tim Vaillancourt [ 2015-04-28 ]

Hi Markus, sorry for the delay and thanks for looking into this!

Our dedicated MaxScale host has 8 CPU cores (16 hyperthreaded), so I experimented with 8, 16 and 24 threads and saw a much improved CPU utilization, thanks!

Unfortunately, the bottleneck has now moved from high CPU usage to a lack of available threads to perform queries. During my testing with lower MaxScale thread counts, I frequently saw queries piling up because the 8-24 threads were completely busy with work; this is what led me to increase the thread count so high in the first place. We could throw more hardware at the problem, but it seems like we shouldn't need to at these somewhat low volumes. Also, HAProxy seems to be able to serve the same traffic with far fewer thread pileups.

Is there anything else we can do to reduce the overhead of MaxScale and help relieve the thread bottleneck? As I mentioned to Gerry at MariaDB in another thread, since we are using MaxScale to load-balance Galera, we do not need most of MaxScale's Layer-7/application-layer features. We only need round-robin balancing and a check that a host is "synced", with no query inspection/rewriting, read/write splitting, etc.

Also, is there a recommended hardware configuration that we are perhaps not using? Our host is a dedicated Dell R610 with 48 GB of RAM, 2 x 1 GbE NICs for frontend and backend traffic, and only MaxScale running on CentOS 6, x86_64.

Any help is appreciated, thanks!

Tim

Comment by markus makela [ 2015-05-05 ]

It seems there isn't anything drastic that can be done; MaxScale does not currently implement zero-copy functionality.

You could try adding poll_sleep=<milliseconds> and non_blocking_polls=<number> to the [maxscale] section. These parameters control how many non-blocking polls a thread performs before falling back to a blocking poll, and the timeout of that blocking poll. Increasing the number of non-blocking polls from the default of 3 to a greater value might help with the bottlenecks.
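As a sketch, the two parameters could be combined with the thread count like this (the values here are starting points to experiment with, not tested recommendations):

[maxscale]
threads=16
non_blocking_polls=10
poll_sleep=100

With these settings, each polling thread would spin through 10 non-blocking polls before issuing a blocking poll with a 100-millisecond timeout; more non-blocking polls trade CPU for lower latency under load.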

Comment by Tim Vaillancourt [ 2015-05-06 ]

Thanks for looking at this again Markus, I'll give that a shot and get back to you.

We are also going to experiment with decentralizing MaxScale so that it runs on "localhost" on our app servers instead of on dedicated hardware. This should spread the connection load over more CPUs, let each MaxScale instance use fewer threads, and remove one network hop by using localhost, all of which should improve things. If you have any suggestions for that sort of use case, let us know! I am happy to share findings from that testing too.

Lastly: is zero-copy functionality planned for the future? That sounds exciting; if there are any ETAs/milestones you can share, that would be great (if not, no problem).

Tim

Comment by Massimiliano Pinto (Inactive) [ 2015-05-06 ]

If MaxScale runs in the same box as the application, the UNIX domain socket may also lower resource usage:

[Read Connection Listener]
type=listener
service=Read Connection Router
protocol=MySQLClient
socket=/tmp/readconn.sock
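
Clients on the same host can then connect over the socket instead of TCP. With the standard mysql command-line client, for example, the invocation might look like this (the socket path matches the listener above; the username is a placeholder):

mysql --socket=/tmp/readconn.sock -u app_user -p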

Comment by markus makela [ 2015-05-06 ]

Moving MaxScale onto the app servers should be a pretty good solution to the thread pileup problem. I was going to suggest splitting the load over multiple MaxScale servers, but that isn't as simple or as effective as moving MaxScale to the app server. This could be something we suggest a bit more often.

Comment by Tim Vaillancourt [ 2015-05-06 ]

Great, thanks again, everyone. Very good point about using UNIX domain sockets; we will try this instead of TCP if our driver/app supports it.

Tim

Generated at Thu Feb 08 03:56:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.