[MXS-2701] Worker load balance need improve Created: 2019-09-26 Updated: 2020-03-13 Resolved: 2020-02-14 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core |
| Affects Version/s: | 2.2.21, 2.3.12 |
| Fix Version/s: | 2.5.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | dapeng huang | Assignee: | Unassigned |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
Round-robin assignment is not good enough; we encountered the following problem: an instance with 44 threads (on 64 CPU cores) consumed only about 230% CPU, because several sessions with heavy load were assigned to the same workers, and those workers' pipes became full,
|
| Comments |
| Comment by markus makela [ 2019-09-26 ] |
|
Have you tried 2.4? It uses SO_REUSEPORT to listen on the same port with all threads. For older releases we could increase the time a worker would wait when posting messages so that temporary blockages wouldn't cause errors. |
| Comment by dapeng huang [ 2019-09-27 ] |
|
@markus I haven't tried 2.4; I will try it later. I think this is not a bug, but something that can be improved. In this case it could be solved by adding a retry, or by assigning the task to another thread. But the load would still be unbalanced between threads. Maybe in 2.4 or a newer version, new connections could be assigned not in a round-robin way but based on the load of each thread, with a housekeeper task that migrates some sessions to keep the load balanced. |
| Comment by markus makela [ 2019-09-27 ] |
|
With short sessions the problem is less severe and I would imagine it only really becomes a problem when a part of the traffic is very long connections (e.g. connection pool) and a part of it is short connections. This problem might be relatively easy to solve in 2.4 where a worker that is under severe load could stop accepting new connections until the work is balanced equally. SO_REUSEPORT guarantees the connection is eventually accepted by a worker that is not as loaded as the rest. |
| Comment by dapeng huang [ 2019-09-27 ] |
|
Oh @markus makela, you are correct; I will try it. |
| Comment by Johan Wikman [ 2019-09-27 ] |
|
I think it might be tricky for a worker to realize that it is under so much load that it should stop accepting new connections. And in principle (admittedly quite unlikely) you could have the problem that once the worker realizes it is under heavy load and stops accepting new connections, the load might stop completely, with the effect that the worker never gets a chance to start accepting again. Much simpler would be to consider the load at the accept phase and assign the new connection to the worker with the least amount of load.

A quite different problem is how to deal with the load becoming unbalanced, which, as Markus wrote, requires as far as I can see that the traffic be quite heterogeneous: some long connections with heavy load that might end up on the same worker, and shorter connections that over time are spread out. Currently the assumption is that once created, a session and all its connections stay on the same worker. But I don't see any problem in principle with re-balancing the sessions across the workers at runtime, based on the traffic of a session and the load of the workers. |
| Comment by markus makela [ 2019-09-27 ] |
|
Maybe measuring the time it takes to loop through all epoll events and measuring the deviance from the norm between all threads. Then at some level (one standard deviation, perhaps?) the worker would self-regulate the amount of new work it would accept. |
| Comment by Johan Wikman [ 2019-09-27 ] |
|
I just realized that every worker regularly wakes up from epoll_wait() due to the load calculation. So each thread is already all the time aware of its load and thus there will always be an opportunity for the worker to start accepting again. This would actually be very straightforward to add to 2.4. |
| Comment by dapeng huang [ 2019-10-21 ] |
|
I have tried SO_REUSEPORT, but it does not work. |
| Comment by Johan Wikman [ 2019-10-22 ] |
|
This problem will be fixed by |