[MXS-3808] Improve Rest API performance Created: 2021-10-07 Updated: 2022-03-04 Resolved: 2022-03-04 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | REST-API |
| Affects Version/s: | 6.1.0, 6.1.1, 6.1.2, 6.1.3 |
| Fix Version/s: | 2.5.20, 6.2.3, 22.08.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Phil Porada | Assignee: | markus makela |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | performance | ||
| Environment: | Multiple MaxScale nodes connecting to multiple MariaDB 10.5.x nodes |
| Attachments: |
|
| Description |
|
We have the following application and database topology. On each MaxScale node we deploy https://github.com/Vetal1977/maxctrl_exporter to scrape the REST API and turn its output into Prometheus stats every 30 seconds. Under periods of high connection load on our MaxScale instances, we find that maxctrl_exporter is unable to scrape the REST API in a timely fashion. Attached is a screenshot showing dropped/missing stats for an example period. Would it be possible to add an admin thread that can be used to scrape the REST API and is guaranteed to respond in a more timely fashion?
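For context, the exporter's job is essentially to translate the REST API's JSON into Prometheus text-format metrics. The sketch below illustrates that translation with a deliberately simplified payload; the field names and structure are an assumption for illustration (the real MaxScale `/v1/servers` response is a fuller JSON:API document), and `to_prometheus` is a hypothetical helper, not part of maxctrl_exporter.

```python
import json

# Hypothetical, heavily simplified /servers payload; the real MaxScale
# REST API response carries many more attributes per server.
SAMPLE = json.loads("""
{
  "data": [
    {"id": "server1", "attributes": {"statistics": {"connections": 10}}},
    {"id": "server2", "attributes": {"statistics": {"connections": 3}}}
  ]
}
""")

def to_prometheus(payload):
    """Render per-server connection counts as Prometheus text-format lines."""
    lines = ["# TYPE maxscale_server_connections gauge"]
    for server in payload["data"]:
        conns = server["attributes"]["statistics"]["connections"]
        lines.append('maxscale_server_connections{server="%s"} %d'
                     % (server["id"], conns))
    return "\n".join(lines)

print(to_prometheus(SAMPLE))
```

If the `/servers` request itself stalls under load, the scrape misses its deadline and these gauges simply show gaps, which matches the dropped-stats screenshot.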
|
| Comments |
| Comment by markus makela [ 2021-10-08 ] |
|
The REST API already runs on a separate thread but, depending on what is scraped, it can interact with the worker threads. We'd need more information to know why it appears to slow down. |
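A minimal sketch of why such interaction can slow a REST call down, assuming a fan-out pattern where the REST thread asks every worker for data and blocks until all reply (illustrative only, not MaxScale source): the slowest worker sets the latency of the whole response.

```python
import threading
import queue
import time

def rest_collect(workers):
    """REST thread fans a request out to each worker and blocks until all reply."""
    replies = queue.Queue()
    for w in workers:
        w.put(replies)                        # ask each worker for its stats
    return [replies.get() for _ in workers]  # slowest worker sets total latency

def worker_loop(inbox, delay, stat):
    req = inbox.get()
    time.sleep(delay)                         # a busy worker replies late
    req.put(stat)

inboxes = [queue.Queue() for _ in range(4)]
delays = [0.0, 0.0, 0.0, 0.2]                # one worker is under heavy load
for inbox, d, s in zip(inboxes, delays, [1, 2, 3, 4]):
    threading.Thread(target=worker_loop, args=(inbox, d, s), daemon=True).start()

start = time.monotonic()
stats = rest_collect(inboxes)
elapsed = time.monotonic() - start
print(sorted(stats), round(elapsed, 1))       # total wait tracks the busiest worker
```

Under high session churn the workers are exactly the threads that are busy, which would explain the REST API appearing slow even though it runs on its own thread.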
| Comment by markus makela [ 2021-10-11 ] |
|
SneakyPhil, can you find out which endpoint causes this problem? A quick look at that exporter reveals that it queries multiple endpoints, and knowing which one of them is responsible would greatly help us fix any inefficiencies in it. |
| Comment by Phil Porada [ 2021-10-29 ] |
|
I let this one sit, and I'm sorry about that. A few days after I posted this issue we stopped using MaxScale altogether. That was not due to MaxScale itself, but because our application design is unable to take advantage of MaxScale's strengths. As for the exporter, the stat from the originally attached image hits `/servers` and can fail to return data when MaxScale is under high session creation/deletion load. Here's another picture showing that not all stats are dropped. |
| Comment by markus makela [ 2022-02-25 ] |
|
Having had the time to look into this more closely, the /servers endpoint does indeed seem to be the worst offender, mostly because it calculates the connection pool statistics by asking each thread for its locally cached version. In addition, I believe the data ended up being generated twice by accident, which doubled the amount of work for no good reason. Fixing this should cut the delay roughly in half, as the data only needs to be requested once. It also seems that in 2.5 a call to the /servers endpoint causes the connection pool to be cleared of stale connections. This is probably a remnant of the old maxadmin days, when it was used mainly for testing and for getting accurate counts, but in practice it's not worth doing. |
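The double-generation problem described above can be sketched as follows. This is an illustrative stand-in, not MaxScale source: `gather()` models the expensive cross-thread collection, and the response field names are invented for the example. Collecting once and reusing the result halves the number of cross-thread round-trips per request.

```python
# Illustrative sketch (not MaxScale code): per-thread stats were collected
# twice per /servers request; the fix gathers them once and reuses the value.

per_thread_stats = [{"connections": 5}, {"connections": 7}, {"connections": 2}]
calls = {"count": 0}

def gather():
    """Stand-in for the cross-thread stats collection; each call is expensive."""
    calls["count"] += 1
    return sum(t["connections"] for t in per_thread_stats)

def build_response_buggy():
    # Before: the same data was generated twice by accident.
    return {"statistics": gather(), "details": gather()}

def build_response_fixed():
    # After: gather once, reuse the result in both places.
    total = gather()
    return {"statistics": total, "details": total}

calls["count"] = 0
build_response_buggy()
buggy_calls = calls["count"]      # two round-trips to the workers

calls["count"] = 0
build_response_fixed()
fixed_calls = calls["count"]      # one round-trip
print(buggy_calls, fixed_calls)
```

Since each `gather()` blocks on every worker thread, eliminating the duplicate call directly explains the "cut the delay roughly in half" estimate.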
| Comment by markus makela [ 2022-03-01 ] |
|
I'm changing this to a bug, since a few of the endpoints do some pretty inefficient things that aren't really needed. |
| Comment by markus makela [ 2022-03-04 ] |
|
The /servers endpoint is now more efficient in how it collects data that is spread across other threads. It also no longer purges the persistent connection pool, as that is now done automatically (