[MXS-4161] MaxScale System Diagnostics Created: 2022-06-07 Updated: 2022-09-22 Resolved: 2022-09-22 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | Core |
| Affects Version/s: | None |
| Fix Version/s: | 22.08.2 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Rob Schwyzer | Assignee: | Johan Wikman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Sprint: | MXS-SPRINT-166 |
| Description |
|
MaxScale should provide means by which it is easy to ascertain that the configuration is compatible with the resources available to MaxScale. Especially when MaxScale is running in a container, it is possible that the resources - cores and memory - are limited compared to what is available on the machine. If that is the case and the automatic configuration (in particular threads and query_classifier_cache_size) is relied upon, then MaxScale may end up using far more resources than are available, with crashes as the result.

Original description

For example, we know that query_classifier_cache_size is pretty straightforward. We also know that each thread spawned by threads=N comes with its own cache. We should be able to show a worst-case estimate of memory usage based on this information.

Why?

MaxScale has many automatic configuration parameters (such as threads=auto, or query_classifier_cache_size defaulting to 15% of detected memory). In most cases these work well and set sensible defaults. However, in some cases other factors obstruct MaxScale's ability to properly detect the underlying resources, and the default memory allocation can be overzealous. This behavior is non-obvious to most customers, and because MaxScale does not immediately use these memory allocations when it first starts up, many customers end up complaining about "memory leaks" or other "memory problems" as connections come in and MaxScale begins actively using the memory its configuration allows it to. Showing customers clearly what MaxScale is configured to use will immediately clue them in when something is wrong at the configuration level. Instead of assuming that a leak or some other problem is occurring, customers will ask MariaDB why MaxScale expects to use so much more memory than they have allocated to the VM/node/etc., which creates a much more productive conversation with MariaDB. 
Starting Point

This request could be seen as all-encompassing, which could make it difficult to implement properly. For example, calculating expected memory usage for various filters may be impossible or extremely difficult; likewise for estimating memory usage based on a potentially unbounded number of incoming connections. Initial feature delivery could dodge complicated situations by scoping appropriately and communicating that scoping to users. For connection-count-based estimations, a simple starting point could be to let the customer enter the value, so that they can see the expected worst-case memory usage (excluding the impact of filters) for various concurrent connection targets, which they could then use for planning purposes. Down the road, it may be feasible to harvest or prepopulate sensible values based on the backend servers' max_connections, or on the configuration of max_routing_connections when that is enabled. |
| Comments |
| Comment by markus makela [ 2022-07-18 ] |
|
I changed this from Indicator to Estimation, as that seems to be what the issue is about. Indicator, at least to me, would mean indicating something that currently exists, rather than giving an estimate of what could potentially be. |
| Comment by Johan Wikman [ 2022-08-01 ] |
|
Since C++17, the language has had a concept called polymorphic allocators, which makes it straightforward to use different allocators in different contexts. In practice, that would mean MaxScale could use a dedicated allocator in any situation where we want to be able to report exactly how much memory is being used. It would also make it straightforward, for example, to pre-allocate memory for a particular purpose. MaxScale uses C++17, and some initial work in this direction was done for 22.08 so that we could take this into use in earnest in the release after that. Unfortunately, it turned out that although the compilers on the platforms supported by MaxScale claim to support C++17, the support for polymorphic allocators is experimental on many of them, so we cannot currently use the functionality. Something similar could be done without the C++ runtime support, but it would be quite laborious. Currently, it seems the best approach is to make some preparations, but wait until all compilers on all platforms support this functionality in a non-experimental fashion. |
| Comment by Johan Wikman [ 2022-09-22 ] |
|
In MaxScale 22.08.2, maxctrl show maxscale shows a system object with information about the environment MaxScale is running in.
Of particular interest are machine.cores_virtual and machine.memory_available, as they show the cores and memory available to MaxScale and can thus be used to verify that the configuration file parameters threads and query_classifier_cache_size have been set appropriately (their current values are shown in the maxscale subobject). The full documentation can be read here. The situation is also checked at startup, and if MaxScale detects that it is running in a constrained environment, warnings of the following kind are logged:
|