[MXS-4161] MaxScale System Diagnostics Created: 2022-06-07  Updated: 2022-09-22  Resolved: 2022-09-22

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: None
Fix Version/s: 22.08.2

Type: New Feature
Priority: Major
Reporter: Rob Schwyzer
Assignee: Johan Wikman
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Relates
relates to MXS-3822 MaxScale Global Memory Use Indicator Closed
Sprint: MXS-SPRINT-166

 Description   

MaxScale should provide a means of easily ascertaining that the configuration is compatible with the resources available to MaxScale. Especially when MaxScale is running in a container, the resources (cores and memory) may be limited compared to what is available on the machine. If that is the case and the automatic configuration (in particular threads and query_classifier_cache_size) is relied upon, then MaxScale may end up using far more resources than are available, with crashes as the result.

Original description
================
MaxScale Allocated Memory Usage Estimation

While MXS-3822 will tackle showing customers the current memory usage MaxScale is causing, we should also add an additional feature which shows customers "allocated" memory usage. This would be more of an estimate, but would give users direct feedback on how their MaxScale configuration creates the potential for memory usage.

For example, we know that query_classifier_cache_size is pretty straightforward. We also know that each thread spawned by threads=N has its own cache which comes with it. We should be able to show a worst-case estimate of memory usage based on this information.

Why?

MaxScale has many automatic configuration parameters (such as threads=auto or query_classifier_cache_size defaulting to 15% of detected memory). In most cases, these work well and set sensible defaults. However, in some cases other factors obstruct MaxScale's ability to properly detect underlying resources and default memory allocation can be overzealous. This behavior is non-obvious to most customers, and because MaxScale does not immediately use these memory allocations when it first starts up, many customers end up complaining about "memory leaks" or other "memory problems" as connections come in and MaxScale begins actively using the memory its configurations allow it to.

Showing customers clearly what MaxScale is configured to use will alert them immediately when something is wrong at the configuration level. Instead of assuming that a leak or other problem is occurring, customers will ask MariaDB why MaxScale expects to use so much more memory than the customer has allocated to the VM/node/etc., which makes for a much more productive conversation with MariaDB.

Starting Point

This request could be seen as all-encompassing which could make it difficult to implement properly. For example, calculating expected memory usage for various filters may be impossible or extremely difficult. Likewise for estimating memory usage based on a potentially infinite number of incoming connections.

Initial feature delivery could dodge complicated situations by scoping appropriately and communicating that scoping to users.

For connection-count-based estimations, a simple answer to start could be to let the customer enter that value so customers can see expected worst-case memory usage (excepting impact of filters) for various concurrent connection targets which customers could then use for planning purposes. Down the road, it may be feasible to harvest or prepopulate sensible values based on backend servers' max_connections or based on the configuration of max_routing_connections when that is enabled.
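The connection-count idea above could be sketched as a small calculator. This is only an illustration under stated assumptions: `per_connection_bytes` is a hypothetical, user-supplied planning figure (MaxScale does not publish such a number in this issue), the function names are invented for the example, and the memory used by filters is excluded, as the text notes.

```python
def worst_case_bytes(query_classifier_cache_size, connections, per_connection_bytes):
    """Rough upper bound: configured cache plus a per-connection allowance.

    per_connection_bytes is a hypothetical figure chosen by the user;
    memory used by filters is deliberately left out.
    """
    if connections < 0 or per_connection_bytes < 0:
        raise ValueError("connections and per_connection_bytes must be non-negative")
    return query_classifier_cache_size + connections * per_connection_bytes


def estimate_table(query_classifier_cache_size, per_connection_bytes, targets):
    """Return (connections, estimated_bytes) pairs for each connection target."""
    return [
        (n, worst_case_bytes(query_classifier_cache_size, n, per_connection_bytes))
        for n in targets
    ]


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Example inputs only: a 1 GiB cache and 64 KiB per connection are made-up values.
    for n, est in estimate_table(1 * GIB, 64 * 1024, [100, 1000, 10000]):
        print(f"{n:>6} connections -> {est / GIB:.2f} GiB")
```

A linear model like this is the simplest thing that lets a user compare several concurrent-connection targets side by side; anything more precise would need measured per-connection figures.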



 Comments   
Comment by markus makela [ 2022-07-18 ]

I changed this from Indicator to Estimation as this is what the issue seems more about. Indicator, at least to me, would mean indicating something that exists currently instead of giving an estimate of what could potentially be.

Comment by Johan Wikman [ 2022-08-01 ]

Since C++ 17, C++ has had a concept called the polymorphic allocator, which makes it straightforward to use different allocators in different contexts. In practice, that would mean that MaxScale could use a dedicated allocator in any situation where we want to be able to report just how much memory is being used. It would also make it straightforward, e.g., to pre-allocate memory to be used for a particular purpose.

MaxScale uses C++ 17, and some initial work in this direction was done for 22.08 so that we could then start using it in earnest in the release after that. Unfortunately, it turned out that although the compilers on the platforms supported by MaxScale claim to support C++ 17, the support for polymorphic allocators is experimental on many of them, so we can't currently use the functionality.

Something similar could be done without the C++ runtime support, but would be quite laborious. Currently it seems that the best approach is to make some preparations, but wait until all compilers on all platforms support this functionality in a non-experimental fashion.

Comment by Johan Wikman [ 2022-09-22 ]

In MaxScale 22.08.2, maxctrl show maxscale shows a system object with information about the environment MaxScale is running in.

$ maxctrl show maxscale
...
├──────────────┼────────────────────────────────────────────────────────────────────────────┤
│ System       │ {                                                                          │
│              │     "machine": {                                                           │
│              │         "cores_available": 8,                                              │
│              │         "cores_physical": 8,                                               │
│              │         "cores_virtual": 4,                                                │
│              │         "memory_available": 20858544128,                                   │
│              │         "memory_physical": 41717088256                                     │
│              │     },                                                                     │
│              │     "maxscale": {                                                          │
│              │         "query_classifier_cache_size": 6257563238,                         │
│              │         "threads": 8                                                       │
│              │     },                                                                     │
│              │     "os": {                                                                │
│              │         "machine": "x86_64",                                               │
│              │         "nodename": "johan-P53s",                                          │
│              │         "release": "5.4.0-125-generic",                                    │
│              │         "sysname": "Linux",                                                │
│              │         "version": "#141~18.04.1-Ubuntu SMP Thu Aug 11 20:15:56 UTC 2022"  │
│              │     }                                                                      │
│              │ }                                                                          │
└──────────────┴────────────────────────────────────────────────────────────────────────────┘

Of particular interest are machine.cores_virtual and machine.memory_available, as they show the cores and memory available to MaxScale and can thus be used for verifying that the configuration file parameters threads and query_classifier_cache_size have been set appropriately (their current values are shown in the maxscale subobject).
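The comparison described above could be automated along the following lines. This is a minimal sketch, not MaxScale's own check: the function name `check_system` is invented for the example, the JSON is copied from the output above (with the os subobject omitted), and the thresholds are assumptions. It flags threads as soon as it exceeds cores_virtual, and flags the cache as soon as it exceeds 15% of memory_available; the exact criteria MaxScale applies are not stated in this issue.

```python
import json

# The "machine" and "maxscale" subobjects from the sample output above.
SYSTEM_JSON = """
{
    "machine": {
        "cores_available": 8,
        "cores_physical": 8,
        "cores_virtual": 4,
        "memory_available": 20858544128,
        "memory_physical": 41717088256
    },
    "maxscale": {
        "query_classifier_cache_size": 6257563238,
        "threads": 8
    }
}
"""


def check_system(system):
    """Return a list of human-readable findings (empty if nothing looks off)."""
    machine = system["machine"]
    maxscale = system["maxscale"]
    findings = []

    # Assumed rule: more worker threads than virtual cores is worth flagging.
    if maxscale["threads"] > machine["cores_virtual"]:
        findings.append(
            f"threads={maxscale['threads']} exceeds "
            f"cores_virtual={machine['cores_virtual']}"
        )

    # 15% of the memory actually available to MaxScale (integer arithmetic).
    cache_limit = machine["memory_available"] * 15 // 100
    if maxscale["query_classifier_cache_size"] > cache_limit:
        findings.append(
            f"query_classifier_cache_size={maxscale['query_classifier_cache_size']} "
            f"exceeds 15% of memory_available ({cache_limit})"
        )
    return findings


if __name__ == "__main__":
    for finding in check_system(json.loads(SYSTEM_JSON)):
        print("warning:", finding)
```

With the sample values both checks fire: 8 threads against 4 virtual cores, and a cache sized at 15% of memory_physical rather than memory_available.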

The full documentation can be read here.

The situation is also checked at startup, and if MaxScale detects that it is running in a constrained environment, warnings of the following kind will be logged:

2022-09-22 10:28:47   warning: Number of threads set to 8, which is significantly more than the 4.00 virtual cores available to MaxScale. This may lead to worse performance and MaxScale using more resources than what is available.
2022-09-22 10:28:47   warning: It seems MaxScale is running in a constrained environment with less memory (19.43GiB) available in it than what is installed on the machine (38.85GiB). In this context, the query classifier cache size should be specified explicitly in the configuration file with 'query_classifier_cache_size' set to 15% of the available memory. Otherwise MaxScale may use more resources than what is available, which may cause it to crash.
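The figures in these warnings can be reproduced from the system object, and the explicit values they suggest can be worked out the same way. The sketch below is illustrative only: the helper names are invented, rounding a fractional core count down to a whole number of threads (with a minimum of 1) is an assumption, and the cache size is simply 15% of the available memory, as the second warning recommends.

```python
GIB = 1024 ** 3


def to_gib(n_bytes):
    """Format a byte count the way the log message does, e.g. '19.43GiB'."""
    return f"{n_bytes / GIB:.2f}GiB"


def suggested_settings(cores_virtual, memory_available):
    """Derive explicit values for threads and query_classifier_cache_size.

    Assumptions: threads is the virtual core count rounded down (at least 1),
    and the cache is 15% of the memory available to MaxScale, in bytes.
    """
    threads = max(1, int(cores_virtual))
    cache = memory_available * 15 // 100
    return {"threads": threads, "query_classifier_cache_size": cache}


if __name__ == "__main__":
    # Values taken from the sample system object earlier in this comment.
    print(to_gib(20858544128), "available of", to_gib(41717088256), "installed")
    for name, value in suggested_settings(4.0, 20858544128).items():
        print(f"{name}={value}")
```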
