[MXS-4586] transaction_replay_max_size default is 1GiB instead of 1MiB

Created: 2023-04-18 | Updated: 2023-05-26 | Resolved: 2023-05-15

Status: Closed
Project: MariaDB MaxScale
Component/s: readwritesplit
Affects Version/s: 2.5.25, 6.2.1, 22.08.5, 23.02.0, 23.02.1
Fix Version/s: 2.5.26, 6.4.7, 22.08.6, 23.02.2
Type: Bug
Priority: Major
Reporter: Eoin Kim
Assignee: Niclas Antti
Resolution: Fixed
Votes: 1
Labels: crash
Environment: Running in a VM - Ubuntu 20.04.6 LTS, 2 CPUs, 4GB RAM (8GB for production)
Sprint: MXS-SPRINT-181, MXS-SPRINT-182
Description

The memory usage of MaxScale keeps growing until swap is used, and then one night (or day) MaxScale gets killed. Issue observed on our pre-prod MaxScale, version 23.02.1. The same is observed in prod (6.2.1); we are currently holding the prod upgrade. A stack trace has been captured (attached). Next step is using jemalloc to profile the heap, as Markus suggested.
Comments
Comment by markus makela [ 2023-04-18 ]

Could be the same problem that is currently being reported in another open issue.
Comment by markus makela [ 2023-04-18 ]

Can you try if connection_keepalive=0s has any effect on the memory usage? This disables the keepalive feature and, since some of the code in the stacktraces is related to it, it would help us rule out that part of the code.
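As a maxscale.cnf fragment, the suggested test would look like this (the service section name is a hypothetical placeholder, not from this ticket):

```ini
# Sketch: disable the keepalive feature to rule that code path out.
# [RW-Split-Service] is a hypothetical service name; use your own.
[RW-Split-Service]
connection_keepalive=0s
```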
Comment by Eoin Kim [ 2023-04-19 ]

Note to myself - instruction from Markus:

-------

Hi,

If you can, I'd highly recommend registering on the MariaDB Jira and leaving a comment. It's a far more convenient place to store logs, configs and such compared to this mailing list. It'll also help anyone else that runs into this problem if the stacktrace is shared there and they have a matching one.

If the problem happens constantly (seems like it does), you could try using jemalloc to profile the heap. Here's a Red Hat article on how to configure it on their system but based on the log messages, you're using Ubuntu. You'll probably have to adapt it a bit, but I think adding the following into the /lib/systemd/system/maxscale.service file into the [Service] section should make it work after installing the libjemalloc2 package:

Environment=MALLOC_CONF=prof:true,prof_leak:true,lg_prof_sample:19,prof_final:true,stats_print:true,prof_prefix:/var/log/maxscale/jeprof

This should generate a file in /var/log/maxscale/ that contains the heap profiling information when MaxScale is stopped. The file shows if any memory leaks on shutdown and if it does, where the leak is located. This would greatly help us speed up fixing any problems that we aren't able to reproduce.

Markus
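Put together, the systemd change described above might look like the following drop-in; the drop-in path is an assumption (editing /lib/systemd/system/maxscale.service directly, as described, works too):

```ini
# Assumed drop-in path: /etc/systemd/system/maxscale.service.d/jemalloc.conf
# Requires the libjemalloc2 package, as noted above.
[Service]
Environment=MALLOC_CONF=prof:true,prof_leak:true,lg_prof_sample:19,prof_final:true,stats_print:true,prof_prefix:/var/log/maxscale/jeprof
```

After editing, run systemctl daemon-reload and restart MaxScale for the environment variable to take effect.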
Comment by Eoin Kim [ 2023-04-19 ]

@markus makela I've set connection_keepalive=0s and watched for a few hours so far. The memory usage keeps growing, so no difference. I'll revert this setting and try using jemalloc. One thing to mention here: not only does the memory usage grow, I can also see more and more threads showing up. MaxScale started with two threads, but now I can see more than 4 from time to time. The strange part is that they are mostly sleeping.
Comment by markus makela [ 2023-04-19 ]

Thanks for trying it out, it at least helps rule out that part of the codebase as the reason. The jemalloc profile will hopefully reveal where all the memory is going. There's a thread pool in MaxScale that's used for miscellaneous tasks, so it's expected that there will be more threads than are configured with the threads parameter. Most of these extra threads are idle most of the time and can be ignored, as they're used for rarely occurring things that need offloading (DNS lookups are one thing that causes the extra threads to be used).
Comment by markus makela [ 2023-04-19 ]

For analyzing the heap profile, you'll need to install the libjemalloc-dev and binutils packages. Then run jeprof --show_bytes /usr/bin/maxscale /var/log/maxscale/jeprof.3603.0.f.heap (replace the trailing part after jeprof with the ones on your system) to generate the text output of the leak. You can also use jeprof --pdf /usr/bin/maxscale /var/log/maxscale/jeprof.3603.0.f.heap to generate a PDF version that you can upload here. See the jemalloc wiki entry for leak checking.
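A small sketch that prints the two commands to run (the paths are the examples from this comment and will differ per system; jeprof itself is not executed here):

```shell
# Sketch: build the jeprof invocations described above. The heap file
# name is the example from this thread; substitute the one on your system.
bin=/usr/bin/maxscale
heap=/var/log/maxscale/jeprof.3603.0.f.heap
echo "jeprof --show_bytes $bin $heap"          # text report of the leak
echo "jeprof --pdf $bin $heap > maxscale.pdf"  # PDF version to upload
```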
Comment by markus makela [ 2023-04-24 ]

eoin Is it possible for you to install the debug binaries of MaxScale? They are compiled with ASAN and should produce a memory leak report if there is one. Once installed, you should see a memory leak report in /var/log/maxscale/maxscale.log the next time MaxScale is restarted. If you notice that it uses up more memory than normal and you can safely restart the process, I'd recommend doing that manually instead of letting the OOM-killer or systemd watchdog do it. This way we'll get a better report of the memory usage under normal usage instead of an aborted shutdown like would happen with the aforementioned alternatives.
Comment by Eoin Kim [ 2023-04-26 ]

@markus makela I've installed the one you linked. One thing I noticed immediately is the number in the VIRT memory section - 20.0T? The log file (maxscale.log) started recording a few warning messages, including some SQL-query-looking strings, but it doesn't show a memory leak report as you mentioned. maxscale.log I'll revert the package version and settings to use jemalloc.
Comment by markus makela [ 2023-04-26 ]

The VIRT is probably due to ASAN (see here) and I think it's expected; it doesn't really use that much memory. The log didn't seem to contain anything at this point, which indicates that ASAN probably didn't have enough time to generate the dump, due to the signal 6 that killed the MaxScale process (systemd watchdog, probably). However, the log still shows it to be MaxScale 23.02.1. Can you double check that the package actually installed? Another sanity check would be to run ldd /usr/bin/maxscale to verify that it links against libasan.
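The linkage check can be scripted with ldd and grep; the sample line below is simulated output for illustration (on a real system, pipe `ldd /usr/bin/maxscale` into the grep instead):

```shell
# Sketch: check a binary for libasan linkage. The sample line below is
# simulated; on a real system use: ldd /usr/bin/maxscale | grep libasan
sample_ldd="libasan.so.6 => /lib/x86_64-linux-gnu/libasan.so.6 (0x00007f0000000000)"
if printf '%s\n' "$sample_ldd" | grep -q 'libasan'; then
  echo "ASAN-instrumented binary"   # prints this for the sample line
else
  echo "no ASAN linkage found"
fi
```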
Comment by Eoin Kim [ 2023-04-26 ]

Hmm.... the installation didn't complain about anything, so I assumed the package version downgraded okay.
Comment by markus makela [ 2023-04-26 ]

The fact that it used 20T of virtual memory suggests that ASAN was indeed enabled on the binaries. This just means that the forceful shutdown didn't end up generating an ASAN report when it should've. If it's possible for you to test with the packages again, you could:

I think this should generate a leak check dump in the MaxScale log. Another thing to note is that using LD_PRELOAD with ASAN doesn't seem to work and it should generate an error like this:

This would also confirm that ASAN is actually installed even in the 23.02 packages, but it means you'll have to install the non-debug packages to do the jemalloc heap profiling.
Comment by Eoin Kim [ 2023-04-26 ]

@markus makela Thanks for that. I must have jumbled some log messages when I was chopping out parts. That LD_PRELOAD error must be related to my maxscale override.

When you say "once it is obvious that it leaks memory" - where should I look? RES or SHR? Or somewhere else? The problem is it might require waiting a long time to confirm the memory leak.
Comment by markus makela [ 2023-04-26 ]

My apologies, I must've looked at the wrong log file; for some reason clicking on the link in your comment took me to the old one. I took a look at the new one and it doesn't seem to have generated any errors. This means that if there is a leak, it didn't occur during that time. Another possibility is that it isn't really a memory leak but some form of over-use of memory that clears itself up when all connections have been closed.

The warnings in the log are an artifact of sorts in a part of the code and can be disabled by adding log_warning=false under the [maxscale] section. They aren't really relevant warnings in this case.

I think you can look at RES when considering memory usage. If you have the OOM report from when it ran out of memory without triggering the watchdog, you could use that as an estimate of how much memory it ended up using. Then, using that number, you could treat a third or half of it as the limit after which we'd be certain it's using excessive amounts of memory.
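As a config fragment, the warning suppression mentioned above:

```ini
# maxscale.cnf: hide the non-relevant warnings (from the comment above)
[maxscale]
log_warning=false
```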
Comment by markus makela [ 2023-04-26 ]

To also rule out the query classifier cache in MaxScale, can you see if query_classifier_cache_size=0 lowers the memory usage?
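As a config fragment, the suggested test:

```ini
# maxscale.cnf: rule out the query classifier cache (from the comment above)
[maxscale]
query_classifier_cache_size=0
```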
Comment by Eoin Kim [ 2023-04-27 ]

@markus makela I've set the query classifier cache as you said and watched for a few hours so far. I didn't notice any difference. I'm having difficulty reproducing the issue; for some reason, the system is behaving very well at this stage. At least MaxScale is currently running with jemalloc activated. Hopefully I can get some news soon and post the result here.
Comment by markus makela [ 2023-04-27 ]

Thanks for reporting back. I'd say it's likely that some action done by a client application is what triggers it and it just hasn't happened yet, which is why it seems stable now. Disabling the query classifier cache helps rule it out as a source of memory usage, so for future tests it might be smart to keep it turned off.
Comment by Johan Wikman [ 2023-05-02 ]

eoin If you manage to reproduce this, it'd be quite helpful if you could run MaxScale with log_info enabled, as that would provide us with the SQL that the clients execute. You can turn log_info on at runtime with maxctrl alter logging log_info=true.
Comment by markus makela [ 2023-05-08 ]

eoin I documented the jemalloc instructions in the MaxScale wiki. Could you try using them to enable jemalloc heap profiling? This should help us figure out why your MaxScale instance is using too much memory. If you would like interactive help, you can find us on the MariaDB community Slack as well as the MariaDB.org Zulip.
Comment by Eoin Kim [ 2023-05-09 ]

markus makela I've had jemalloc heap profiling enabled since April 26. The problem is that MaxScale hasn't crashed so far, for some reason. I'll see what I can do.
Comment by markus makela [ 2023-05-09 ]

Very interesting. How is the memory usage, is it still growing?
Comment by Eoin Kim [ 2023-05-09 ]

markus makela It is interesting indeed. In the past, the memory kept growing and never shrank, but now I can see the memory usage actually fluctuating: the figure goes up and down. On the htop page, VIRT memory size hit 20.5G and neither grows nor shrinks anymore. However, MaxScale hasn't crashed so far. I am wondering if I need to put more load on MaxScale.
Comment by markus makela [ 2023-05-09 ]

20.5G is still a lot of memory and I wouldn't expect MaxScale to use that much. If you can change the jemalloc configuration to this, we'd be able to get periodic heap dumps to see where all that memory is going:

The steps to produce a PDF report from the heap files are documented in this KB article.
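The exact MALLOC_CONF value referred to above did not survive the export. A hypothetical reconstruction using jemalloc's interval-based dumping (lg_prof_interval:30 writes a profile roughly every 2^30 bytes allocated) might look like this; treat every value here as an assumption:

```ini
# Hypothetical reconstruction - the original value was lost in the export.
[Service]
Environment=MALLOC_CONF=prof:true,prof_prefix:/var/log/maxscale/jeprof,lg_prof_interval:30
```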
Comment by Eoin Kim [ 2023-05-11 ]

markus makela I've made the change as you said - it generated a lot of heap files, and they keep coming; the number doesn't seem to calm down in my case. Since the service restarted 22 hours ago, its VIRT memory size has increased to 11.2G, but MaxScale didn't crash. I've followed the instructions in the KB article to generate the PDF file. Hope this is useful.
Comment by markus makela [ 2023-05-11 ]

Yes, this seems to be very helpful. Do you use client-side connection pools of some sort or other very long-lived connections? You can see how long the sessions have been open with maxctrl show sessions. Another helpful bit of information would be a rough estimate of how many concurrent clients there are going through MaxScale. Which version of MaxScale were you using?
Comment by Eoin Kim [ 2023-05-11 ]

There are multiple applications involved here. I know that one application uses a connection pool and closes idle connections when finished - this one is not written by my company. Unfortunately, all the other applications, written by my company, don't close their database connections when they finish their jobs. I'm not a developer, so all I can do is request the dev team to rewrite their code.

MaxScale closes idle connections after 2 minutes - I've set connection_timeout = 120s in the maxscale.cnf file. So I thought all connections should be disconnected if idle longer than 2 minutes, but it sounds like I'm wrong. I just ran the maxctrl show sessions command as you said and filtered with the Connected keyword; some connections are older than 15 minutes.

Regarding the number of concurrent clients, it's very difficult to tell with our pre-production environment as not many clients are always up and running. Currently it's around 50 connections - production has more than 330.
Comment by markus makela [ 2023-05-11 ]

Thank you for the information. The reason the sessions can have a long lifetime even with connection_timeout = 120s is that it's not a hard limit on the connection's lifetime: if the connection has been active enough to never be idle for a full 120 seconds, it won't get closed. Regardless of this, MaxScale shouldn't keep consuming ever-growing amounts of memory even if the connections have very long lifetimes, and 300 concurrent connections is quite moderate and should not cause this sort of memory usage.

So far the heap report has shown that there's some data being stored somewhere that I suspect should be released earlier than it is. The report shows no obvious memory leaks, and it looks like if all the client connections were to be closed, memory usage should drop down.
Comment by Eoin Kim [ 2023-05-11 ]

markus makela Hmm... okay. So "something is not released early enough" means there's slowness in the hardware, i.e. a slow disk? Also, I forgot to answer your last question - 23.02.1 is used for pre-prod. Production is staying on 6.2.1.
Comment by markus makela [ 2023-05-11 ]

No, I'm referring to some internal buffers in MaxScale that might not be released early enough but are eventually freed when the client connections close. So far it looks like the outbound network buffers, where data is temporarily collected, might be what's taking up the space. This is only a theory and I've yet to reproduce it. If possible, can you see how large the memory usage shown by maxctrl show sessions is in MaxScale 23.02? The Memory section should show the total memory usage of the sessions in bytes. If it's below 100kB per session, then the theory about the memory being buffered there would be wrong and something else is taking up the space.

I'd recommend upgrading from 6.2.1 to the latest version 6 patch release, which is version 6.4.6 at the time of writing. There have been several critical fixes since the release of 6.2.1 and some of the fixed bugs can result in hangs, possible memory leakage and even crashes.
Comment by Eoin Kim [ 2023-05-12 ]

markus makela I've extracted the result of the maxctrl show sessions command: result.txt

A very strange thing is happening. I've reverted all the systemd changes to disable the jemalloc setup. Currently, these two lines are the only ones I've kept overridden, as I don't want to get up in the middle of the night:

Ever since I reverted the systemd setup, memory usage has soared again and MaxScale crashed early this morning. The memory usage is high again and swap is being used. This is really weird - how come it didn't happen before?

One more strange thing: I can't find the dump file.
Comment by markus makela [ 2023-05-12 ]

A quick test you can do would be to see if disabling transaction_replay has any effect. The transaction replay feature does require some additional memory, and disabling it would rule it out as a cause of the problems. This should be a relatively easy test to perform.

Another thing to test after that: if you are able to switch the router from readwritesplit to readconnroute, this would help us figure out whether the problem is related to a core part of MaxScale or to some functionality used by readwritesplit. To do this, change the router parameter in the service from router = readwritesplit to router = readconnroute and then add router_options = master to the service. This causes all requests to be routed to the master node of the cluster, which should result in behavior similar to that of readwritesplit, with the exception that if a network error occurs, the connection is not recovered.

If neither of these tests has any effect on memory usage, the problem must be in a core part of MaxScale. In this case, you could add session_trace=200 under the [maxscale] section. This activates an in-memory debugging log that can be extracted with maxctrl api get --pretty sessions data[].attributes.log. It's also visible in maxctrl show sessions, but that's rather verbose, so the aforementioned command makes it easier to share. This approach should be less resource intensive than turning on log_info and thus a more practical method of extracting information in pre-production environments.
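As maxscale.cnf fragments, the suggested tests might look like this (the service name is a hypothetical placeholder):

```ini
# Router test from above: route everything to the master via readconnroute.
# [My-Service] is a hypothetical service name; use your own.
[My-Service]
router=readconnroute
router_options=master

# Fallback test from above: in-memory session tracing in the core.
[maxscale]
session_trace=200
```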
Comment by Eoin Kim [ 2023-05-12 ]

markus makela Okay, maybe I didn't wait long enough, but it looks like less memory is used in both cases you suggested.

Case 1:

Case 2:
Comment by markus makela [ 2023-05-12 ]

Case 1: If you can continue watching the memory usage with transaction_replay disabled for a while longer, we'd know how high it ends up climbing. If the memory usage ends up being significantly lower, I suspect this is where the memory is being used and the limits on the stored transaction size (1MiB by default) aren't being respected or are ineffective in preventing it.

Case 2: Yes, replication between the nodes does still happen. The readconnroute router simply directs all traffic to the node labeled as Master.
Comment by markus makela [ 2023-05-12 ]

We noticed that the transaction replay max size limit is documented to be 1MiB but it looks like it actually ends up being 1GiB instead. Can you try if adding transaction_replay_max_size=1Mi solves the memory usage problem? The default should also be wrong in 6.2.1, so you can try explicitly setting it to 1MiB there as well and see if that solves the problem there too.
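As a config fragment, the workaround from the comment above (the service name is a hypothetical placeholder):

```ini
# Workaround: explicitly cap the replay buffer at the documented 1MiB
# default. [My-RW-Service] is a hypothetical service name; use your own.
[My-RW-Service]
router=readwritesplit
transaction_replay=true
transaction_replay_max_size=1Mi
```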
Comment by Eoin Kim [ 2023-05-12 ]

markus makela This is looking quite promising. I've set transaction_replay = true with transaction_replay_max_size = 1Mi and the result looks positive. It's been almost 4 hours since I made the change and restarted the service. Compared to disabling transaction_replay, memory usage is higher, but it didn't grow continuously: it hit some point and moved up and down around it. So, that might be something.
Comment by markus makela [ 2023-05-12 ]

In terms of memory, how much is this? Is it around the 1GB mark for VIRT and/or RES?
Comment by Eoin Kim [ 2023-05-12 ]

In the htop command, the memory usage graph was hovering around 950M. VIRT was around 1630M and RES was smaller - around 570M. I don't know if those figures look good to you or not, but at least it didn't blow up, which is a good thing. I've restarted the service to record the initial memory usage in htop (it was around 270M) and I'll let it run over the weekend and check on Monday morning to see how much it grows.
Comment by markus makela [ 2023-05-12 ]

That's roughly on par with what our testing seems to have revealed. The number of threads affects how much memory is in use, as does the query classifier cache: by default it uses up to 15% of memory to cache information related to parsed queries, which with 4GB amounts to roughly an extra 600MB of memory. With the other sorts of buffers being used, your memory usage looks relatively normal if you aren't using query_classifier_cache_size=0.
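The ~600MB figure above checks out with a quick calculation (15% of 4GiB, expressed in MiB):

```shell
# Sanity check of the figure above: 15% of 4 GiB, in MiB.
echo $(( 4096 * 15 / 100 ))   # prints 614
```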
Comment by Eoin Kim [ 2023-05-15 ]

Checking the htop command at this stage (today is Monday here, around noon), MaxScale didn't crash. Memory usage increased to 1.33G. VIRT is currently showing 1795M with a RES size of 1132M. I didn't touch the query_classifier_cache_size setup, hence I believe it's using around 615M-ish? If this result is expected, all good. If not, please let me know.
Comment by markus makela [ 2023-05-15 ]

This seems like a more reasonable amount of memory considering the configuration and the number of clients. If possible, I'd recommend keeping an eye on the memory usage for a short while longer, just in case it keeps creeping upwards.

I think we found one part of the problem (the default value of transaction_replay_max_size being wrong), but I assume the client applications you are using do not do multi-gigabyte transactions, which means that there's something wrong with the transaction boundary detection in MaxScale. Would it be possible for you to enable log_info on the non-production server and capture a few minutes' worth of logs generated by the traffic? This should let us see why the boundary detection is broken by inspecting the SQL being sent through MaxScale. If you do capture the traffic, remember to look at the log files first and make sure there's no sensitive information (usernames, plaintext passwords etc.) in the SQL being logged.
Comment by Eoin Kim [ 2023-05-15 ]

markus makela log_info was enabled for a few minutes, and the file is attached: maxscale.log.zip. The log is from a non-prod system; production deals with more traffic, say about 10 times more. This might give you a rough idea?
Comment by markus makela [ 2023-05-15 ]

OK, looks like the applications disable autocommit and then proceed to do most of their work in implicit transactions. This has the unfortunate side-effect of storing all of the needless commands like pings and things like SELECT NOW() in the transaction buffer, which explains why it kept growing when the default value was 1GB. Large batch inserts would also explain why the memory usage would sometimes spike very quickly. Now that the limit is manually configured to 1MB, it appears to work and excessively large transactions are no longer stored. So far I haven't actually seen a problem in transaction boundary detection, so I would say the increase in memory usage was caused only by the broken default value.
Comment by Eoin Kim [ 2023-05-15 ]

Awesome, I'll keep the value of transaction_replay_max_size at 1Mi. Thanks for your help, much appreciated.
Comment by markus makela [ 2023-05-15 ]

I'll close this as Fixed now that the default value is correctly set to 1MiB. If you observe further linear increases in memory usage, we can open a separate Jira item for those. Thank you very much for your cooperation in the investigation!