[MXS-3840] Memory leak in 2.4.17? Created: 2021-10-27  Updated: 2022-03-15  Resolved: 2022-03-15

Status: Closed
Project: MariaDB MaxScale
Component/s: QueryClassifier
Affects Version/s: 2.4.17
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Jeff Smelser Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None
Environment:

Kubernetes


Attachments: PNG File image-2021-10-27-10-44-35-458.png    
Issue Links:
Relates
relates to MXS-4008 Query classifier cache does not prope... Closed
Sprint: MXS-SPRINT-145, MXS-SPRINT-146

 Description   

We were using 2.5.15 and having good success with it. Unfortunately, we needed to downgrade to 2.4.17 because we need to use the weightby parameter. Since we downgraded, it seems like we have a memory leak going on.

Since downgrading, we added the weightby parameter and that has been working, but memory just grows. On rare occasions I will see memory go down, but it usually goes right back up. We have set the limit to 1G and it's still running out of memory. We plan to raise it to 2G while we debug, but we're concerned that won't be enough.

I am looking for a bit of help to figure out what's taking all this memory. When I look at qc_cache_size it looks like it's about 300K per thread (4 threads):
│ QC cache size │ 299956 │

We don't even have the [Cache] filter set up.

So I can't figure out where the memory is even going.

Here is our memory footprint for MaxScale (yes, we have 7 versions running).

The drops are when the pod gets OOMKilled for going over 1G.



 Comments   
Comment by markus makela [ 2021-10-27 ]

2.5 has writeq_high_water and writeq_low_water on by default. I'd recommend adding those with the default values from 2.5 into the configuration. This should rule out sudden bursts of network traffic as a cause as usually that causes memory usage to spike up rapidly.
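A sketch of how those defaults might be carried into the 2.4 configuration (the parameter names and values are the 2.5 defaults mentioned later in this thread; the file path and section placement are assumptions):

```ini
# maxscale.cnf -- apply the 2.5 flow-control defaults to 2.4
[maxscale]
writeq_high_water=16Mi  # stop reading from the client once a session's write queue exceeds this
writeq_low_water=8Ki    # resume reading when the queue drains below this
```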

Comment by Jeff Smelser [ 2021-10-28 ]

So I did that, and I also lowered qc_cache_size to 1k. I observed that if I set it to 100M, memory still skyrockets way past 100M (I expect it to go over somewhat, but it was in the 500M to 1G range). If I lower it via maxctrl, the QC cache goes down in "maxctrl show threads", but the memory never comes back. If I start it up with 1k, it behaves a lot better. Maybe there is a memory cleanup bug in the 2.4 series?
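The runtime adjustment described above could look roughly like this (a hypothetical session; query_classifier_cache_size as the name of the global parameter behind the QC cache, and its runtime alterability, are assumptions based on the report above):

```
# Inspect per-thread QC cache usage
maxctrl show threads

# Lower the classifier cache limit at runtime
maxctrl alter maxscale query_classifier_cache_size 1k
```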

Comment by markus makela [ 2021-10-29 ]

What values did you use for the watermark parameters? I'd start with writeq_high_water=16Mi and writeq_low_water=8Ki as those are the defaults from 2.5.

As for memory not coming back, that's usually how the memory allocator works: it keeps an internal cache of the memory and only frees memory if it's large enough and not in use.

As for memory leaks, it's possible that there is one in 2.4. Since the memory usage ramps up to its maximum value within an hour, this suggests that something other than a steady leak of memory is going on. Were all the MaxScale instances in use over that whole period or only some of them?

Comment by markus makela [ 2021-11-23 ]

jsmelser_recharge any updates?

Comment by Jeff Smelser [ 2022-03-14 ]

Sorry, I forgot all about this. I ended up lowering the memory values pretty low and it stopped growing so much. That fixed it for us. Sorry I forgot to respond here; we can close it out.

Comment by markus makela [ 2022-03-15 ]

Thanks, I'll go ahead and close this issue.

Generated at Thu Feb 08 04:24:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.