[MXS-4451] Memory issue in Maxscale Created: 2022-12-15  Updated: 2023-01-31  Resolved: 2023-01-17

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 22.08.3
Fix Version/s: 22.08.4

Type: Bug Priority: Major
Reporter: Naresh Chandra Assignee: Johan Wikman
Resolution: Fixed Votes: 0
Labels: None

Attachments: PNG File image-2022-12-15-12-35-10-953.png    
Issue Links:
Relates
relates to MXS-4463 Allow base read buffer size to be tuned Closed
relates to MXS-4496 Limit resultset read size Closed

 Description   

Team,

After upgrading MaxScale to version 22.08.3, we are getting OOM issues. It was working fine with 6.3, but after upgrading to the 22 series we are seeing memory issues.

Attached is the system log file:

Dec 14 08:17:43 test123 systemd: Starting CheckMK (10.154.66.132:45202)...
Dec 14 08:17:46 test123 kernel: vmtoolsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Dec 14 08:17:46 test123 kernel: vmtoolsd cpuset=/ mems_allowed=0
Dec 14 08:17:46 test123 kernel: CPU: 0 PID: 817 Comm: vmtoolsd Kdump: loaded Tainted: P            E  ------------   3.10.0-1160.76.1.el7.x86_64 #1
Dec 14 08:17:46 test123 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Dec 14 08:17:46 test123 kernel: Call Trace:
Dec 14 08:17:46 test123 kernel: [<ffffffff915865c9>] dump_stack+0x19/0x1b
Dec 14 08:17:46 test123 kernel: [<ffffffff91581668>] dump_header+0x90/0x229
Dec 14 08:17:46 test123 kernel: [<ffffffff90f06a42>] ? ktime_get_ts64+0x52/0xf0
Dec 14 08:17:46 test123 kernel: [<ffffffff90f5e0ff>] ? delayacct_end+0x8f/0xb0
Dec 14 08:17:46 test123 kernel: [<ffffffff90fc25ad>] oom_kill_process+0x2cd/0x490
Dec 14 08:17:46 test123 kernel: [<ffffffff90fc1f9d>] ? oom_unkillable_task+0xcd/0x120
Dec 14 08:17:46 test123 kernel: [<ffffffff90fc2c9a>] out_of_memory+0x31a/0x500
Dec 14 08:17:46 test123 kernel: [<ffffffff90fc9894>] __alloc_pages_nodemask+0xad4/0xbe0
Dec 14 08:17:46 test123 kernel: [<ffffffff910193b8>] alloc_pages_current+0x98/0x110
Dec 14 08:17:46 test123 kernel: [<ffffffff90fbe057>] __page_cache_alloc+0x97/0xb0
Dec 14 08:17:46 test123 kernel: [<ffffffff90fc1000>] filemap_fault+0x270/0x420
Dec 14 08:17:46 test123 kernel: [<ffffffffc048891e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Dec 14 08:17:46 test123 kernel: [<ffffffffc0488b1c>] xfs_filemap_fault+0x2c/0x30 [xfs]
Dec 14 08:17:46 test123 kernel: [<ffffffff90fee7da>] __do_fault.isra.61+0x8a/0x100
Dec 14 08:17:46 test123 kernel: [<ffffffff90feed8c>] do_read_fault.isra.63+0x4c/0x1b0
Dec 14 08:17:46 test123 kernel: [<ffffffff90ff65d0>] handle_mm_fault+0xa20/0xfb0
Dec 14 08:17:46 test123 kernel: [<ffffffff91594653>] __do_page_fault+0x213/0x500
Dec 14 08:17:46 test123 kernel: [<ffffffff91594975>] do_page_fault+0x35/0x90
Dec 14 08:17:46 test123 kernel: [<ffffffff91590778>] page_fault+0x28/0x30
Dec 14 08:17:46 test123 kernel: Mem-Info:
Dec 14 08:17:46 test123 kernel: active_anon:3496846 inactive_anon:433447 isolated_anon:0#012 active_file:0 inactive_file:292 isolated_file:77#012 unevictable:2 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:7359 slab_unreclaimable:14930#012 mapped:8263 shmem:22191 pagetables:10646 bounce:0#012 free:33813 free_pcp:25 free_cma:0
Dec 14 08:17:46 test123 kernel: Node 0 DMA free:15840kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Dec 14 08:17:46 test123 kernel: lowmem_reserve[]: 0 2830 15851 15851
Dec 14 08:17:46 test123 kernel: Node 0 DMA32 free:63996kB min:12056kB low:15068kB high:18084kB active_anon:2233292kB inactive_anon:558460kB active_file:52kB inactive_file:444kB unevictable:4kB isolated(anon):0kB isolated(file):128kB present:3129152kB managed:2898720kB mlocked:4kB dirty:0kB writeback:0kB mapped:4236kB shmem:58472kB slab_reclaimable:5260kB slab_unreclaimable:8248kB kernel_stack:1056kB pagetables:7712kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:787 all_unreclaimable? yes
Dec 14 08:17:46 test123 kernel: lowmem_reserve[]: 0 0 13020 13020
Dec 14 08:17:46 test123 kernel: Node 0 Normal free:55416kB min:55456kB low:69320kB high:83184kB active_anon:11754092kB inactive_anon:1175328kB active_file:0kB inactive_file:724kB unevictable:4kB isolated(anon):0kB isolated(file):180kB present:13631488kB managed:13332928kB mlocked:4kB dirty:0kB writeback:0kB mapped:28816kB shmem:30292kB slab_reclaimable:24176kB slab_unreclaimable:51440kB kernel_stack:5632kB pagetables:34872kB unstable:0kB bounce:0kB free_pcp:96kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1291 all_unreclaimable? yes
Dec 14 08:17:46 test123 kernel: lowmem_reserve[]: 0 0 0 0
Dec 14 08:17:46 test123 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15840kB
Dec 14 08:17:46 test123 kernel: Node 0 DMA32: 644*4kB (UEM) 112*8kB (UEM) 174*16kB (UEM) 124*32kB (UE) 86*64kB (UEM) 52*128kB (UEM) 30*256kB (UEM) 13*512kB (UEM) 5*1024kB (UEM) 3*2048kB (EM) 4*4096kB (EM) = 64368kB
Dec 14 08:17:46 test123 kernel: Node 0 Normal: 2435*4kB (UEM) 808*8kB (UE) 200*16kB (UEM) 223*32kB (UEM) 120*64kB (UEM) 38*128kB (UE) 27*256kB (UEM) 19*512kB (UEM) 0*1024kB 0*2048kB 0*4096kB = 55724kB
Dec 14 08:17:46 test123 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Dec 14 08:17:46 test123 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Dec 14 08:17:46 test123 kernel: 23873 total pagecache pages
Dec 14 08:17:46 test123 kernel: 1259 pages in swap cache
Dec 14 08:17:46 test123 kernel: Swap cache stats: add 935897, delete 934645, find 111696/121421
Dec 14 08:17:46 test123 kernel: Free swap  = 0kB
Dec 14 08:17:46 test123 kernel: Total swap = 2097148kB
Dec 14 08:17:46 test123 kernel: 4194157 pages RAM
Dec 14 08:17:46 test123 kernel: 0 pages HighMem/MovableOnly
Dec 14 08:17:46 test123 kernel: 132269 pages reserved
Dec 14 08:17:46 test123 kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Dec 14 08:17:46 test123 kernel: [  623]     0   623    32362    13375      70       36             0 systemd-journal
Dec 14 08:17:46 test123 kernel: [  650]     0   650    68076        0      31      630             0 lvmetad
Dec 14 08:17:46 test123 kernel: [  657]     0   657    11552        2      23      323         -1000 systemd-udevd
Dec 14 08:17:46 test123 kernel: [  768]     0   768     2148        1      10       29             0 falcond
Dec 14 08:17:46 test123 kernel: [  769]     0   769   370371     2068      88     1519             0 falcon-sensor
Dec 14 08:17:46 test123 kernel: [  782]     0   782    13883       28      27       84         -1000 auditd
Dec 14 08:17:46 test123 kernel: [  816]     0   816    42037        1      38      353             0 VGAuthService
Dec 14 08:17:46 test123 kernel: [  817]     0   817    70323      128      56      200             0 vmtoolsd
Dec 14 08:17:46 test123 kernel: [  819]   999   819   155727      156      65     1287             0 polkitd
Dec 14 08:17:46 test123 kernel: [  823]     0   823     5422       42      16       42             0 irqbalance
Dec 14 08:17:46 test123 kernel: [  826]    81   826    17197       77      38      101          -900 dbus-daemon
Dec 14 08:17:46 test123 kernel: [  842]     0   842    31598       10      18      148             0 crond
Dec 14 08:17:46 test123 kernel: [  862]     0   862    27552        1      10       32             0 agetty
Dec 14 08:17:46 test123 kernel: [ 1113]     0  1113    28851        1      13       49             0 start-puppet-ag
Dec 14 08:17:46 test123 kernel: [ 1114]     0  1114   146607      122     103     3740             0 tuned
Dec 14 08:17:46 test123 kernel: [ 1117]     0  1117    62786     8770      78      840             0 puppet
Dec 14 08:17:46 test123 kernel: [ 1118]     0  1118     4822        8      13       90             0 nessus-service
Dec 14 08:17:46 test123 kernel: [ 1120]     0  1120    57109      136      65      965             0 snmpd
Dec 14 08:17:46 test123 kernel: [ 1122]     0  1122    28250       14      57      245         -1000 sshd
Dec 14 08:17:46 test123 kernel: [ 1123]     0  1123   144085     7016     136      246             0 rsyslogd
Dec 14 08:17:46 test123 kernel: [ 1125]     0  1125    13177        1      32      105             0 oddjobd
Dec 14 08:17:46 test123 kernel: [ 1133]     0  1133     6261        0      17       59             0 xinetd
Dec 14 08:17:46 test123 kernel: [ 7873]    38  7873     9068       44      23      121             0 ntpd
Dec 14 08:17:46 test123 kernel: [29819]     0 29819    22615       20      43      262             0 master
Dec 14 08:17:46 test123 kernel: [29828]    89 29828    24239       24      47      272             0 qmgr
Dec 14 08:17:46 test123 kernel: [94619]   997 94619  4560470  3875456    8382   356757             0 maxscale
Dec 14 08:17:46 test123 kernel: [115911]     0 115911     4972       67      13        0             0 nessusd
Dec 14 08:17:46 test123 kernel: [115912]     0 115912   163982    17371     111    18457             0 nessus-agent-mo
Dec 14 08:17:46 test123 kernel: [63839]     0 63839     9231       23      23       65             0 systemd-logind
Dec 14 08:17:46 test123 kernel: [63850]     0 63850   115610      517     171      162             0 winbindd
Dec 14 08:17:46 test123 kernel: [63854]     0 63854   123985      459     187      314             0 winbindd
Dec 14 08:17:46 test123 kernel: [63914]     0 63914   116237      548     162      133             0 winbindd
Dec 14 08:17:46 test123 kernel: [63915]     0 63915   115611      501     165      175             0 winbindd
Dec 14 08:17:46 test123 kernel: [63971]     0 63971   115611      491     166      194             0 winbindd
Dec 14 08:17:46 test123 kernel: [64664]    89 64664    24194        9      48      268             0 pickup
Dec 14 08:17:46 test123 kernel: [66104]     0 66104    28421      167      13        0             0 check_mk_agent
Dec 14 08:17:46 test123 kernel: [66302]     0 66302    28314       64      11        0             0 tenable_agent.s
Dec 14 08:17:46 test123 kernel: [66312]     0 66312    28314       64      10        0             0 tenable_agent.s
Dec 14 08:17:46 test123 kernel: [66313]     0 66313     8775      538      19        0             0 nessuscli
Dec 14 08:17:46 test123 kernel: [66314]     0 66314    28196       29      10        0             0 grep
Dec 14 08:17:46 test123 kernel: [66315]     0 66315    28404       39      11        0             0 awk
Dec 14 08:17:46 test123 kernel: Out of memory: Kill process 94619 (maxscale) score 924 or sacrifice child
Dec 14 08:17:46 test123 kernel: Killed process 94619 (maxscale), UID 997, total-vm:18241880kB, anon-rss:15501824kB, file-rss:0kB, shmem-rss:0kB
Dec 14 08:17:46 test123 systemd: Started CheckMK (10.154.66.132:45202).
Dec 14 08:17:48 test123 systemd: maxscale.service: main process exited, code=killed, status=9/KILL
Dec 14 08:17:48 test123 systemd: Unit maxscale.service entered failed state.
Dec 14 08:17:48 test123 systemd: maxscale.service failed.
Dec 14 08:17:48 test123 systemd: maxscale.service holdoff time over, scheduling restart.
Dec 14 08:17:48 test123 systemd: Stopped MariaDB MaxScale Database Proxy.
Dec 14 08:17:48 test123 systemd: Starting MariaDB MaxScale Database Proxy...
Dec 14 08:17:48 test123 maxscale: The systemd watchdog is Enabled. Internal timeout = 30s
Dec 14 08:17:48 test123 maxscale: Using up to 2.32GiB of memory for query classifier cache
Dec 14 08:17:48 test123 systemd: Started MariaDB MaxScale Database Proxy.



 Comments   
Comment by markus makela [ 2022-12-15 ]

What was going on during the time when that spike in memory usage happened?

Comment by Naresh Chandra [ 2022-12-15 ]

I am not sure; memory usage suddenly spiked and the service was restarted. I am sure there is not much load on this MaxScale server.

Comment by Naresh Chandra [ 2023-01-12 ]

Markus, any update on this issue? It seems that from the 22 series we are getting memory leak issues; it was working fine in the previous version, MaxScale 6.

We did not change any configuration; everything is the same between MaxScale 6 and MaxScale 22.

Comment by markus makela [ 2023-01-13 ]

naresh.chandra@copart.com can you try adding writeq_high_water=16 and writeq_low_water=1 under the [maxscale] section? We've seen some improvement in memory usage when this is lowered. If you can report back whether this change helps, we'll know if you are facing the same problem.
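For reference, the suggested parameters belong in the `[maxscale]` section of the MaxScale configuration file. A minimal sketch, using the values from the comment above (the file path `/etc/maxscale.cnf` is the usual default and may differ on your system):

```ini
# /etc/maxscale.cnf
[maxscale]
writeq_high_water=16
writeq_low_water=1
```

MaxScale must be restarted for the change to take effect unless the values are altered at runtime via `maxctrl alter maxscale`.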

Comment by Naresh Chandra [ 2023-01-13 ]

Markus,

What is the impact if we change those parameters? Will it impact read/write performance?

Comment by Johan Wikman [ 2023-01-16 ]

naresh.chandra@copart.com writeq_high_water and writeq_low_water affect the behaviour when MaxScale can read from the server faster than it can write to the client. In that situation, those parameters control how much data MaxScale buffers on behalf of the client, when MaxScale stops reading from the server, and when it resumes reading again.

Due to other changes between 6 and 22.08, the latter ends up buffering more data than the former, which may explain why you encounter OOMs with 22.08 but not with 6.3. This extra buffering also affects performance, and the QPS will drop.

So, with the suggested change above, you may no longer experience OOMs, and the QPS should go up.
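The watermark mechanism described above can be sketched as follows. This is an illustrative model only, not MaxScale's actual implementation; the class and method names are invented for the example:

```python
class WriteQueue:
    """Illustrative high/low watermark backpressure (not MaxScale's real code).

    When the server produces data faster than the client consumes it, the
    proxy buffers the difference. Once the buffer reaches high_water, the
    proxy stops reading from the server; once the client has drained the
    buffer down to low_water, reading resumes.
    """

    def __init__(self, high_water: int, low_water: int):
        assert low_water < high_water
        self.high_water = high_water
        self.low_water = low_water
        self.buffered = 0      # bytes buffered on behalf of the client
        self.reading = True    # whether we keep reading from the server

    def on_server_data(self, nbytes: int) -> None:
        """Data arrived from the server before the client drained it."""
        self.buffered += nbytes
        if self.buffered >= self.high_water:
            self.reading = False  # stop reading; cap memory growth

    def on_client_drained(self, nbytes: int) -> None:
        """The client consumed some of the buffered data."""
        self.buffered = max(0, self.buffered - nbytes)
        if self.buffered <= self.low_water:
            self.reading = True   # resume reading from the server
```

With a high watermark of 16 MiB (the old default mentioned later in this thread), a single slow client can pin megabytes of buffered resultset data per session, which is why lowering the watermarks bounds memory growth.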

Comment by Naresh Chandra [ 2023-01-17 ]

Hi Johan Wikman,

Thanks for the update.

Whatever the changes between 6 and 22, they should always improve performance and stability. But we keep seeing many issues with the newer releases instead of improvements: sometimes things work fine with older versions but not with newer ones, so we feel the older versions are better. Also, we cannot always change parameters on the fly in a production environment, as we do not know what impact a given change will have.

As end users, we expect the same stability and better performance, not new issues.

Comment by Johan Wikman [ 2023-01-17 ]

Hi naresh.chandra@copart.com

Obviously the goal is to continuously improve stability, performance and overall quality. However, when a new major version is introduced (6, 22.08), there may initially be unforeseen issues, as larger architectural changes may have been made than would ever be made in a maintenance release. Since our test environment is not identical to every production environment where MaxScale will be used, it is possible that something is missed that we later have to address.

There have been almost 20 maintenance releases of 6, but so far only 3 of 22.08. So, while 6 is now quite mature, 22.08 is still at the beginning of its lifespan.

Comment by Naresh Chandra [ 2023-01-17 ]

Hi Johan,

Thanks for the clarification. I agree that a major version has architectural changes and improvements. I will try changing the parameters and let you know.

Comment by Johan Wikman [ 2023-01-17 ]

The defaults of writeq_high_water and writeq_low_water have been changed to 64kB/1kB. The earlier defaults of 16M/8kB may in certain circumstances lead to MaxScale using excessive amounts of memory. The change also improves performance.

Generated at Thu Feb 08 04:28:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.