[MCOL-1311] Memory leak in PrimProc is followed by restarting of MCS system in the occurrence of next memory processing on PM(s) Created: 2018-03-28  Updated: 2018-04-11  Resolved: 2018-04-09

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: 1.1.3
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

CentOS 7.4



 Description   

A memory leak in PrimProc leads to a restart of the MCS system the next time memory-intensive processing runs on the PM(s).

The PrimProc process does not release the memory it has used, and it subsequently
gets killed by the OS, after which the MCS system restarts.

A memory leak was observed in the PrimProc process on the PM node(s)
after running the TPC-DS query set.
After all queries had finished, PrimProc still held more than 70% of memory.

The restart problem appears when other, more memory-intensive processes
run on the PM(s) afterwards.

For example:
Start a parallel cpimport bulk data load of several ColumnStore tables. This is followed by a ColumnStore restart, which breaks the load and frequently leaves table locks behind after ColumnStore recovers; the stale locks break the load further and require manual cleanup.
The same operation passes successfully if PrimProc had not previously reached 72.0 %MEM.

How to repeat:
1. Load the TPC-DS schema and 1 TB of data.
2. Run the TPC-DS power test with the query set supported by MCS.
3. After the test finishes successfully, verify that PrimProc on the PM(s) did not return the memory.
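The final verification step can be sketched as a quick check on each PM host (a minimal sketch; the process name PrimProc is from the top output below, and the command assumes GNU ps):

```shell
# Report resident memory (%MEM and RSS in KiB) of PrimProc on this PM host.
# Prints a fallback message instead of failing if PrimProc is not running here.
ps -C PrimProc -o pid,pmem,rss,comm --no-headers || echo "PrimProc not running"
```

If %MEM stays near the 70% mark long after the last query completed, the symptom described in this ticket is reproduced.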

Note: it appears that ExeMgr on the UM also consumes more memory during some query processing (more than 50%), but that memory is returned, and the test finished with ExeMgr at 2-3%.

PM1
top - 02:02:15 up 2 days, 26 min,  0 users,  load average: 10.44, 6.97, 5.27
Tasks: 179 total,   1 running, 178 sleeping,   0 stopped,   0 zombie
%Cpu(s):  4.8 us,  3.2 sy,  0.0 ni, 92.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 61849300 total, 16365836 free, 45155268 used,   328196 buff/cache
KiB Swap:  1048572 total,  1048572 free,        0 used. 16191216 avail Mem
 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1107 root      19  -1 44.104g 0.041t  12524 S   6.7 72.0 715:29.67 PrimProc
 1014 root      20   0 2617644  21232  11560 S   6.7  0.0  20:23.51 ProcMon
  999 root      20   0  461708  19076  15640 S   0.0  0.0   0:01.07 workernode
 1141 root      20   0 1181320  17724   8920 S   6.7  0.0  53:12.81 ProcMgr
  882 root      20   0  562396  16588   5904 S   0.0  0.0   0:18.88 tuned
 1123 root      20   0  429028  14292   9920 S   0.0  0.0   0:00.10 WriteEngin+
  641 polkitd   20   0  536236  14100   4632 S   0.0  0.0   0:00.55 polkitd
  919 root      20   0  896840  13596   8072 S   0.0  0.0   0:05.20 controller+
  956 root      20   0  380812  12436   8296 S   0.0  0.0   0:22.57 ServerMoni+
  467 root      20   0   45024  11088  10764 S   0.0  0.0   0:01.76 systemd-jo+



 Comments   
Comment by David Hall (Inactive) [ 2018-03-28 ]

PrimProc uses (by default) 70% of memory for cache. Cache is not released (or it wouldn't be cache).
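For reference, the cache ceiling David mentions is configurable. In ColumnStore the block-cache size is controlled in Columnstore.xml; the fragment below is an assumption for illustration (the setting name NumBlocksPct and the file path may differ by version and were not verified against 1.1.3):

```xml
<!-- Hypothetical excerpt from /usr/local/mariadb/columnstore/etc/Columnstore.xml -->
<DBBC>
  <!-- Percentage of physical memory PrimProc may use for its block cache -->
  <NumBlocksPct>70</NumBlocksPct>
</DBBC>
```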

Comment by Andrew Hutchings (Inactive) [ 2018-04-09 ]

As David mentioned this is normal usage for PrimProc as it needs it for a block cache which fills as queries are being processed until it hits the ceiling (70% by default) and then manages itself to stop growing (an LRU cache). The usage is pretty much spot on the expected amount.
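The behaviour Andrew describes can be sketched with a toy LRU cache: it grows until it hits a fixed ceiling, then evicts the least recently used entry on each insert rather than shrinking, so its footprint plateaus at the ceiling and is never handed back between queries. This is an illustrative sketch only, not ColumnStore code; the class and names are hypothetical.

```python
from collections import OrderedDict

class BlockCache:
    """Toy LRU cache: fills to a fixed capacity, then evicts the least
    recently used block on each insert. It never shrinks below capacity,
    which mirrors why PrimProc's resident memory stays near its ceiling."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # keeps insertion/recency order

    def get(self, lbid):
        if lbid in self.blocks:
            self.blocks.move_to_end(lbid)  # mark as most recently used
            return self.blocks[lbid]
        return None

    def put(self, lbid, data):
        self.blocks[lbid] = data
        self.blocks.move_to_end(lbid)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used

cache = BlockCache(capacity=3)
for i in range(10):              # simulate a query touching 10 blocks
    cache.put(i, b"block")
print(len(cache.blocks))         # prints 3: the cache sits at its ceiling
```

After the loop only the three most recently touched blocks remain; the cache holds steady at capacity instead of releasing memory, just as the PrimProc block cache holds steady near 70%.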

ExeMgr uses the memory for joins and aggregates so that memory is freed after every query.

Closing this as !Bug

Generated at Thu Feb 08 02:27:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.