[MCOL-379] system down and high memory alarm set after install Created: 2016-10-27  Updated: 2023-10-26  Resolved: 2017-05-31

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.0.4
Fix Version/s: 1.0.10, 1.1.0

Type: Bug Priority: Minor
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

saw on the aws systems and the corp system


Sprint: 2017-11

 Description   

did a new install, logged into the mcsadmin console, and saw these 2 alarms. need to investigate why the alarms are issued and why the module is blank

system is up and the memory usage is normal

AlarmID = 7
Brief Description = MEMORY_USAGE_HIGH
Alarm Severity = CRITICAL
Time Issued = Thu Oct 27 16:52:43 2016
Reporting Module =
Reporting Process = ServerMonitor
Reported Device = Local-Memory

AlarmID = 23
Brief Description = SYSTEM_DOWN_MANUAL
Alarm Severity = CRITICAL
Time Issued = Thu Oct 27 16:53:10 2016
Reporting Module =
Reporting Process = ProcessManager
Reported Device = System



 Comments   
Comment by David Hall (Inactive) [ 2016-10-31 ]

I'm seeing the SYSTEM_DOWN_MANUAL, but not the MEMORY_USAGE_HIGH. Just a datapoint.

Comment by David Hill (Inactive) [ 2016-12-04 ]

fixed the empty "Reporting Module"

AlarmID = 7
Brief Description = MEMORY_USAGE_HIGH
Alarm Severity = CRITICAL
Time Issued = Sun Dec 4 20:29:37 2016
Reporting Module = pm1
Reporting Process = ServerMonitor
Reported Device = Local-Memory

commit 0aa0ecfb56bf88a00c7b0ce1d4fe2182a807109a
Author: david hill <david.hill@mariadb.com>
Date: Sun Dec 4 12:27:48 2016 -0600

MCOL-379 - Fixed empty Reporting Module

oamapps/alarmmanager/alarmmanager.cpp | 2 +-

Comment by David Hill (Inactive) [ 2017-03-28 ]

seeing the MEMORY_USAGE_HIGH alarm on google cloud centos 7 instances when testing 1.0.8 and 1.1.0

AlarmID = 7
Brief Description = MEMORY_USAGE_HIGH
Alarm Severity = CRITICAL
Time Issued = Tue Mar 28 15:26:01 2017
Reporting Module = pm1
Reporting Process = ServerMonitor
Reported Device = Local-Memory

mcsadmin> getsystemres
getsystemresourceusage Tue Mar 28 15:27:56 2017

System Resource Usage per Module

Module 'pm1' Resource Usage

CPU: 0% Usage
Mem: 15400764k total, 696368k used, 2679084k cache, 4% Usage
Swap: 0 k total, 0k used, 0% Usage
Top CPU Process Users: systemd 0%
Top Memory Process Users: mysqld 2%, PrimProc 1%, python 1%, python 1%, workernode 1%
Disk Usage: / 49%

Comment by David Hill (Inactive) [ 2017-05-26 ]

fix checked in and tested on 1.0.10. will check into 1.1.0

as part of the testing, I used the following script to push memory usage over 90% and verify that the alarm gets reported, and then cleared when the script exits.
the fix also does away with the invalid critical memory alarm and logs at startup.

here is the test script that uses up memory and how to run it. the value to enter will vary based on the amount of memory on the system.

[root@ip-172-30-0-56 ~]# cat memUser.sh
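#!/usr/bin/env python
# Python 2 script (print statement, integer division)

import sys
import time

# expect exactly one argument: the number of megabytes to allocate
if len(sys.argv) != 2:
    print "usage: fillmem <number-of-megabytes>"
    sys.exit()

count = int(sys.argv[1])

# a tuple of 8-byte slots adding up to roughly one megabyte
megabyte = (0,) * (1024 * 1024 / 8)

# repeat it to hold roughly <count> megabytes
data = megabyte * count

# keep the memory allocated until the script is killed
while True:
    time.sleep(1)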

[root@ip-172-30-0-56 ~]# ./memUser.sh 13800
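
for systems that only have Python 3, the same idea can be sketched roughly as below. this is not the script used in the test above; the names fill_memory, hold and PAGE_SIZE are made up for illustration, and the 4096-byte page size is an assumption.

```python
import time

PAGE_SIZE = 4096  # assumption: typical page size on x86_64 Linux


def fill_memory(megabytes):
    """Allocate `megabytes` MiB and write to every page so it is really in use."""
    buf = bytearray(megabytes * 1024 * 1024)
    # touch one byte per page so the kernel has to back the whole buffer
    for offset in range(0, len(buf), PAGE_SIZE):
        buf[offset] = 1
    return buf


def hold(buf, seconds):
    """Keep the buffer referenced for `seconds`, then return its size in bytes."""
    time.sleep(seconds)
    return len(buf)
```

to mimic the run above one would call fill_memory(13800) and then hold() the result for as long as the alarm needs to stay raised. as with the original script, the megabyte count has to be picked to suit the machine.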

results from the test, showing the state after the script starts and after it stops:

mcsadmin> getSystemMemoryUsers
getsystemmemoryusers Fri May 26 21:03:25 2017

System Process Top Memory Users per Module

Module 'pm1' Top Memory Users (in bytes)

Process           Memory Used Memory Usage %
----------------- ----------- --------------
python            1413655     89
mysqld            18079       2
PrimProc          3102        1
python            1827        1
workernode        1779        1

mcsadmin> getSystemMemory
getsystemmemory Fri May 26 21:03:28 2017

System Memory Usage per Module (in K bytes)

Module Mem Total Mem Used Cache   Mem Usage % Swap Total Swap Used Swap Usage %
------ --------- -------- ------- ----------- ---------- --------- ------------
pm1    16004820  14686744 1130216 91          2097148    140       0
mcsadmin> getactivealarm
getactivealarms Fri May 26 21:03:33 2017

Active Alarm List:

AlarmID = 7
Brief Description = MEMORY_USAGE_HIGH
Alarm Severity = CRITICAL
Time Issued = Fri May 26 21:03:33 2017
Reporting Module = pm1
Reporting Process = ServerMonitor
Reported Device = Local-Memory

mcsadmin> getSystemMemoryUsers
getsystemmemoryusers Fri May 26 21:03:43 2017

System Process Top Memory Users per Module

Module 'pm1' Top Memory Users (in bytes)

Process           Memory Used Memory Usage %
----------------- ----------- --------------
mysqld            18079       2
PrimProc          3102        1
python            1827        1
workernode        1779        1
DMLProc           1762        1

mcsadmin> getSystemMemory
getsystemmemory Fri May 26 21:03:46 2017

System Memory Usage per Module (in K bytes)

Module Mem Total Mem Used Cache   Mem Usage % Swap Total Swap Used Swap Usage %
------ --------- -------- ------- ----------- ---------- --------- ------------
pm1    16004820  525912   1130240 3           2097148    140       0
mcsadmin> getactivealarm
getactivealarms Fri May 26 21:03:48 2017

Active Alarm List:

mcsadmin>
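
the Mem Usage % values in the two getSystemMemory outputs above are consistent with Mem Used divided by Mem Total, truncated to an integer (91 with the script running, 3 after it exits). a small sketch of that arithmetic follows. the 90% threshold is taken from the "over 90%" remark in this comment, and whether ServerMonitor compares with > or >= is an assumption; the function names are made up.

```python
CRITICAL_THRESHOLD_PCT = 90  # assumption: based on the "over 90%" remark above


def mem_usage_percent(total_kb, used_kb):
    """Used memory as an integer percentage of total, truncated (not rounded)."""
    return used_kb * 100 // total_kb


def is_critical(total_kb, used_kb, threshold=CRITICAL_THRESHOLD_PCT):
    """True when usage is strictly above the threshold (the comparison is assumed)."""
    return mem_usage_percent(total_kb, used_kb) > threshold


# values copied from the pm1 rows above
print(mem_usage_percent(16004820, 14686744), is_critical(16004820, 14686744))
print(mem_usage_percent(16004820, 525912), is_critical(16004820, 525912))
```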

Comment by David Hill (Inactive) [ 2017-05-26 ]

develop-1.0 commit

commit ccbdb07007dfde95c610fcead008b62e30d7411f
Author: david hill <david.hill@mariadb.com>
Date: Fri May 26 10:52:17 2017 -0500

MCOL-379 - fix false critical memory usage alarm

oam/install_scripts/post-install | 2 +-
oamapps/serverMonitor/memoryMonitor.cpp | 41 +++++++++++++++++++++++++++++++----------
procmgr/processmanager.cpp | 5 ++++-
utils/common/cgroupconfigurator.cpp | 39 +++++++++++++++++++++++++--------------

Comment by David Hill (Inactive) [ 2017-05-26 ]

fixed in 1.1.0

commit 9c7434ba52dc4ef8a27d4c7b33d4b0e53981174e
Author: David Hill <david.hill@mariadb.com>
Date: Fri May 26 16:28:34 2017 -0500

MCOL-379 - fix false critical mem alarm

oam/install_scripts/post-install | 2 +-
oamapps/serverMonitor/memoryMonitor.cpp | 41 +++++++++++++++++++++++++++++++----------
procmgr/processmanager.cpp | 3 ++-
utils/common/cgroupconfigurator.cpp | 39 +++++++++++++++++++++++++--------------
4 files changed, 59 insertions, 26 deletions

Comment by David Hill (Inactive) [ 2017-05-30 ]

additional code changes made to make it more versatile. the original code worked on centos 6, but not on the newer OSs. changed to work on all OSs, since /proc/meminfo is different between centos 6 and centos 7
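
to illustrate the kind of difference involved: newer kernels expose a MemAvailable field in /proc/meminfo, which older ones may lack. the sketch below is not the code in cgroupconfigurator.cpp. it only shows one way to handle both layouts, and the fallback of MemFree + Buffers + Cached is an assumed approximation. all function names are made up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_kb(info):
    """Use MemAvailable when the kernel reports it, else approximate it."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    # assumed fallback for kernels without MemAvailable
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


def used_percent(info):
    """Integer percentage of memory that is not available."""
    total = info["MemTotal"]
    return 100 * (total - available_kb(info)) // total
```

on a live system the input would be the contents of /proc/meminfo, e.g. parse_meminfo(open("/proc/meminfo").read()).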

develop commit

commit 2cc5fc7195a76ce6ff475ba85f67123ec94b9fbe
Author: david hill <david.hill@mariadb.com>
Date: Tue May 30 15:12:50 2017 -0500

MCOL-379 - changed to make the check for mem available for dynamic

utils/common/cgroupconfigurator.cpp

develop-1.0

commit 22191d908890ecf6dc4e911d70c14b8336bc693a
Author: david hill <david.hill@mariadb.com>
Date: Tue May 30 14:33:21 2017 -0500

MCOL-379 - changed to make the check for mem available for dynamic

utils/common/cgroupconfigurator.cpp

Comment by Daniel Lee (Inactive) [ 2017-05-31 ]

Build verified: Github source 1.0.0 and 1.1.0

1.1.0

[root@localhost mariadb-columnstore-server]# git show
commit 349cae544b6bc71910267a3b3b0fa3fb57b0a587
Merge: bd13090 2ecb85c
Author: benthompson15 <ben.thompson@mariadb.com>
Date: Thu May 4 16:06:16 2017 -0500

[root@localhost mariadb-columnstore-engine]# git show
commit 2cc5fc7195a76ce6ff475ba85f67123ec94b9fbe
Author: david hill <david.hill@mariadb.com>
Date: Tue May 30 15:12:50 2017 -0500

1.0.0

[root@localhost mariadb-columnstore-server]# git show
commit 478209c9d58e0c34d0a177b39b42ed865ad30ccf
Author: David Hill <david.hill@mariadb.com>
Date: Thu May 18 15:11:05 2017 -0500

[root@localhost mariadb-columnstore-engine]# git show
commit 22191d908890ecf6dc4e911d70c14b8336bc693a
Author: david hill <david.hill@mariadb.com>
Date: Tue May 30 14:33:21 2017 -0500

mcsadmin> getsystemmemory
getsystemmemory Wed May 31 19:14:13 2017

System Memory Usage per Module (in K bytes)

Module Mem Total Mem Used Cache   Mem Usage % Swap Total Swap Used Swap Usage %
------ --------- -------- ------- ----------- ---------- --------- ------------
um1    3982488   458596   2738652 11          1572860    0         0
pm1    3982488   354184   2964364 8           1572860    0         0
pm2    3982488   347344   2899124 8           1572860    0         0

No alarms issued

Generated at Thu Feb 08 02:20:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.