  1. MariaDB ColumnStore
  2. MCOL-3470

ServerMonitor hung, running at 100% cpu and not responding

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 1.2.2
    • Fix Version/s: N/A
    • Component/s: oam
    • Labels: None
    • Environment: 1um 3pm with local query enabled

      Description

      A customer reports that ServerMonitor on PM1 repeatedly hangs: it runs at 100% CPU and stops responding to mcsadmin commands. A restartSystem clears the condition, but ServerMonitor eventually returns to the same state.

      mcsadmin> getModuleCpuUsers pm1
      getmodulecpuusers Tue Sep 3 09:33:57 2019

      Failed to get Top CPU Users: API Failure return in getTopProcessCpuUsers API

      top from PM1:

      PID    USER PR NI VIRT   RES   SHR  S %CPU  %MEM TIME+     COMMAND
      70427  root 20 0  230136 34488 8940 S 100.0 0.0  3921:54   ServerMonitor
      7814   root 39 19 0      0     0    S 3.0   0.0  110:44.50 kipmi0
      120599 root 20 0  608500 48744 2912 S 2.0   0.0  567:48.58 ProcMgr

      gdb of ServerMonitor:

      (gdb) bt
      #0 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
      #1 0x00007f2f69a2cdcb in _L_lock_883 () from /lib64/libpthread.so.0
      #2 0x00007f2f69a2cc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3 0x00005560d23ffa52 in msgProcessor () at /data/buildbot/bb-worker/centos7/mariadb-columnstore-engine/oamapps/serverMonitor/msgProcessor.cpp:144
      #4 0x00005560d23dd113 in main (argc=<optimized out>, argv=<optimized out>) at /data/buildbot/bb-worker/centos7/mariadb-columnstore-engine/oamapps/serverMonitor/main.cpp:325
      (gdb) info threads
      Id Target Id Frame
      5 Thread 0x7f2f62fff700 (LWP 70503) "ServerMonitor" 0x00007f2f69a2ed12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      4 Thread 0x7f2f61fff700 (LWP 70504) "ServerMonitor" 0x00007f2f6a8ff366 in alarmmanager::operator>> (input=..., alarm=...) at /data/buildbot/bb-worker/centos7/mariadb-columnstore-engine/oamapps/alarmmanager/alarm.cpp:121
      3 Thread 0x7f2f60fff700 (LWP 74505) "ServerMonitor" 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
      2 Thread 0x7f2f607fe700 (LWP 74506) "ServerMonitor" 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
      * 1 Thread 0x7f2f6e0c88c0 (LWP 70427) "ServerMonitor" 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
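The thread list above can be summarized mechanically to see which threads are parked in a wait primitive and which are not. The following is a minimal Python sketch, not part of the original report: the helper names (`group_threads_by_frame`, `non_waiting_threads`) and the `WAIT_FRAMES` set are illustrative assumptions, and the input is the `info threads` output copied from this issue. Treating a thread outside a wait frame as a candidate for the CPU usage is an inference only; the report does not confirm which thread holds the mutex.

```python
import re
from collections import defaultdict

# Thread lines copied from the `info threads` output in this report.
INFO_THREADS = """\
5 Thread 0x7f2f62fff700 (LWP 70503) "ServerMonitor" 0x00007f2f69a2ed12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f2f61fff700 (LWP 70504) "ServerMonitor" 0x00007f2f6a8ff366 in alarmmanager::operator>> (input=..., alarm=...) at /data/buildbot/bb-worker/centos7/mariadb-columnstore-engine/oamapps/alarmmanager/alarm.cpp:121
3 Thread 0x7f2f60fff700 (LWP 74505) "ServerMonitor" 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
2 Thread 0x7f2f607fe700 (LWP 74506) "ServerMonitor" 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
* 1 Thread 0x7f2f6e0c88c0 (LWP 70427) "ServerMonitor" 0x00007f2f69a314ed in __lll_lock_wait () from /lib64/libpthread.so.0
"""

# Matches one gdb thread line: optional "*" marker, thread id, LWP,
# thread name, program counter, then the innermost frame's function.
THREAD_LINE = re.compile(
    r'^\s*\*?\s*(?P<id>\d+)\s+Thread\s+\S+\s+\(LWP\s+\d+\)\s+"[^"]*"\s+'
    r'0x[0-9a-f]+\s+in\s+(?P<func>.+?)\s+\('
)

# Assumption for this sketch: frames that mean "blocked, not running".
WAIT_FRAMES = {"__lll_lock_wait", "pthread_cond_timedwait"}


def group_threads_by_frame(text):
    """Map innermost function name -> list of gdb thread ids."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = THREAD_LINE.match(line)
        if not match:
            continue  # skip headers and blank lines
        # Drop the glibc symbol-version suffix, e.g. "@@GLIBC_2.3.2".
        func = match.group("func").split("@@")[0]
        groups[func].append(int(match.group("id")))
    return dict(groups)


def non_waiting_threads(groups):
    """Thread ids whose innermost frame is not a known wait primitive."""
    return sorted(
        tid
        for func, ids in groups.items()
        if func not in WAIT_FRAMES
        for tid in ids
    )


if __name__ == "__main__":
    groups = group_threads_by_frame(INFO_THREADS)
    for func, ids in groups.items():
        print(f"{func}: threads {sorted(ids)}")
    print("not in a wait frame:", non_waiting_threads(groups))
```

A natural follow-up in gdb would be `thread apply all bt` to capture the full stack of every thread, since only the main thread's backtrace is included here.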

            People

            Assignee: Unassigned
            Reporter: David Hill (Inactive)