MariaDB ColumnStore / MCOL-5796

PrimProc CPU Usage at 100% (Causing Stuck Queries)

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 23.02.8
    • Fix Version/s: None
    • Component/s: PrimProc
    • Labels: None
    • Environment: S3 Cohesity, NFS, RHEL 8

    Description

      The root cause remains undetermined because the issue occurs sporadically. The customer's cluster has been experiencing the problem for over a year. The behavior resembles https://jira.mariadb.org/browse/MCOL-5565, which was initially suspected as the cause and has since been resolved, but an analysis by the ColumnStore Engineering team suggests that the underlying cause here is different.

      Business Environment details:

      • cpimport runs every hour
      • calDropPartitions runs every night

      TOP Processes

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      3917453 mysql     20   0 4879.0g 290.6g  11464 S  6290  38.5  99658:27 /usr/bin/PrimProc
      3915726 mysql     20   0   18.1g   2.4g  19004 S   9.5   0.3   2781:52 /usr/sbin/mariadbd
      3917430 mysql     20   0  958208 748796 744732 S   0.0   0.1  63:49.48 /usr/bin/workernode DBRM_Worker2
      3917700 mysql     20   0  237404   9172   5408 S   0.0   0.0   5:09.47 /usr/bin/WriteEngineServer
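
To put the PrimProc row in perspective, the following minimal sketch converts two of the quoted `top` rows into core-equivalents and a rough total-RAM estimate. It assumes `top` was running in its default per-core mode, where 100 in the %CPU column means one fully busy core; the helper names (`parse_top_row`, `to_gib`) are illustrative only. Under that assumption, 6290 %CPU corresponds to roughly 63 busy cores, and 290.6g at 38.5 %MEM implies a host with roughly 755 GiB of RAM.

```python
# Sketch: interpret the `top` rows quoted in this report.
# Assumption: top's default per-core mode, where 100 %CPU == one busy core.

TOP_ROWS = """\
3917453 mysql     20   0 4879.0g 290.6g  11464 S  6290  38.5  99658:27 /usr/bin/PrimProc
3915726 mysql     20   0   18.1g   2.4g  19004 S   9.5   0.3   2781:52 /usr/sbin/mariadbd
"""


def parse_top_row(row):
    """Split one top process row into the fields used below."""
    # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    f = row.split(None, 11)
    return {
        "pid": int(f[0]),
        "res": f[5],
        "cpu_pct": float(f[8]),
        "mem_pct": float(f[9]),
        "command": f[11],
    }


def to_gib(size):
    """Convert a top size field to GiB; a bare number is in KiB."""
    factors = {"m": 1 / 1024, "g": 1.0, "t": 1024.0}
    suffix = size[-1].lower()
    if suffix in factors:
        return float(size[:-1]) * factors[suffix]
    return float(size) / (1024 * 1024)


rows = [parse_top_row(r) for r in TOP_ROWS.splitlines() if r.strip()]
primproc = rows[0]

# Core-equivalents consumed by PrimProc.
busy_cores = primproc["cpu_pct"] / 100

# Rough host RAM inferred from RES and %MEM (an estimate, not a measurement).
est_total_ram_gib = to_gib(primproc["res"]) / (primproc["mem_pct"] / 100)

print(f"{primproc['command']}: ~{busy_cores:.1f} cores busy, "
      f"host RAM ~{est_total_ram_gib:.0f} GiB")
```

The RAM figure is only inferred from two rounded columns, so it should be treated as an order-of-magnitude check rather than the actual hardware size.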
      

      Identified Warnings:

      The certificate /usr/share/columnstore/cmapi/cmapi_server/self-signed.crt for cmapi https is expired.
      There is 1 zombie process.
      Iptables rules exist.
      QueryStats Enabled = N
      HashJoin AllowDiskBasedJoin = N
      Errors found in crit logs: reading compression header. Check for possible data file corruption.
      Unknown ref item error found in error log. Mariadb server version may not be fully compatible with columnstore version.
      There are 3 symbolic links found in /var/lib/columnstore.
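
Two of the warnings above (the zombie process and the symbolic links under /var/lib/columnstore) can be enumerated directly on a node. The sketch below is one possible way to do that with the Python standard library on Linux; it is not the tool that produced the warnings, and the function names are hypothetical.

```python
import os


def find_symlinks(root):
    """Return sorted (link_path, target) pairs for symbolic links under root."""
    found = []
    # followlinks=False: report linked directories but do not descend into them.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                found.append((path, os.readlink(path)))
    return sorted(found)


def parse_stat_state(stat_text):
    """Extract (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the first '(' and the last ')'.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 1:].split()[0]
    return comm, state


def find_zombies(proc_root="/proc"):
    """Return (pid, comm) for every process whose state is 'Z' (zombie)."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                comm, state = parse_stat_state(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "Z":
            zombies.append((int(entry), comm))
    return zombies
```

On an affected node, `find_symlinks("/var/lib/columnstore")` would list each link with its target, and `find_zombies()` would show which process is defunct, which may help decide whether either warning is related to the PrimProc behavior.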
      

      Other identified issues:

      Huge number of connect retries in the MariaDB logs (1 million entries):
      ClientRotator caught exception: InetStreamSocket::connect: connect() error: Connection refused to: InetStreamSocket: sd: 76 inet: 127.0.0.1 port: 8601
       
      ExeMgr unresponsive?
      Could not get a ExeMgr connection.
      joblist[153385]: 18.015363 |0|0|0| C 05 CAL0000: /home/jenkins/workspace/Build-Package/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/columnstore/columnstore/dbcon/execplan/clientrotator.cpp @ 379 Could not get a ExeMgr connection.      %%10%%
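
"Connection refused" from connect() means nothing was accepting TCP connections on 127.0.0.1:8601 at that moment, which is consistent with the "Could not get a ExeMgr connection" message. A minimal probe such as the sketch below could be run while a query is stuck to tell a closed port apart from one that accepts connections but then stalls. The function name and the 2-second timeout are assumptions made for illustration.

```python
import socket


def port_accepts_connections(host="127.0.0.1", port=8601, timeout=2.0):
    """Return True if a TCP connect succeeds, False if refused or timed out."""
    try:
        # create_connection raises OSError subclasses on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as `port_accepts_connections()` returning False while PrimProc is at full CPU would match the refused connections in the log; a True result would instead suggest the listener is up and the stall happens after the connection is accepted.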
      

            People

              Assignee: Unassigned
              Reporter: Patrizio Tamorri (patrizio.tamorri)
              Votes: 0
              Watchers: 1