MariaDB ColumnStore / MCOL-5285

SkySQL OOM Crash? Memory not being released? testing

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version: 6.3.1
    • Fix Version: 23.02.2
    • Component: PrimProc
    • Labels: None
    • Environment: SkySQL AWS 32x 128 single node

    Description

      Currently there is a customer whose memory appears not to be released in SkySQL.

      The current workaround is that RDBA/SRE has to manually run mcsShutdown and mcsStart every couple of days. However, the customer often has to file a ticket saying it has crashed and needs a restart before the scheduled stop/start to clear memory.

      Link to Logs & Configs in comment below

      Attachments

        Issue Links

          Activity

            alexey.vorovich alexey vorovich (Inactive) added a comment

            Yeah..

            The query log shows a very rich set of OLTP and other queries. I don't think it is possible to try each one in-house and see the leak. However, we do see that the ExeMgr process grows by 5-6% of total memory every 24 hours.

            We also know that ExeMgr was refactored and merged into PrimProc in 22.08.x. It is unknown whether this change fixes the problem. The question is whether we should try now or wait till Sky upgrades the image.

            toddstoffel allen.herrera gdorman
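
            For reference, a minimal way to confirm the growth rate on the node could look like the sketch below (the process names, one-hour interval, and log path are assumptions, not something agreed in this ticket):

            # log resident memory of ExeMgr/PrimProc once per hour
            while true; do
                printf '%s ' "$(date '+%F %T')"
                ps -C ExeMgr,PrimProc -o comm=,rss=,%mem= | tr '\n' ' '
                echo
                sleep 3600
            done >> /tmp/mcs_memory_growth.log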

            leonid.fedorov Leonid Fedorov added a comment (edited)

            I created the profiling allocator shared object:

            wget https://github.com/jemalloc/jemalloc/releases/download/5.3.0/jemalloc-5.3.0.tar.bz2
            tar xjf jemalloc-5.3.0.tar.bz2 && cd jemalloc-5.3.0
            ./configure --disable-fill --with-jemalloc-prefix="" --enable-shared --enable-prof
            make
            

            It is attached to the issue and can be downloaded here: jemalloc
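
            A quick smoke test of the freshly built library could look like this (the lib/ path inside the jemalloc-5.3.0 source tree and the /tmp prefix are assumptions):

            # preload the freshly built allocator on a throwaway command; with prof_final:true
            # a /tmp/jeprof_smoke.<pid>.0.f.heap dump should appear when the command exits
            LD_PRELOAD=$PWD/lib/libjemalloc.so.2 \
            MALLOC_CONF=prof:true,prof_final:true,prof_prefix:/tmp/jeprof_smoke \
            /bin/ls > /dev/null
            ls /tmp/jeprof_smoke.*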

            leonid.fedorov Leonid Fedorov added a comment (edited)

            This profiling allocator should be installed on one node with the following steps.

            mkdir /heap_profile
            # put the shared object into this directory under the name jemalloc.so
            chmod 777 /heap_profile
            

            Then edit

            /usr/lib/systemd/system/mcs-primproc.service 
            

            and replace the line

            ExecStart=/usr/bin/env bash -c "LD_PRELOAD=$(ldconfig -p | grep -m1 libjemalloc | awk '{print $1}') exec /usr/bin/PrimProc" 
            

            with

            ExecStart=/usr/bin/env bash -c "MALLOC_CONF=prof:true,prof_leak:true,lg_prof_sample:19,prof_final:true,stats_print:true,abort:false,abort_conf:false,prof_prefix:/heap_profile/PrimProc_heap_profile LD_PRELOAD=/heap_profile/jemalloc.so exec /usr/bin/PrimProc"
            

            then reload the systemd configuration

            systemctl daemon-reload
            

            and restart the mcs-primproc service

            service mcs-primproc restart
            

            After some workload has run, profile dump files with heap usage information should appear under /heap_profile/ (prefixed PrimProc_heap_profile). We want them for inspection.
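
            Once dumps exist, they can be summarized with the jeprof script that ships in the jemalloc source tree under bin/ (the dump file names below are placeholders):

            # top allocation sites by bytes still in use
            jeprof --text /usr/bin/PrimProc /heap_profile/PrimProc_heap_profile.12345.0.f.heap | head -40

            # diff two dumps taken roughly 24h apart to see which call stacks grew
            jeprof --text --base=/heap_profile/PrimProc_heap_profile.day1.heap \
                /usr/bin/PrimProc /heap_profile/PrimProc_heap_profile.day2.heap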


            alexey.vorovich alexey vorovich (Inactive) added a comment

            alan.mologorsky let's convert the instructions above from leonid.fedorov into ones applicable to an existing Docker container, which does NOT have systemd.

            Rough outline that I am asking you to expand and try:

            • run 63x container
            • mcs cluster stop
            • stop cmapi-server and mariadb
            • Instructions for setting MALLOC_CONF
            • start cmapi and mariadb
            • mcs cluster start
            • run workload
            • collect profile

            Everyone understands that this is a non-persistent setup and will not survive a pod restart. This is just the first step.
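
            Once PrimProc is back up inside the container, a quick check that the preload actually took effect could be (the pidof and /proc maps check are assumptions, not part of the outline above):

            # confirm the profiling allocator is mapped into the running PrimProc
            grep -m1 jemalloc /proc/$(pidof PrimProc)/maps && echo "jemalloc preloaded"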

            leonid.fedorov please edit your instructions to note the location of the shared object. Maybe create a jmalloc_test folder on https://cspkg.s3.amazonaws.com/


            leonid.fedorov Leonid Fedorov added a comment

            mkdir /heap_profile
            chmod 777 /heap_profile
            curl -o /heap_profile/jemalloc.so https://jira.mariadb.org/secure/attachment/67568/libjemalloc.so-1.2
            sed -i 's@$MCS_INSTALL_BIN/PrimProc@MALLOC_CONF=prof:true,prof_leak:true,lg_prof_sample:19,lg_prof_interval:33,prof_final:true,stats_print:true,abort:false,abort_conf:false,prof_prefix:/heap_profile/PrimProc_heap_profile LD_PRELOAD=/heap_profile/jemalloc.so $MCS_INSTALL_BIN/PrimProc@g' /usr/share/columnstore/cmapi/mcs_node_control/custom_dispatchers/container.sh
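
            To make the patched container.sh take effect and collect the results, something along these lines should work (the stop/start commands come from the outline above; the tarball path is just an example):

            # relaunch PrimProc under the profiling allocator
            mcs cluster stop
            mcs cluster start

            # after running the workload, package the dumps for inspection
            ls -l /heap_profile/
            tar czf /tmp/primproc_heap_profiles.tgz /heap_profile/PrimProc_heap_profile.*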
            


            People

              leonid.fedorov Leonid Fedorov
              allen.herrera Allen Herrera
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved:
