Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-4043

Memory leaks Part 1. & errors -> need a valgrind pass

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 1.4.4, 1.5.3
    • 1.5.3
    • ExeMgr
    • None

    Description

      I ran our regression test on our 1.5 RC, and at the end of it, ExeMgr was using 45GB of virt mem, and about 2.6GB of real mem when it was idle. Gagan compared 1.4 & 1.5, found mem usage to be similar. Provoked an investigation.

      I attached the valgrind log from running VG on ExeMgr, on our latest 1.4.4 RC (enterprise server @ 6c7e73d7, engine @ 8ab3cc301). For this test I ran all of working_tpch1, and working_tpch1_compareLogOnly, except for MCOL_3503.sql. moda() has leaks not recorded in this log because I found that early and commented out the test that uses it the most to see how much of a difference it made in the final mem usage.

      At the end of that test, ExeMgr was using around 1.2GB when idle. The leaks reported in the log do not add up to anywhere close to 1.2GB. My suspicion is that all of these little bits of mem over time are causing a good deal of mem fragmentation. It could also be that a good deal of mem still has a reference, so VG doesn't consider it leaked. Need to start fixing these little things we have in the log, then reevaluate.

      Jun 22 update: Testing @ the columnstore-1.5.2-1 tag, I noticed test211 makes ExeMgr allocate around 1GB/min that it doesn't use. To be clear, the mem usage shows up as virtual mem usage, not resident mem usage. Resident mem usage also appears to be climbing but with a much lower slope.

      When a particular patch merges easily into 1.2, be sure to include it.

      Attachments

        Issue Links

          Activity

            The DynamicParseTreeVec added to track parsetree objects that are causing leaks is a hard kludge. Perhaps the code should actually be doing a deep copy of the parsetree, instead of storing a pointer,

            David.Hall David Hall (Inactive) added a comment - The DynamicParseTreeVec added to track parsetree objects that are causing leaks is a hard kludge. Perhaps the code should actually be doing a deep copy of the parsetree, instead of storing a pointer,

            Server 10.4 (community) commit: 205b0ce6ad21dbafe8def505307b4922398db5b2
            Engine develop-1.4 commit: 8f92f24b02f01822214f0f140a7ad0cd774bd392

            Attached are the Exemgr logs under valgrind on these commits (plus the patch from https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1297): exemgr_valgrind.log. Same tests under working_tpch1 and working_tpch1_compareLogOnly were run for this log.

            memleaks due to MODA functions are now gone, after David's fix.
            memleaks in ordering::OrderByData::OrderByData() are now gone.
            Mentions of 'definitely lost' bytes for joblist::simpleScalarFilterToParseTree() are down from 14 to 1. Not sure how that single leak is still there.

            David.Hall has expressed a legitimate concern on the memleak fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1297 where we are storing pointers to parsetree objects for later deletion. This is because pointers to these parsetree objects are used in creating the job steps. We should revisit this design so that we don't have to use this 'hacky' leak fix.

            tntnatbry Gagan Goel (Inactive) added a comment - Server 10.4 (community) commit: 205b0ce6ad21dbafe8def505307b4922398db5b2 Engine develop-1.4 commit: 8f92f24b02f01822214f0f140a7ad0cd774bd392 Attached are the Exemgr logs under valgrind on these commits (plus the patch from https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1297): exemgr_valgrind.log . Same tests under working_tpch1 and working_tpch1_compareLogOnly were run for this log. memleaks due to MODA functions are now gone, after David's fix. memleaks in ordering::OrderByData::OrderByData() are now gone. Mentions of 'definitely lost' bytes for joblist::simpleScalarFilterToParseTree() are down from 14 to 1. Not sure how that single leak is still there. David.Hall has expressed a legitimate concern on the memleak fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1297 where we are storing pointers to parsetree objects for later deletion. This is because pointers to these parsetree objects are used in creating the job steps. We should revisit this design so that we don't have to use this 'hacky' leak fix.

            From the PR, queries to reproduce the problem and verify the sol'n:

            {{drop table if exists cs1;
            create table cs1 (a int)engine=columnstore;
            insert into cs1 values (123);
            select * from cs1 where a=(select a from cs1);
            select * from cs1 having a=(select a from cs1);
            select * from cs1 where a=(select a from cs1) and a!=(select a from cs1);
            select * from cs1 where a=(select a from cs1) or a!=(select a from cs1);
            select * from (select * from cs1 where a=(select a from cs1)) aa;
            select * from cs1 aa where exists (select * from cs1 bb where bb.a=(select a from cs1));
            select * from cs1 aa where exists (select * from cs1 bb where aa.a=bb.a and bb.a=(select a from cs1));}}

            pleblanc Patrick LeBlanc (Inactive) added a comment - From the PR, queries to reproduce the problem and verify the sol'n: {{drop table if exists cs1; create table cs1 (a int)engine=columnstore; insert into cs1 values (123); select * from cs1 where a=(select a from cs1); select * from cs1 having a=(select a from cs1); select * from cs1 where a=(select a from cs1) and a!=(select a from cs1); select * from cs1 where a=(select a from cs1) or a!=(select a from cs1); select * from (select * from cs1 where a=(select a from cs1)) aa; select * from cs1 aa where exists (select * from cs1 bb where bb.a=(select a from cs1)); select * from cs1 aa where exists (select * from cs1 bb where aa.a=bb.a and bb.a=(select a from cs1));}}

            I merged what's there so far, assigning back to Gagan for what's left.

            pleblanc Patrick LeBlanc (Inactive) added a comment - I merged what's there so far, assigning back to Gagan for what's left.

            This merge corrects some but not all issues with memory. We will close this, and MCOL-4150 will continue on.

            gdorman Gregory Dorman (Inactive) added a comment - This merge corrects some but not all issues with memory. We will close this, and MCOL-4150 will continue on.

            Additional queries to reproduce the leaks in ordering::OrderByData::OrderByData() and verify the solution:

            create table cs1 (a int, b bigint)engine=columnstore;
            insert into cs1 values (123, 1234000);
            select rank() over (partition by a,b order by a,b desc) as ranking from cs1;
            

            tntnatbry Gagan Goel (Inactive) added a comment - Additional queries to reproduce the leaks in ordering::OrderByData::OrderByData() and verify the solution: create table cs1 (a int , b bigint )engine=columnstore; insert into cs1 values (123, 1234000); select rank() over (partition by a,b order by a,b desc ) as ranking from cs1;

            People

              tntnatbry Gagan Goel (Inactive)
              pleblanc Patrick LeBlanc (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.