[MCOL-4043] Memory leaks Part 1. & errors -> need a valgrind pass Created: 2020-06-05 Updated: 2020-11-12 Resolved: 2020-07-07 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | 1.4.4, 1.5.3 |
| Fix Version/s: | 1.5.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Patrick LeBlanc (Inactive) | Assignee: | Gagan Goel (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
I ran our regression test on our 1.5 RC, and at the end of it, ExeMgr was using 45GB of virt mem, and about 2.6GB of real mem when it was idle. Gagan compared 1.4 & 1.5, found mem usage to be similar. Provoked an investigation. I attached the valgrind log from running VG on ExeMgr, on our latest 1.4.4 RC (enterprise server @ 6c7e73d7, engine @ 8ab3cc301). For this test I ran all of working_tpch1, and working_tpch1_compareLogOnly, except for MCOL_3503.sql. moda() has leaks not recorded in this log because I found that early and commented out the test that uses it the most to see how much of a difference it made in the final mem usage. At the end of that test, ExeMgr was using around 1.2GB when idle. The leaks reported in the log do not add up to anywhere close to 1.2GB. My suspicion is that all of these little bits of mem over time are causing a good deal of mem fragmentation. It could also be that a good deal of mem still has a reference, so VG doesn't consider it leaked. Need to start fixing these little things we have in the log, then reevaluate. Jun 22 update: Testing @ the columnstore-1.5.2-1 tag, I noticed test211 makes ExeMgr allocate around 1GB/min that it doesn't use. To be clear, the mem usage shows up as virtual mem usage, not resident mem usage. Resident mem usage also appears to be climbing but with a much lower slope. When a particular patch merges easily into 1.2, be sure to include it. |
| Comments |
| Comment by David Hall (Inactive) [ 2020-06-19 ] | |||
|
The DynamicParseTreeVec added to track parsetree objects that are causing leaks is a hard kludge. Perhaps the code should actually be doing a deep copy of the parsetree, instead of storing a pointer, | |||
| Comment by Gagan Goel (Inactive) [ 2020-06-19 ] | |||
|
Server 10.4 (community) commit: 205b0ce6ad21dbafe8def505307b4922398db5b2 Attached are the Exemgr logs under valgrind on these commits (plus the patch from https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1297): exemgr_valgrind.log memleaks due to MODA functions are now gone, after David's fix. David.Hall has expressed a legitimate concern on the memleak fix in https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1297 where we are storing pointers to parsetree objects for later deletion. This is because pointers to these parsetree objects are used in creating the job steps. We should revisit this design so that we don't have to use this 'hacky' leak fix. | |||
| Comment by Patrick LeBlanc (Inactive) [ 2020-06-24 ] | |||
|
From the PR, queries to reproduce the problem and verify the sol'n: {{drop table if exists cs1; | |||
| Comment by Patrick LeBlanc (Inactive) [ 2020-06-24 ] | |||
|
I merged what's there so far, assigning back to Gagan for what's left. | |||
| Comment by Gregory Dorman (Inactive) [ 2020-07-07 ] | |||
|
This merge corrects some but not all issues with memory. We will close this, and | |||
| Comment by Gagan Goel (Inactive) [ 2020-07-08 ] | |||
|
Additional queries to reproduce the leaks in ordering::OrderByData::OrderByData() and verify the solution:
|