[MCOL-1750] Thread stack memory leak in ThreadPool
Created: 2018-09-27  Updated: 2020-08-25  Resolved: 2019-01-02

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr
Affects Version/s: 1.1.5, 1.1.6
Fix Version/s: 1.1.7, 1.2.0

Type: Bug
Priority: Critical
Reporter: Andrew Hutchings (Inactive)
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 2
Labels: None

Attachments: File test.sh    
Issue Links:
Duplicate
is duplicated by MCOL-1507 ExeMgr over using memory causing swap... Closed
PartOf
Sprint: 2018-18, 2018-19, 2018-20, 2018-21

 Description   

When ThreadPool self-prunes a thread that has been idle for 10 minutes, the thread is ended but never joined. This leaves the thread's stack allocated (8MB per thread on my machine). It quickly adds up to a large memory leak and can hit the limits of VIRT allocation.

How to reproduce:

1. Run the attached script, test.sh (requires the tpch1 lineitem table)
2. Get the pid of ExeMgr (25001 in this example)
3. Get the output of ulimit -s (8192 in this example)
4. Run the following, replacing the pid and stack size with your own values:

pmap 25001 | grep 8192 | wc -l

5. Run the following, replacing the pid with your own value:

ls /proc/25001/task | wc -l

6. The two numbers should be very similar, only one or two apart
7. Wait more than 10 minutes
8. Repeat steps 4 and 5. The two numbers should now be very different: the count of stack-sized mappings stays high while the thread count has dropped. This indicates a thread stack memory leak.
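
The comparison in steps 4, 5 and 8 can be wrapped in a small script. This is only a rough sketch, not part of the attached test.sh. The function names (count_stack_mappings, count_threads, estimate_leak_kb, check_stack_leak) are made up for illustration, and it assumes a Linux host with pmap and /proc available. The estimate is approximate because grep can also match mappings that are not thread stacks.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not ColumnStore tooling.

# Step 4: count pmap lines that mention the stack size in KB.
count_stack_mappings() {
    pmap "$1" | grep -c "$2"
}

# Step 5: count the live threads (tasks) of the process.
count_threads() {
    ls "/proc/$1/task" | wc -l
}

# Rough estimate in KB: stack-sized mappings with no matching live thread,
# multiplied by the stack size. Negative differences are clamped to 0.
estimate_leak_kb() {
    extra=$(( $1 - $2 ))
    if [ "$extra" -lt 0 ]; then
        extra=0
    fi
    echo $(( extra * $3 ))
}

# Print both counts and the estimate for a pid.
# The stack size defaults to `ulimit -s`; pass it explicitly if that
# prints "unlimited" or differs from the limit ExeMgr was started with.
check_stack_leak() {
    pid=$1
    stack_kb=${2:-$(ulimit -s)}
    mappings=$(count_stack_mappings "$pid" "$stack_kb")
    threads=$(count_threads "$pid")
    echo "stack-sized mappings: $mappings"
    echo "live threads:         $threads"
    echo "estimated leak (KB):  $(estimate_leak_kb "$mappings" "$threads" "$stack_kb")"
}
```

With the example values from the steps above, this would be called as check_stack_leak 25001 8192, once right after the test script runs and again after waiting more than 10 minutes.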



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-09-27 ]

The patch broke cpimport; this needs to be fixed.

Comment by Andrew Hutchings (Inactive) [ 2018-09-28 ]

Switched to the purge thread method in ThreadPool because the main thread needs to stay alive until all the child threads have finished, due to the use of condition variables and mutexes. Without joinable threads, cpimport would crash at the end of every execution when the main thread exited before the child threads.

Comment by Daniel Lee (Inactive) [ 2019-01-02 ]

Build verified: 1.1.7-1, 2.2.1-1

1.1.7-1
Git Commits
server=8220579
engine=88da1b6
Regression passed
