[MCOL-5285] SkySQL OOM Crash? Memory not being released? testing Created: 2022-10-31 Updated: 2023-11-17 Resolved: 2023-03-07 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | PrimProc |
| Affects Version/s: | 6.3.1 |
| Fix Version/s: | 23.02.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Allen Herrera | Assignee: | Leonid Fedorov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
SkySQL AWS 32x 128 single node |
||
| Attachments: |
|
||||||||||||||||
| Description |
|
Currently there is a customer whose memory does not appear to be released in SkySQL. The current workaround is that RDBA/SRE has to manually run mcsShutdown and mcsStart every couple of days. However, the customer often has to file a ticket reporting that it has crashed and requesting a restart before the scheduled stop/start clears memory. Link to logs and configs in comment below |
| Comments |
| Comment by alexey vorovich (Inactive) [ 2022-11-18 ] | ||||||||
|
Yeah. The query log shows a very rich set of OLTP + queries. I don't think it is possible to try each one in-house and see the leak. We also know that ExeMgr was refactored and merged into PrimProc in 22.08.x. It is unknown whether this change fixes the problem. | ||||||||
| Comment by Leonid Fedorov [ 2022-12-28 ] | ||||||||
|
I created the profiling allocator shared object.
It is attached to the issue and can be downloaded here: jemalloc | ||||||||
| Comment by Leonid Fedorov [ 2022-12-28 ] | ||||||||
|
This profiling allocator should be installed on one node with these steps.
then edit
and replace line
with
then reload the systemd configuration
and restart the primproc service
After the node has handled some load, /heap_profile/*.profile files with heap usage information should be generated. We want them for inspection. | ||||||||
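To check whether the profiles mentioned above are actually being produced, something like the following could be run on the node. It is only a sketch: the directory and the `*.profile` pattern come from the comment above, while the helper name `list_heap_profiles` is made up for illustration.

```python
from pathlib import Path


def list_heap_profiles(directory="/heap_profile"):
    """Return (path, size_in_bytes) pairs for *.profile files, oldest first.

    Hypothetical helper: only the directory and the file pattern are taken
    from the ticket; everything else is an illustration.
    """
    root = Path(directory)
    if not root.is_dir():
        # Nothing has been generated yet, or the directory is not mounted.
        return []
    # Sort by modification time so growth over time is easy to follow.
    files = sorted(root.glob("*.profile"), key=lambda p: p.stat().st_mtime)
    return [(p, p.stat().st_size) for p in files]


if __name__ == "__main__":
    for path, size in list_heap_profiles():
        print(f"{size:>12d}  {path}")
```

An empty result after a period of load would suggest the profiling allocator was not picked up by the primproc service.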
| Comment by alexey vorovich (Inactive) [ 2022-12-28 ] | ||||||||
|
alan.mologorsky, let's convert the instructions above from leonid.fedorov into ones applicable to an existing Docker container, which does NOT have systemd. Here is a rough outline that I am asking you to expand and try:
Everyone understands that this is a non-persistent setup and will not survive a pod restart. This is just the first step. leonid.fedorov, please edit your instructions to note the location of the shared object. Maybe create a jmalloc_test folder on https://cspkg.s3.amazonaws.com/ | ||||||||
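One possible starting point for the systemd-free variant, assuming the profiling allocator is activated through `LD_PRELOAD` (the ticket does not confirm this), is to build the process environment by hand and start PrimProc directly inside the container. The shared object path and the function name `with_preload` below are hypothetical placeholders, not values from this issue.

```python
import os


def with_preload(shared_object, base_env=None):
    """Return a copy of the environment with shared_object first in LD_PRELOAD.

    Assumption: the allocator is injected via LD_PRELOAD. The caller's
    environment mapping is copied, never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # LD_PRELOAD takes a colon-separated list; libraries load in listed order.
    env["LD_PRELOAD"] = f"{shared_object}:{existing}" if existing else shared_object
    return env


if __name__ == "__main__":
    # Hypothetical location; the real path of the shared object is not
    # stated in the ticket.
    env = with_preload("/tmp/profiling_allocator.so")
    print("LD_PRELOAD=" + env["LD_PRELOAD"])
    # Starting the process is left out on purpose. With a known binary path
    # it could be done with subprocess.Popen([binary_path], env=env).
```

Because the environment is applied only to the launched process, this matches the non-persistent nature of the setup: nothing in the container image is changed.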
| Comment by Leonid Fedorov [ 2023-02-10 ] | ||||||||
|
|