[MCOL-5187] OOM happening when querying large datasets and using distinct Created: 2022-08-10  Updated: 2023-12-15

Status: Stalled
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 22.08.1, 6.4.6
Fix Version/s: 23.10

Type: Bug Priority: Critical
Reporter: Kugimiya Assignee: Roman
Resolution: Unresolved Votes: 1
Labels: SQL, query, triage
Environment:

Ubuntu 20.04 LTS


Issue Links:
PartOf
is part of MCOL-5250 Disk-based DISTINCT Stalled
Problem/Incident
causes MCOL-5431 Remove dataset sorting from WFS. Open
Relates
relates to MCOL-4626 Columnstore cluster becomes non opera... Stalled
relates to MCOL-5430 Window Functions in projection with G... Open

 Description   

I set TotalUmMemory to 25%, but ExeMgr used far more memory than that limit.
Eventually ExeMgr was killed by the OOM killer.
I tried two SQL queries on large datasets.

1. select distinct columnA, columnB, columnC from table.
2. select columnA, columnB, columnC from table group by columnA, columnB, columnC.

Query 1 ends in OOM,
but query 2 fails with MCS-2003: Aggregation/Distinct memory limit is exceeded.

I think queries 1 and 2 are similar.
I think query 1 should return the same message (MCS-2003: Aggregation/Distinct memory limit is exceeded.)
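
The expected behavior can be illustrated with a minimal sketch. This is not ColumnStore code: the function name, the exception class, and the byte-based budget are all hypothetical stand-ins, with budget_bytes playing the role of the TotalUmMemory limit. It only shows a DISTINCT-style deduplication that stops with an MCS-2003 style error once its tracked memory goes over the budget, instead of growing until the process is killed.

```python
import sys


class MemoryLimitExceeded(Exception):
    """Hypothetical stand-in for the MCS-2003 error."""


def distinct_with_budget(rows, budget_bytes):
    """Return distinct rows in first-seen order, failing once the
    tracked size of the stored keys exceeds budget_bytes.

    budget_bytes is an assumed analogue of TotalUmMemory; the size
    accounting is a rough approximation (shallow tuple size only).
    """
    seen = set()
    result = []
    used = 0
    for row in rows:
        key = tuple(row)
        if key in seen:
            continue  # duplicate row, costs no additional memory
        used += sys.getsizeof(key)
        if used > budget_bytes:
            # Fail with an error instead of allocating without bound.
            raise MemoryLimitExceeded(
                "MCS-2003: Aggregation/Distinct memory limit is exceeded."
            )
        seen.add(key)
        result.append(key)
    return result
```

With a generous budget the call returns the distinct rows; with a budget that is too small it raises the error before storing more keys, which is the behavior requested here for query 1.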



 Comments   
Comment by Roman [ 2022-08-19 ]

Thanks for your suggestion. We will look into implementing it.
FYI, SELECT DISTINCT is not simply another name for GROUP BY; it is processed in a completely different way. I suggest you prefer GROUP BY over DISTINCT.
Did you try enabling disk-based aggregation to deal with the error message?

Comment by Kugimiya [ 2022-08-21 ]

Thanks for your answer.

Yes, I did.
I have already enabled disk-based aggregation. With disk-based aggregation, both queries can be processed.

The problem with DISTINCT is that it keeps using memory without returning an error.
I set TotalUmMemory to 25%, but ExeMgr used far more memory than that limit.
The DISTINCT implementation ignores TotalUmMemory.

I think the DISTINCT implementation should check TotalUmMemory.
What do you think?

Comment by Roman [ 2022-08-22 ]

It certainly must obey the TotalUmMemory limit. I will look into the difference between DISTINCT and GROUP BY in how memory consumption is limited.

Comment by JiraAutomate [ 2023-12-15 ]

Automated message:
----------------------------
Since this issue has not been updated for 6 weeks, it's time to move it back to Stalled.

Generated at Thu Feb 08 02:56:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.