Details
- New Feature
- Status: In Progress
- Major
- Resolution: Unresolved
- 22.08.1
- 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10, 2023-11, 2025-6, 2025-9
Description
As of 22.08.01, MCS does DISTINCT processing in TupleAnnexStep. This step uses a hash map for that purpose. The solution is simple, but it:
- lacks scalability
- can't leverage the disk-based capabilities of the RowStorage class used by GROUP BY
- uses a hash map that ResourceManager, which accounts for RAM consumption, doesn't count
This issue is about a new DISTINCT implementation (presumably based on RowStorage) that:
- can do external DISTINCT, spilling to disk if necessary
- has its RAM consumption counted by ResourceManager
- scales (this might be tricky since DISTINCT processing overlaps with ORDER BY)
Sprint 2025-7:
- continue testing and confirming edge cases
Issue Links
- causes
  - MCOL-5804 Disk-based ORDER BY (Open)
- includes
  - MCOL-5187 OOM happening when querying large datasets and using distinct (Closed)
  - MCOL-5541 Disk-based distinct :Create a separate jobstep for handling ConstantColumn (Stalled)
- relates to
  - MCOL-6001 Regain context of disk based distinct (Closed)
  - MCOL-6123 Improve performance of rewritten DISTINCT queries (Open)
- mentioned on