Details
-
New Feature
-
Status: Stalled
-
Major
-
Resolution: Unresolved
-
22.08.1
-
2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10, 2023-11
Description
As of 22.08.01, MCS performs DISTINCT processing in TupleAnnexStep. This step uses a hash map for the purpose. The solution is simple, but it:
- lacks scalability,
- can't leverage the disk-based capabilities of the RowStorage class used by GROUP BY,
- is not counted by ResourceManager, which accounts for RAM consumption.
This issue is about a new DISTINCT implementation (presumably based on RowStorage) that:
- can perform external DISTINCT, spilling to disk if necessary,
- has its RAM consumption counted by ResourceManager,
- scales (this might be tricky since DISTINCT processing overlaps with ORDER BY).
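
For illustration only, the general idea of an external DISTINCT with a bounded memory footprint can be sketched in Python. This is not ColumnStore code and does not use RowStorage or ResourceManager. The function name `external_distinct`, the helper `_read_rows`, and the parameters `max_rows_in_memory` and `num_partitions` are hypothetical. The row-count budget stands in for real RAM accounting, and rows are assumed to be hashable, picklable tuples.

```python
import pickle
import tempfile


def _read_rows(f):
    """Read back pickled rows from a spill file until EOF (hypothetical helper)."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def external_distinct(rows, max_rows_in_memory=100_000, num_partitions=8, _depth=0):
    """Yield each distinct row once, holding at most max_rows_in_memory rows in RAM.

    Assumption: a simple row count stands in for the RAM budget that a
    resource manager would enforce in a real engine.
    """
    seen = set()
    spills = None  # hash-partitioned spill files, created on first overflow
    try:
        for row in rows:
            if row in seen:
                continue  # duplicate of a row that was already emitted
            if spills is None and len(seen) < max_rows_in_memory:
                seen.add(row)
                yield row
                continue
            # Budget exhausted: route the row to a partition file by hash.
            if spills is None:
                spills = [tempfile.TemporaryFile() for _ in range(num_partitions)]
            # Mixing in the depth changes the partitioning at each recursion level.
            part = hash((_depth, row)) % num_partitions
            pickle.dump(row, spills[part])
        seen.clear()  # release memory before processing the spilled partitions
        for f in spills or []:
            f.seek(0)
            # Equal rows always land in the same partition, and no spilled row
            # is in `seen`, so each partition can be deduplicated on its own.
            yield from external_distinct(
                _read_rows(f), max_rows_in_memory, num_partitions, _depth + 1
            )
    finally:
        for f in spills or []:
            f.close()
```

In this sketch the output follows first-seen order only until the first spill. After that, rows come out partition by partition, which hints at why combining DISTINCT with ORDER BY needs extra care.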