Details
- Type: New Feature
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
This is a follow-up to MCOL-5627.
The fix for MCOL-5627 introduces a performance degradation in disk joins. The cause is less precise splitting of data under memory constraints: we keep more "small side" data and more of the corresponding "large side" data in a single file, which makes the hash join behave more like a nested loop join.
MCOL-5627 contains a hint on how to solve this problem. We can prepend each RGData with a vector of uint32_t hashes, one for each row in the RowGroup, and use these hashes to filter out large side RGData blocks that are not needed for the small side data currently being processed.
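The filtering idea above can be sketched roughly as follows. This is a minimal illustration, not the ColumnStore implementation: HashedRowGroup, hashKey, makeRowGroup, collectHashes, mayMatch and selectLargeRowGroups are all hypothetical names, the "row data" is reduced to a single integer join key, and the hash function is a simple multiplicative one chosen only to keep the example deterministic. A real version would presumably reuse the hash already computed by the join.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an RGData block with a per-row hash prefix.
struct HashedRowGroup
{
  std::vector<uint32_t> hashes;   // one hash per row, stored ahead of the row data
  std::vector<int64_t> joinKeys;  // simplified "row data": only the join key
};

// Hypothetical hash (multiplicative, truncated to 32 bits). Assumption: the
// real code would use the same hash function the hash join already applies.
inline uint32_t hashKey(int64_t key)
{
  return static_cast<uint32_t>(static_cast<uint64_t>(key) * 2654435761ULL);
}

// Build a row group and fill in its hash prefix, one entry per row.
HashedRowGroup makeRowGroup(const std::vector<int64_t>& keys)
{
  HashedRowGroup rg;
  rg.joinKeys = keys;
  rg.hashes.reserve(keys.size());
  for (int64_t k : keys)
    rg.hashes.push_back(hashKey(k));
  assert(rg.hashes.size() == rg.joinKeys.size());
  return rg;
}

// Collect the hashes of all small side rows currently held in memory.
std::unordered_set<uint32_t> collectHashes(const std::vector<HashedRowGroup>& smallSide)
{
  std::unordered_set<uint32_t> result;
  for (const HashedRowGroup& rg : smallSide)
    result.insert(rg.hashes.begin(), rg.hashes.end());
  return result;
}

// True if at least one row of the large side block shares a hash with the
// small side. Collisions can give false positives, never false negatives.
bool mayMatch(const HashedRowGroup& largeRg, const std::unordered_set<uint32_t>& smallHashes)
{
  for (uint32_t h : largeRg.hashes)
    if (smallHashes.count(h) != 0)
      return true;
  return false;
}

// Return the indices of large side blocks that still need to be joined.
std::vector<std::size_t> selectLargeRowGroups(const std::vector<HashedRowGroup>& largeSide,
                                              const std::unordered_set<uint32_t>& smallHashes)
{
  std::vector<std::size_t> picked;
  for (std::size_t i = 0; i < largeSide.size(); ++i)
    if (mayMatch(largeSide[i], smallHashes))
      picked.push_back(i);
  return picked;
}
```

Because only the hash prefix is inspected, a block that cannot contribute any matches can be skipped without probing its rows. Blocks that pass the filter still go through the normal join, which compares the actual keys, so hash collisions cost some extra work but do not affect correctness.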
Issue Links
- relates to: MCOL-5627 Memory oversubscription issues in 23.10. (Closed)