[MCOL-5643] Make disk joins faster and more memory efficient Created: 2024-01-22  Updated: 2024-01-24

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Sergey Zefirov Assignee: Sergey Zefirov
Resolution: Unresolved Votes: 0
Labels: None


 Description   

This is a followup to MCOL-5627.

The fix for MCOL-5627 introduces a performance degradation in disk joins. Because of memory constraints, data is now split less precisely: we keep more of the "small side" data, and more of the corresponding "large side" data, in a single file, which makes the hash join behave more like a nested loop join.

MCOL-5627 contains a hint on how to solve this problem. We can prepend each RGData with a vector of uint32_t hashes, one for each row in the RowGroup, and use these to filter out large side RGData that are not needed for the small side data currently being processed.
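
A minimal sketch of the filtering idea, not the actual ColumnStore code. HashedBlock, collectHashes, mayContainMatches and selectLargeBlocks are hypothetical names introduced only for illustration, and the real RGData serialization is assumed to differ. The sketch assumes that both sides hash the join key with the same function, so a large side block whose hashes do not intersect the small side hashes cannot contain a matching row.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a serialized RGData with a hash prefix.
struct HashedBlock
{
  std::vector<uint32_t> rowHashes;  // one join-key hash per row in the RowGroup
  std::vector<uint8_t> payload;     // opaque serialized row data (not inspected here)
};

using HashSet = std::unordered_set<uint32_t>;

// Gather the join-key hashes of the small side chunk currently being processed.
HashSet collectHashes(const std::vector<HashedBlock>& smallSide)
{
  HashSet hashes;
  for (const auto& block : smallSide)
    hashes.insert(block.rowHashes.begin(), block.rowHashes.end());
  return hashes;
}

// A large side block is worth reading only if at least one of its row hashes
// also occurs on the small side. Hash collisions can produce false positives,
// never false negatives, so the join itself must still compare the real keys.
bool mayContainMatches(const HashedBlock& largeBlock, const HashSet& smallHashes)
{
  return std::any_of(largeBlock.rowHashes.begin(), largeBlock.rowHashes.end(),
                     [&smallHashes](uint32_t h) { return smallHashes.count(h) != 0; });
}

// Return pointers to the large side blocks that cannot be skipped.
std::vector<const HashedBlock*> selectLargeBlocks(const std::vector<HashedBlock>& largeSide,
                                                  const HashSet& smallHashes)
{
  std::vector<const HashedBlock*> needed;
  for (const auto& block : largeSide)
  {
    if (mayContainMatches(block, smallHashes))
      needed.push_back(&block);
  }
  return needed;
}
```

Because the hashes sit in front of the row data, the check in this sketch only needs the prefix, so a skipped block's payload would not have to be deserialized at all.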


Generated at Thu Feb 08 02:59:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.