[MCOL-5588] Make GROUP BY and joins parallel on cluster - Jira

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: 23.10
Component/s: None
Labels:
- Performance
- rm_perf

Sprint:
2023-11, 2023-12, 2024-1, 2024-2

Description

SUMMA-like algorithms can be used to parallelize joins and GROUP BY across the cluster.

https://cseweb.ucsd.edu/classes/sp11/cse262-a/Lectures/262-pres1-hal.pdf gives a neat description of what SUMMA does. Basically, each node holds part of a source matrix, computes part of the resulting matrix, accumulates the results that belong to it, and distributes parts of the other source matrix and partial computation results to the other nodes.

As we do not have a neat mesh definition, we can use a ring or a ring with skips. We also do not need to broadcast partial results (except for ROLLUPs).
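To illustrate the difference between a plain ring and a ring with skips, here is a minimal sketch. It assumes that skip links connect each node to the nodes 1, 2, 4, ... positions ahead (a power-of-two scheme chosen only for illustration; the issue does not specify the skip pattern). The names `ring_links` and `route` are hypothetical.

```python
def ring_links(node, num_nodes, with_skips=True):
    """Nodes that `node` can send to directly.

    Plain ring: only the next node. Ring with skips (assumed scheme):
    also the nodes 2, 4, 8, ... positions ahead.
    """
    links = []
    step = 1
    while step < num_nodes:
        links.append((node + step) % num_nodes)
        if not with_skips:
            break
        step *= 2
    return links


def route(src, dst, num_nodes, with_skips=True):
    """Path from src to dst, greedily taking the largest link that does not overshoot."""
    path = [src]
    current = src
    while current != dst:
        distance = (dst - current) % num_nodes
        step = 1
        if with_skips:
            # Largest power of two that still fits into the remaining distance.
            while step * 2 <= distance:
                step *= 2
        current = (current + step) % num_nodes
        path.append(current)
    return path


if __name__ == "__main__":
    print(ring_links(0, 8))                  # direct links of node 0 with skips
    print(route(0, 5, 8, with_skips=False))  # hop-by-hop around the plain ring
    print(route(0, 5, 8))                    # fewer hops when skips are available
```

Under this assumption a message needs at most about log2(N) hops instead of up to N - 1, at the cost of each node keeping more connections open.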

The main node can send different parts of the data to different worker nodes, so that each worker computes which data belongs to it and which data needs to be sent further. A worker node can also compute the row hashes so that other nodes do not need to.
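The distribution step described above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions: ownership of a group key is decided by `hash % number_of_nodes`, CRC32 stands in for whatever hash the engine actually uses, the aggregate is a plain SUM, and the functions `split_chunk` and `distributed_group_sum` are hypothetical names, not ColumnStore APIs.

```python
import zlib
from collections import defaultdict


def split_chunk(rows, num_nodes):
    """Worker-side step: hash each row once and bucket it by owning node.

    The hash is shipped together with the row, so the receiving node
    does not have to compute it again.
    """
    outgoing = defaultdict(list)
    for key, value in rows:
        h = zlib.crc32(repr(key).encode("utf-8"))  # stand-in hash (assumption)
        outgoing[h % num_nodes].append((h, key, value))
    return outgoing


def distributed_group_sum(chunks):
    """Simulate GROUP BY key / SUM(value) over len(chunks) worker nodes.

    chunks[i] is the part of the data the main node sent to worker i.
    Returns one dict of per-key sums for each node.
    """
    num_nodes = len(chunks)
    inbox = [[] for _ in range(num_nodes)]

    # Exchange phase: every worker keeps its own rows and forwards the rest.
    for rows in chunks:
        for node, batch in split_chunk(rows, num_nodes).items():
            inbox[node].extend(batch)

    # Local aggregation: each key lives on exactly one node,
    # so no broadcast of partial results is needed.
    results = []
    for batch in inbox:
        sums = defaultdict(int)
        for _hash, key, value in batch:
            sums[key] += value
        results.append(dict(sums))
    return results


if __name__ == "__main__":
    chunks = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
    for node, sums in enumerate(distributed_group_sum(chunks)):
        print(node, sums)
```

Because every key is owned by exactly one node, the final result is simply the union of the per-node results. In a real ring the forwarded batches would travel over the links between nodes rather than being delivered directly as in this simulation.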

So, this is the main overview of the algorithm. I will expand on the actual details in the comments.

People

Assignee: Sergey Zefirov

Reporter: Sergey Zefirov

Votes: 0

Watchers: 1

Dates

Created: 2023-10-10 13:16

Updated: 2024-08-02 13:27
