  1. MariaDB ColumnStore
  2. MCOL-5588

Make GROUP BY and joins parallel on cluster

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • 23.10
    • None
    • 2023-11, 2023-12, 2024-1

    Description

      It is possible to use SUMMA-like algorithms for joins and GROUP BY functionality.

      https://cseweb.ucsd.edu/classes/sp11/cse262-a/Lectures/262-pres1-hal.pdf gives a neat description of what SUMMA does. Basically, each node holds part of one source matrix, computes part of the resulting matrix, accumulates the results that belong to it, and distributes parts of the other source matrix and partial computation results to the other nodes.
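      To make the data movement concrete, here is a minimal single-process sketch of SUMMA for plain matrix multiplication. It only simulates the node grid with dictionaries; the function name `summa` and the `grid` parameter are illustrative assumptions and are not ColumnStore code.

```python
def summa(a, b, grid):
    """Simulate SUMMA for square n x n matrices on a grid x grid mesh of nodes.

    Hypothetical helper for illustration: node (i, j) owns block (i, j) of
    A, B and C, and all "communication" is a dictionary lookup.
    """
    n = len(a)
    assert n % grid == 0, "matrix size must be divisible by the grid size"
    bs = n // grid  # block size held by each node

    def block(m, i, j):
        return [row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]

    nodes = [(i, j) for i in range(grid) for j in range(grid)]
    local_a = {(i, j): block(a, i, j) for i, j in nodes}
    local_b = {(i, j): block(b, i, j) for i, j in nodes}
    local_c = {(i, j): [[0] * bs for _ in range(bs)] for i, j in nodes}

    for k in range(grid):
        for i, j in nodes:
            a_ik = local_a[(i, k)]  # sent along row i by node (i, k)
            b_kj = local_b[(k, j)]  # sent along column j by node (k, j)
            c = local_c[(i, j)]     # the result block stays on its owner
            for r in range(bs):
                for s in range(bs):
                    c[r][s] += sum(a_ik[r][t] * b_kj[t][s] for t in range(bs))

    # Gather the local result blocks back into one matrix.
    result = [[0] * n for _ in range(n)]
    for (i, j), c in local_c.items():
        for r in range(bs):
            result[i * bs + r][j * bs:(j + 1) * bs] = c[r]
    return result
```

      Each node only ever writes to its own block of the result, which is the property the GROUP BY and join variants would rely on: a node accumulates what belongs to it and passes everything else on.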

      As we do not have a neat mesh definition, we can use a ring or a ring with skips. We also do not need to broadcast partial results (except for ROLLUPs).
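      One possible reading of "ring with skips" is sketched below. It assumes, purely for illustration, that every node has links to the nodes 1, 2, 4, ... positions ahead of it; the ticket does not specify the skip pattern, and `skip_links` and `route` are hypothetical names.

```python
def skip_links(node, n):
    """Nodes reachable in one hop from `node` in a ring of n nodes.

    Assumption: skips are powers of two (1, 2, 4, ...), taken modulo n.
    """
    links = []
    step = 1
    while step < n:
        links.append((node + step) % n)
        step *= 2
    return links


def route(src, dst, n, use_skips=True):
    """Return the nodes visited when sending data from src to dst."""
    path = [src]
    cur = src
    while cur != dst:
        dist = (dst - cur) % n  # remaining distance going forward
        step = 1
        if use_skips:
            # Greedily take the largest power-of-two skip that does not overshoot.
            while step * 2 <= dist:
                step *= 2
        cur = (cur + step) % n
        path.append(cur)
    return path
```

      With 8 nodes, a plain ring needs 7 hops to get from node 0 to node 7, while `route(0, 7, 8)` visits `[0, 4, 6, 7]`, that is 3 hops. Under this assumption the hop count grows roughly with the logarithm of the node count instead of linearly.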

      The main node can send different parts of the data to different worker nodes, so that each worker computes which data belongs to it and which data needs to be sent further. A worker node can also compute the hashes of rows so that other nodes do not need to.
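      A minimal sketch of that flow for a GROUP BY with SUM, simulated in one process. The round-robin distribution, the use of `zlib.crc32` as the row hash, and the names `row_hash` and `distributed_group_sum` are all assumptions made for the example, not the planned implementation.

```python
import zlib
from collections import defaultdict


def row_hash(key):
    """Deterministic hash of a grouping key (crc32 is only a stand-in)."""
    return zlib.crc32(repr(key).encode("utf-8"))


def distributed_group_sum(rows, n_workers):
    """rows: iterable of (key, value). Returns one {key: sum} dict per worker."""
    rows = list(rows)

    # Main node: deal rows out round-robin without inspecting the keys.
    inbox = [rows[w::n_workers] for w in range(n_workers)]

    # Phase 1: each worker hashes its rows once and sorts them by owner.
    outbox = [defaultdict(list) for _ in range(n_workers)]
    for w in range(n_workers):
        for key, value in inbox[w]:
            h = row_hash(key)
            # The hash travels with the row so the receiver need not recompute it.
            outbox[w][h % n_workers].append((h, key, value))

    # Phase 2: each owner aggregates the rows it kept plus the forwarded ones.
    partial = [defaultdict(int) for _ in range(n_workers)]
    for w in range(n_workers):
        for owner, batch in outbox[w].items():
            for _h, key, value in batch:
                # A custom hash table could reuse _h here; a Python dict
                # rehashes the key internally, so this only shows the data flow.
                partial[owner][key] += value
    return [dict(p) for p in partial]
```

      Because ownership is decided by `hash % n_workers`, every group ends up on exactly one worker, so the per-worker results can be concatenated without a final merge step.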

      So, this is the main overview of the algorithm. I will expand on the actual details in comments.

          People

            sergey.zefirov Sergey Zefirov

