  1. MariaDB ColumnStore
  2. MCOL-2262

Design efficient methods for interaction between MDB and engines with parallel query execution

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 23.02
    • Component/s: None
    • Labels:
      None

      Description

      The problem:
      In its current state, MDB interacts inefficiently with engines that:

      • use parallel execution internally
      • could potentially return a huge number of records

      First, the current behavior can add a significant amount of time to query execution. For example, CS takes 6 seconds to return the 15,000,000 records of a trivial INNER JOIN to the MDB plugin, but it takes approximately 50 seconds before MDB starts to return the actual result set to the client.
      Second, a parallel execution engine could itself process queries that reference tables from other engines together with its own tables faster than MDB can. However, there is currently no way to push down the query's parameters (intermediate results from another engine, coming from either subqueries or JOIN expressions) to the parallel execution engine. For example, CS now uses a backside MariaDB client connection to fetch the intermediate result, which can add a significant amount of time to query execution.
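
To put the figures from the first problem on a per-row scale, the arithmetic can be sketched as below. The row count and both timings are the ones quoted in this description. The sketch assumes both timings are measured from the same starting point, which the description does not state explicitly.

```python
# Figures quoted in the description.
ROWS = 15_000_000
ENGINE_SECONDS = 6.0    # until CS has returned the rows to the MDB plugin
SERVER_SECONDS = 50.0   # until MDB starts returning rows to the client


def per_row_microseconds(seconds: float, rows: int) -> float:
    """Average time per row, in microseconds."""
    return seconds / rows * 1_000_000


engine_us = per_row_microseconds(ENGINE_SECONDS, ROWS)
server_us = per_row_microseconds(SERVER_SECONDS, ROWS)

# Assumption: both timings start at the same point, so the difference is
# the time spent on the server side after the engine has finished.
overhead_seconds = SERVER_SECONDS - ENGINE_SECONDS

print(f"engine: {engine_us:.2f} us/row")
print(f"server: {server_us:.2f} us/row")
print(f"overhead: {overhead_seconds:.0f} s")
```

Under that assumption, the engine averages about 0.4 microseconds per row, while the client-visible latency corresponds to about 3.3 microseconds per row, with roughly 44 seconds spent outside the engine.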

      The goals, in descending order of importance:
      There must be a way to efficiently retrieve millions or billions of records from the engine and send them back to the client.
      There must be a way to push down the query's parameters (intermediate results from another engine, coming from either subqueries or JOIN expressions).

      The ways:
      A range of values used in filtering can be pushed down using the MRR (Multi-Range Read) technique. MRR is single-threaded, though. There could be a mechanism that saves the intermediate result as an in-memory temporary table, or even assigns a separate thread to project the intermediate result, and then provides the parallel execution engine with access methods. There could be a number of tables/threads so that the intermediate results are pushed down in a multi-threaded way. The tables could potentially be spilled to disk to cope with a large number of records in the intermediate result.
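
As an illustration of pushing down a range of filter values, the following minimal sketch turns the join keys of an intermediate result into sorted, non-overlapping ranges that an engine could scan. It is not MariaDB's MRR interface; `coalesce_keys` and `max_gap` are hypothetical names introduced only for this example.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (low, high)


def coalesce_keys(keys: Iterable[int], max_gap: int = 0) -> List[Range]:
    """Turn join keys into sorted, non-overlapping inclusive ranges.

    Keys whose distance is at most max_gap + 1 are merged into one range.
    With max_gap > 0 a range may cover keys that were not in the input,
    so the consumer must still apply the exact filter afterwards.
    """
    ranges: List[Range] = []
    for key in sorted(set(keys)):
        if ranges and key - ranges[-1][1] <= max_gap + 1:
            # Extend the current range up to this key.
            ranges[-1] = (ranges[-1][0], key)
        else:
            # Start a new single-key range.
            ranges.append((key, key))
    return ranges


exact = coalesce_keys([5, 1, 2, 3, 10, 11, 7])
loose = coalesce_keys([5, 1, 2, 3, 10, 11, 7], max_gap=1)
print(exact)  # [(1, 3), (5, 5), (7, 7), (10, 11)]
print(loose)  # [(1, 7), (10, 11)]
```

A larger `max_gap` trades fewer ranges for more rows read, which is one possible knob when the intermediate result contains many scattered keys.
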
      Additional research must be done to outline a way to cope with the second problem (pushing down the query's parameters). The results of the research will be posted here.
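
The idea of several intermediate tables that are read by separate threads and spilled to disk when they grow too large can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed design: `SpillBuffer`, `partition_rows`, and `consume_in_parallel` are hypothetical names, rows are hash-partitioned on their first column, and the overflow is serialized with `pickle` into a temporary file.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


class SpillBuffer:
    """Holds rows in memory and spills the overflow to a temporary file.

    Intended use: append all rows first, then iterate.
    """

    def __init__(self, max_rows_in_memory: int) -> None:
        self.max_rows_in_memory = max_rows_in_memory
        self.memory: List[Row] = []
        self.spill_file = None  # created lazily on the first overflow
        self.spilled_rows = 0

    def append(self, row: Row) -> None:
        if len(self.memory) < self.max_rows_in_memory:
            self.memory.append(row)
            return
        if self.spill_file is None:
            self.spill_file = tempfile.TemporaryFile()
        pickle.dump(row, self.spill_file)
        self.spilled_rows += 1

    def __iter__(self) -> Iterator[Row]:
        yield from self.memory
        if self.spill_file is not None:
            self.spill_file.seek(0)
            for _ in range(self.spilled_rows):
                yield pickle.load(self.spill_file)


def partition_rows(rows: Iterable[Row], partitions: int,
                   max_rows_in_memory: int) -> List[SpillBuffer]:
    """Hash-partition rows on their first column into separate buffers."""
    buffers = [SpillBuffer(max_rows_in_memory) for _ in range(partitions)]
    for row in rows:
        buffers[hash(row[0]) % partitions].append(row)
    return buffers


def consume_in_parallel(buffers: List[SpillBuffer],
                        worker: Callable[[Iterable[Row]], Any]) -> List[Any]:
    """Run one worker per buffer; each thread reads only its own buffer."""
    with ThreadPoolExecutor(max_workers=len(buffers)) as pool:
        return list(pool.map(worker, buffers))


# Usage: 1000 rows, 4 partitions, at most 100 rows of each kept in memory.
rows = [(i, f"v{i}") for i in range(1000)]
buffers = partition_rows(rows, partitions=4, max_rows_in_memory=100)
counts = consume_in_parallel(buffers, lambda buf: sum(1 for _ in buf))
print(sum(counts))
```

Because each thread reads only its own buffer, no locking is needed on the read side. Python threads are used here only to show the structure; a real engine would use native threads and its own row format.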

      The milestones:
      to be done

              People

              Assignee:
              drrtuy Roman
              Reporter:
              drrtuy Roman
              Votes:
              0
              Watchers:
              3
