[MCOL-2262] Design efficient methods for interaction b/w MDB and engines with parallel query execution - Jira

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Do
Affects Version/s: None
Fix Version/s: 23.10.0
Component/s: Columnstore Select Handler
Labels:
None

Epic Link:
ColumnStore Performance Improvements

Description

The problem:
Currently MDB interacts inefficiently with engines that:

use parallel execution internally
can potentially return a huge number of records

Firstly, the current behavior can add a significant amount of time to query execution. For example, CS takes 6 seconds to return the 15,000,000 records of a trivial INNER JOIN to the MDB plugin, but it takes approximately 50 seconds before MDB starts to return the actual result set to the client.
Secondly, the parallel execution engine itself could process queries that reference tables from other engines together with its own tables faster than MDB can. However, there is currently no way to push the query's parameters (intermediate results from the other engine, coming from subqueries or JOIN expressions) down to the parallel execution engine. For example, CS currently uses a back-side MariaDB client connection to fetch the intermediate result, which can add a significant amount of time to query execution.

The goals in descending order of importance:
There must be a way to retrieve millions or billions of records from the engine efficiently and send them back to the client.
There must be a way to push down the query's parameters (intermediate results from another engine, coming from subqueries or JOIN expressions).
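To make the first goal more concrete, here is a minimal sketch of a batched handoff between engine threads and a single consumer. It is not MariaDB or ColumnStore code: `engine_worker`, `fetch_all`, and `BATCH_SIZE` are hypothetical names, and the bounded queue only stands in for whatever interface the server and the engine would actually share.

```python
import queue
import threading

BATCH_SIZE = 1000   # hypothetical batch size, not a MariaDB setting
_DONE = object()    # sentinel marking the end of one producer's output


def engine_worker(rows, out):
    """Hypothetical engine thread: emits its rows in fixed-size batches."""
    for start in range(0, len(rows), BATCH_SIZE):
        out.put(rows[start:start + BATCH_SIZE])
    out.put(_DONE)


def fetch_all(partitions):
    """Hypothetical server side: drains batches from all engine threads."""
    out = queue.Queue(maxsize=8)  # bounded, so producers cannot run far ahead
    workers = [
        threading.Thread(target=engine_worker, args=(part, out))
        for part in partitions
    ]
    for w in workers:
        w.start()

    result, finished = [], 0
    while finished < len(workers):
        item = out.get()
        if item is _DONE:
            finished += 1
        else:
            result.extend(item)  # one handoff per batch, not per row

    for w in workers:
        w.join()
    return result
```

The point of the sketch is only that the per-row cost of crossing the engine/server boundary is paid once per batch, while several engine threads keep producing in parallel.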

The ways:
A range of values used in filtering can be pushed down using the MRR (Multi-Range Read) technique. MRR is single-threaded, though. There could be a mechanism that saves intermediate results as an in-memory temporary table, or even assigns a separate thread to project the intermediate result, and then provides the parallel execution engine with access methods. There could be a number of tables/threads to push down the intermediate results in a multi-threaded way. The tables could potentially be spilled to disk to handle a significant number of records in the intermediate result.
Additional research is needed to outline how to address problem 2. The results of that research will be recorded here.
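The idea of several spillable intermediate tables read by several threads could look roughly like the sketch below. Everything in it is an assumption made for illustration: `SpillablePartition`, `partition_rows`, and `parallel_probe` are hypothetical names, hash partitioning is just one possible way to split the intermediate result, and pickled temporary files stand in for a real on-disk temporary table format.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor


class SpillablePartition:
    """Hypothetical intermediate-result table: rows stay in memory until
    max_rows is reached, then the buffered chunk is written to a temp file."""

    def __init__(self, max_rows):
        self.max_rows = max_rows
        self.rows = []
        self.spill = None  # temp file, created on the first spill

    def append(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            if self.spill is None:
                self.spill = tempfile.TemporaryFile()
            pickle.dump(self.rows, self.spill)
            self.rows = []

    def scan(self):
        """Access method: yields spilled chunks first, then in-memory rows."""
        if self.spill is not None:
            self.spill.seek(0)
            while True:
                try:
                    chunk = pickle.load(self.spill)
                except EOFError:
                    break
                yield from chunk
        yield from self.rows


def partition_rows(rows, key, n_partitions, max_rows_in_memory):
    """Splits the intermediate result by hash of the column at index `key`."""
    parts = [SpillablePartition(max_rows_in_memory) for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts


def parallel_probe(parts, predicate):
    """One thread per partition; stands in for the engine's parallel readers."""
    def work(part):
        return [row for row in part.scan() if predicate(row)]

    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        return [row for chunk in pool.map(work, parts) for row in chunk]
```

A caller would build the partitions once from the other engine's output, for example `parts = partition_rows(rows, 0, 4, 1000)`, and then hand each partition to a separate engine thread, as `parallel_probe` does here with a plain predicate.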

The milestones:
to be done

Issue Links

is part of

MCOL-1097 MariaDB ColumnStore Generic Engine phase 2

Closed

relates to

MDEV-6096 Ideas about parallel query execution

Open

MDEV-21291 Support Parallel Query Execution

Closed

Activity

People

Assignee: Roman

Reporter: Roman

Votes: 0

Watchers: 3

Dates

Created: 2019-03-28 10:15

Updated: 2024-10-03 15:52

Resolved: 2023-10-25 12:57
