[MDEV-6096] Ideas about parallel query execution - Jira

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: None
Labels:
- optimizer

Description

Some ideas about using multiple threads to run a query.

== Position at N% of table/index ==
Consider the queries:

select sum(a) from tbl group by non_key_col

select sum(a) from tbl where key between C1 and C2 group by non_key_col

To run these with N threads, we need to give each thread 1/N of the table. (An alternative is to run one "reader" thread and distribute the work to multiple compute threads. The problem is that reading from the table would then not be parallel, which puts a cap on performance.)

To do that, we will need storage engine calls that do:

- "position at N% in the table"
- "position at N% in the index range [C1, C2]"

These calls would also let us build equi-height histograms based on sampling.
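As an illustration of what such calls could provide, here is a minimal Python sketch. It is not MariaDB code: the index is simulated by a sorted in-memory list, and `position_at`, `split_for_threads` and `equi_height_bounds` are hypothetical names made up for this example. A real engine would presumably estimate the position from B-tree page counts rather than compute it exactly.

```python
from bisect import bisect_left, bisect_right


def position_at(sorted_keys, fraction, lo_key=None, hi_key=None):
    """Hypothetical engine call: offset lying `fraction` of the way through
    the index, optionally restricted to the key range [lo_key, hi_key]."""
    start = 0 if lo_key is None else bisect_left(sorted_keys, lo_key)
    end = len(sorted_keys) if hi_key is None else bisect_right(sorted_keys, hi_key)
    return start + int((end - start) * fraction)


def split_for_threads(sorted_keys, n_threads, lo_key=None, hi_key=None):
    """Return n_threads half-open (begin, end) offset pairs covering the range."""
    cuts = [
        position_at(sorted_keys, i / n_threads, lo_key, hi_key)
        for i in range(n_threads + 1)
    ]
    return list(zip(cuts[:-1], cuts[1:]))


def equi_height_bounds(sorted_keys, n_buckets):
    """Boundary keys of an equi-height histogram, found by probing positions
    instead of scanning every row."""
    last = len(sorted_keys) - 1
    return [
        sorted_keys[min(position_at(sorted_keys, i / n_buckets), last)]
        for i in range(n_buckets + 1)
    ]
```

With 100 keys and 4 threads, `split_for_threads` hands each thread 25 consecutive index entries; restricting the call to a key range gives the slices for the `between C1 and C2` variant of the query.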

== General execution ==
There is a lot of existing work on converting SQL into MapReduce jobs. Is it relevant to this task? The difference seems to be in the Map phase: we assume that the source data is equidistant from all worker threads.
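For the `sum(a) ... group by non_key_col` query, the map/merge shape could look roughly like the sketch below. It is a toy under stated assumptions: the table is a Python list of `(non_key_col, a)` tuples, every worker can read any slice at the same cost, and the function names are invented for this example. In standard CPython the threads will not give a real speedup for this CPU-bound loop, so the sketch shows only the structure.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partial_sums(rows):
    """Map step: SUM(a) GROUP BY non_key_col over one slice of the table."""
    acc = defaultdict(int)
    for non_key_col, a in rows:
        acc[non_key_col] += a
    return acc


def parallel_group_sum(rows, n_threads):
    """Give each thread about 1/n_threads of the rows, then merge the partial sums."""
    chunk = max(1, -(-len(rows) // n_threads))  # ceiling division, at least 1
    slices = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_sums, slices))
    # Merge step: SUM is decomposable, so partial sums per group simply add up.
    total = defaultdict(int)
    for partial in partials:
        for group, value in partial.items():
            total[group] += value
    return dict(total)
```

The merge is this simple only because `sum` is decomposable; an aggregate such as `avg` would need each worker to return a (sum, count) pair instead.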

== Evaluation ==
It would be useful to assess how much speedup we can expect. To get an idea, we could break the query apart and run the parts manually. In some cases the merge step could also be done manually, by writing to and reading from temporary tables.
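A manual experiment of this kind might be scripted along the following lines. This is only a sketch: it uses Python's built-in `sqlite3` module as a stand-in for a MariaDB connection, and the table layout, the column name `key_col`, the temporary table `part_sums` and the slice boundaries are all assumptions made for the example. The parts run one after another here; for a real measurement each part would be timed separately or issued from its own connection.

```python
import sqlite3


def run_in_parts(conn, bounds):
    """Run the range query once per [lo, hi) slice, then merge via a temp table."""
    conn.execute("CREATE TEMP TABLE part_sums (non_key_col TEXT, s INTEGER)")
    for lo, hi in bounds:
        conn.execute(
            "INSERT INTO part_sums "
            "SELECT non_key_col, SUM(a) FROM tbl "
            "WHERE key_col >= ? AND key_col < ? GROUP BY non_key_col",
            (lo, hi),
        )
    # Manual merge step: re-aggregate the per-slice partial sums.
    merged = conn.execute(
        "SELECT non_key_col, SUM(s) FROM part_sums "
        "GROUP BY non_key_col ORDER BY non_key_col"
    ).fetchall()
    conn.execute("DROP TABLE part_sums")
    return merged


def demo():
    """Compare the single-query result with the split-and-merge result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (key_col INTEGER PRIMARY KEY, non_key_col TEXT, a INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?)",
        [(i, "g%d" % (i % 3), i) for i in range(1, 101)],
    )
    whole = conn.execute(
        "SELECT non_key_col, SUM(a) FROM tbl "
        "WHERE key_col >= 1 AND key_col < 101 "
        "GROUP BY non_key_col ORDER BY non_key_col"
    ).fetchall()
    parts = run_in_parts(conn, [(1, 26), (26, 51), (51, 76), (76, 101)])
    conn.close()
    return whole, parts
```

Calling `demo()` returns both result sets, which should be identical; that checks the split and merge are correct before any timing is attempted.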

Attachments

SpderBigAsk.png
302 kB
2022-05-10 19:13

Issue Links

duplicates

- MDEV-18368 MySQL already can do parallel queries, when MariaDB (Closed)
- MDEV-21291 Support Parallel Query Execution (Closed)

relates to

- MCOL-2262 Design efficient methods for interaction b/w MDB and engines with parallel query execution (Closed)
- MDEV-18705 Parallel index range scan (Open)
- MDEV-26157 Prototype OpenMP in addressing parallel queries and other operations in code (Open)
- MDEV-27717 Parallel execution on partitions in scans where multiple partitions are needed (Open)
- MDEV-5004 Support parallel read transactions on the same snapshot (Open)
- MDEV-33446 optimizer is wrong (Open)

links to

- postgresql parallel query

People

Assignee: Unassigned

Reporter: Sergei Petrunia

Votes: 12

Watchers: 18

Dates

Created: 2014-04-14 23:31

Updated: 2024-05-08 00:46
