Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Developing parallel query execution in MariaDB would enable the engine to exploit modern multi-core hardware by breaking down large or complex queries into smaller sub-tasks that can be processed concurrently.
This approach promises to:
- Improve Performance and Scalability: By executing aggregates, joins, and sorts across multiple threads or nodes, query response times can scale nearly linearly with the available cores, which is vital for handling ever-growing data volumes.
- Enhance Resource Utilization: Parallel execution minimizes CPU idle time and better leverages disk and memory bandwidth, ensuring higher throughput on mixed OLTP/analytical workloads.
  - Caveat: Parallel execution uses more resources per query and increases cost when run in the cloud (overhead of running multiple threads; much higher memory usage; threads accessing the same resources/pages increase lock conflicts, etc.). When running on a hard disk where the data does not fit into memory, the impact can be very high, in many cases making the query slower than the non-parallel plan.
- Strengthen Competitiveness: As competitors like Oracle and PostgreSQL already provide parallel processing, integrating parallel query capabilities into the core server will be crucial for MariaDB to remain competitive and match the performance of other OLTP databases.
  - Caveat: Parallel query is mostly useful for analytical queries when there are very few users accessing MariaDB.
This ticket is to research how parallel query can be approached; the acceptance criterion is the creation of the stories needed to bring the feature into the server.
Previous ideas:
Some ideas about using multiple threads to run a query.
== Position at N% of table/index ==
Consider the queries

select sum(a) from tbl group by non_key_col
select sum(a) from tbl where key between C1 and C2 group by non_key_col

If we want to run these with N threads, we need to give 1/Nth of the table to each thread. (An alternative is to run one "reader" thread and distribute work to multiple compute threads. The problem with this is that reading from the table won't be parallel. This will put a cap on the performance.)
In order to do that, we will need storage engine calls that do
- "position at N% in the table"
- "position at N% in the index range between [C1 and C2]"
These calls would also let us build equi-height histograms based on sampling. A sketch of such calls follows.
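A minimal sketch of what such calls might look like as a storage engine extension. Everything here is hypothetical (the class, names, and signatures); it is not the existing MariaDB handler API:

/* Hypothetical handler extension. pct is a fraction in [0.0, 1.0];
   positioning only needs to be approximate, which is enough both for
   splitting a scan into N parts and for sampling. */
class ha_parallel_position_ext
{
public:
  /* Position the scan cursor at approximately pct of the table. */
  virtual int position_at_pct_in_table(double pct)= 0;
  /* Position at approximately pct of the index range [min_key, max_key]. */
  virtual int position_at_pct_in_range(double pct,
                                       const unsigned char *min_key,
                                       const unsigned char *max_key)= 0;
  virtual ~ha_parallel_position_ext() {}
};

Worker i of N would then scan from position i/N to (i+1)/N, and the same calls could drive the sampling for equi-height histograms.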
== More details about implementation (By Monty) ==
Prerequisites for using parallel query (a minimal check is sketched after this list):
- Pure SELECT queries
- All storage engines have to support parallel query (we need a shared read view)
- No usage of variable assignment (@a:= value)
- Original query is expected to take more than 'parallel_query_min_cost'*1000 min cost (1000 == 1 second?).
- New plan should not be slower (in user time) than the original plan.
- Note that processing cost for parallel query can be much higher.
- There is one table for which we will do a large table or range scan
- There are fewer than 'parallel_max_running_threads' threads processing queries. This includes normal user queries and worker threads. Value between 2 and 'number of cores'*4.
- Variable 'parallel_worker_threads' will define how many worker threads will be used to execute the query. Value between 2 and 'number of cores'*4.
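A rough sketch of the prerequisite check, assuming the conditions above have already been collected by the parser/optimizer. All names are invented for illustration; the cost rule follows 'parallel_query_min_cost'*1000 from above:

/* Illustrative only; none of these names exist in the server. */
struct Query_properties
{
  bool is_pure_select;             /* SELECT only, no side effects */
  bool uses_user_variables;        /* e.g. @a:= value */
  bool engines_share_read_view;    /* all engines support a shared read view */
  double estimated_cost;           /* optimizer cost of the serial plan */
};

static bool parallel_query_possible(const Query_properties &q,
                                    double parallel_query_min_cost)
{
  return q.is_pure_select &&
         !q.uses_user_variables &&
         q.engines_share_read_view &&
         q.estimated_cost > parallel_query_min_cost * 1000; /* 1000 == ~1 second? */
}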
Requirements:
- Parallel query should work on queries with one or more tables.
- In case of group by, each worker will execute the group by for its own row set (example below).
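For example, with select sum(a) from tbl group by non_key_col, each worker would produce a partial sum of a for every non_key_col value in its slice, and the final merge would add up the partial sums per group.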
Workflow:
- Optimize query normally
- Check prerequisites; if not fulfilled, execute normally.
- Allocate MIN(parallel_max_running_threads - current_running_threads, parallel_worker_threads) worker threads (a sketch follows this workflow). If <= 1, execute normally.
- If the chosen table is the first non-const table:
  - Split the range into # parts, give each worker thread one part to process.
- If the chosen table is not first:
  - Tables before the chosen one will be executed normally.
  - Rows will be sent to a sink, which will distribute rows to the # worker threads.
- Final merge, for both cases above:
  - If the execution uses a temporary table for the result, sort the rows of all tables in one go (if needed) and output the results to the user.
  - If not, store (or sum up, in case of group by) the final row combination from each thread into a global temporary table, sort the table (if needed) and output the result.
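A minimal sketch of the allocation step, using the variable names from the prerequisites above; the function itself is illustrative, not server code:

#include <algorithm>

/* Returns the number of worker threads to use; 0 means execute normally. */
static unsigned allocate_parallel_workers(unsigned parallel_max_running_threads,
                                          unsigned current_running_threads,
                                          unsigned parallel_worker_threads)
{
  unsigned available= parallel_max_running_threads > current_running_threads
                      ? parallel_max_running_threads - current_running_threads
                      : 0;
  unsigned workers= std::min(available, parallel_worker_threads);
  return workers <= 1 ? 0 : workers;
}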
Worker details:
- Each worker will have its own thread and THD.
- All used tables will be cloned. We have to ensure that all tables have the same row/transaction visibility as the original thread.
- Each worker will get its own clone/version of:
  - JOIN
  - JOIN_TAB
  - Items (expressions)
    - The first version will use clone (already works)
    - Future versions will change items to be re-entrant so that no clone is needed for items
  - Temporary tables
- Execution up to the final merge will work normally (no code changes for this part).
- Each worker should be visible in the process list.
- Killing one worker or the main thread will kill the whole parallel query.
Storage engine and related interfaces:
- Need a way to partition a table scan and range scan into # parts
- Need a way to share transaction context (changes needed in all engines). This should be a very small, easy-to-do extension (a hedged sketch follows).
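One way the shared-transaction-context extension could look: the engine exports its current read view as an opaque handle that each worker's handler adopts. All names here are invented; an engine such as InnoDB would map this onto its internal read view:

/* Hypothetical engine interface; nothing here is existing API. */
class ha_shared_read_view_ext
{
public:
  /* Export the current transaction's read view as an opaque handle. */
  virtual void *export_read_view()= 0;
  /* Make this handler see exactly the rows visible in 'view'. */
  virtual int adopt_read_view(void *view)= 0;
  virtual ~ha_shared_read_view_ext() {}
};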
Some optimizations that probably have to be disabled:
- Sorting the first table as part of ORDER BY
  - Optionally, sort the first table and use a sink to distribute rows.
  - In this case the final 'sort' can be a merge sort, as each worker thread will produce rows in the correct order.
Extension for filesort:
- Need to add support to sort multiple tables in one go.
- Sorting with merge sort is easy. Retrieval of the original row from the correct temporary table for output can be done by having the worker number (1-255) last in each sort key (see the sketch below).
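A small sketch of the proposed key layout: the worker number is appended as the last byte of each sort key, so after the merge the original row can be fetched from the right worker's temporary table. The buffer layout and function are illustrative only:

#include <cstddef>
#include <cstring>

/* Build a merge-sort key: the ordinary sort key followed by the
   worker number (1-255). */
static size_t make_parallel_sort_key(unsigned char *to,
                                     const unsigned char *sort_key,
                                     size_t sort_key_len,
                                     unsigned char worker_number)
{
  memcpy(to, sort_key, sort_key_len);
  to[sort_key_len]= worker_number;   /* 1-255; 0 is unused */
  return sort_key_len + 1;
}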
Other things:
- Cache for worker threads and worker THDs. This cache will have 'parallel_max_running_threads' entries (a sketch follows).
- Should have a developer start testing the previously agreed solution for re-entrant items.
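A minimal sketch of such a cache, bounded by 'parallel_max_running_threads'. Worker stands in for whatever object owns a worker thread and its THD; all names are illustrative:

#include <cstddef>
#include <mutex>
#include <vector>

struct Worker;   /* owns a thread and a THD in the real server */

/* get() returns nullptr when a new worker must be created; put()
   returns false when the cache is full and the worker should be
   destroyed instead of cached. */
class Worker_cache
{
  std::mutex lock;
  std::vector<Worker*> idle;
  size_t max_entries;              /* = parallel_max_running_threads */
public:
  explicit Worker_cache(size_t max_entries_arg) : max_entries(max_entries_arg) {}
  Worker *get()
  {
    std::lock_guard<std::mutex> g(lock);
    if (idle.empty())
      return nullptr;
    Worker *w= idle.back();
    idle.pop_back();
    return w;
  }
  bool put(Worker *w)
  {
    std::lock_guard<std::mutex> g(lock);
    if (idle.size() >= max_entries)
      return false;
    idle.push_back(w);
    return true;
  }
};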
== General execution ==
There is a lot of existing work on converting SQL into MapReduce jobs. Is it relevant to this task? The difference seems to be in the Map phase: we assume that the source data is equidistant from all worker threads.
== Evaluation ==
It would be nice to assess how much speedup we will get. To get an idea, we could break the query apart and run the parts manually. The merge step could also be done manually in some cases (by writing to, and reading from, temporary tables). In theory the maximum possible speedup is proportional to the number of threads. We need performance tests to find the overhead. The best guess (Monty) is 70% of the maximum.
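For example, with 8 worker threads the theoretical ceiling is an 8x speedup, so under that guess the expected speedup would be roughly 0.7 * 8 = 5.6x.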
Issue Links
- duplicates:
  - MDEV-18368 MySQL already can do parallel queries, when MariaDB (Closed)
  - MDEV-21291 Support Parallel Query Execution (Closed)
- relates to:
  - MCOL-2262 Design efficient methods for interaction b/w MDB and engines with parallel query execution (Closed)
  - MDEV-18705 Parallel index range scan (Open)
  - MDEV-26157 Prototype OpenMP in addressing parallel queries and other operations in code (Open)
  - MDEV-27717 Parallel execution on partitions in scans where multiple partitions are needed (Open)
  - MDEV-5004 Support parallel read transactions on the same snapshot (Open)
  - MDEV-33446 optimizer is wrong (Open)