Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Developing parallel query execution in MariaDB would enable the engine to exploit modern multi-core hardware by breaking down large or complex queries into smaller sub-tasks that can be processed concurrently.
This approach promises to:
- Improve Performance and Scalability: By executing aggregates, joins, and sorts across multiple threads or nodes, query response times can scale nearly linearly with the available cores, which is vital for handling ever-growing data volumes.
- Enhance Resource Utilization: Parallel execution minimizes CPU idle time and better leverages disk and memory bandwidth, ensuring higher throughput on mixed OLTP/analytical workloads.
  - Caveat: Parallel execution uses more resources per query and increases cost when run in the cloud (overhead of running multiple threads; much higher memory usage; threads accessing the same resources/pages increase lock conflicts, etc.). When running on a hard disk where the data does not fit into memory, the impact can be very high, in many cases making the query slower than the non-parallel plan.
- Strengthen Competitiveness: As competitors like Oracle and PostgreSQL already provide parallel processing, integrating parallel query capabilities into the core server will be crucial for MariaDB to remain competitive and match the performance of other OLTP databases.
  - Caveat: Parallel query is mostly useful for analytical queries when there are very few users accessing MariaDB.
This ticket is to research how parallel query can be approached; the acceptance criterion is the creation of the stories needed to bring the feature into the server.
Previous ideas:
Some ideas about using multiple threads to run a query.
== Position at N% of table/index ==
Consider the queries

select sum(a) from tbl group by non_key_col
select sum(a) from tbl where key between C1 and C2 group by non_key_col

If we want to run these with N threads, we need to give 1/Nth of the table to each thread. (An alternative is to run one "reader" thread and distribute work to multiple compute threads. The problem with this is that reading from the table won't be parallel. This will put a cap on the performance.)
In order to do that, we will need storage engine calls that do
- "position at N% in the table"
- "position at N% in the index range between [C1 and C2]"
These calls would also let us build equi-height histograms based on sampling. A sketch of such calls follows.
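A minimal sketch of what such calls might look like as a storage engine extension. Everything here is hypothetical (the class, names, and signatures); it is not the existing MariaDB handler API:

/* Hypothetical handler extension. pct is a fraction in [0.0, 1.0];
   positioning only needs to be approximate, which is enough both for
   splitting a scan into N parts and for sampling. */
class ha_parallel_position_ext
{
public:
  /* Position the scan cursor at approximately pct of the table. */
  virtual int position_at_pct_in_table(double pct)= 0;
  /* Position at approximately pct of the index range [min_key, max_key]. */
  virtual int position_at_pct_in_range(double pct,
                                       const unsigned char *min_key,
                                       const unsigned char *max_key)= 0;
  virtual ~ha_parallel_position_ext() {}
};

Worker i of N would then scan from position i/N to (i+1)/N, and the same calls could drive the sampling for equi-height histograms.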
== More details about implementation (By Monty) ==
Prerequisites for using parallel query (a minimal check is sketched after this list):
- Pure SELECT queries
- All storage engines have to support parallel query (we need a shared read view)
- No usage of variable assignment (@a:= value)
- Original query is expected to take more than 'parallel_query_min_cost'*1000 min cost (1000 == 1 second?).
- New plan should not be slower (in user time) than the original plan.
- Note that processing cost for parallel query can be much higher.
- There is one table for which we will do a large table or range scan
- There are fewer than 'parallel_max_running_threads' threads processing queries. This includes normal user queries and worker threads. Value between 2 and 'number of cores'*4.
- Variable 'parallel_worker_threads' will define how many worker threads will be used to execute the query. Value between 2 and 'number of cores'*4.
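A rough sketch of the prerequisite check, assuming the conditions above have already been collected by the parser/optimizer. All names are invented for illustration; the cost rule follows 'parallel_query_min_cost'*1000 from above:

/* Illustrative only; none of these names exist in the server. */
struct Query_properties
{
  bool is_pure_select;             /* SELECT only, no side effects */
  bool uses_user_variables;        /* e.g. @a:= value */
  bool engines_share_read_view;    /* all engines support a shared read view */
  double estimated_cost;           /* optimizer cost of the serial plan */
};

static bool parallel_query_possible(const Query_properties &q,
                                    double parallel_query_min_cost)
{
  return q.is_pure_select &&
         !q.uses_user_variables &&
         q.engines_share_read_view &&
         q.estimated_cost > parallel_query_min_cost * 1000; /* 1000 == ~1 second? */
}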
Requirements:
- Parallel query should work on queries with one or more tables.
- In case of group by, each worker will execute the group by for its own row set (example below).
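For example, with select sum(a) from tbl group by non_key_col, each worker would produce a partial sum of a for every non_key_col value in its slice, and the final merge would add up the partial sums per group.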
Workflow:
- Optimize query normally
- Check prerequisites; if not fulfilled, execute normally.
- Allocate MIN(parallel_max_running_threads - current_running_threads, parallel_worker_threads) worker threads (a sketch follows this workflow). If <= 1, execute normally.
- If the chosen table is the first non-const table:
  - Split the range into # parts, give each worker thread one part to process.
- If the chosen table is not first:
  - Tables before the chosen one will be executed normally.
  - Rows will be sent to a sink, which will distribute rows to the # worker threads.
- Final merge, for both cases above:
  - If the execution uses a temporary table for the result, sort the rows of all tables in one go (if needed) and output the results to the user.
  - If not, store (or sum up, in case of group by) the final row combination from each thread into a global temporary table, sort the table (if needed) and output the result.
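A minimal sketch of the allocation step, using the variable names from the prerequisites above; the function itself is illustrative, not server code:

#include <algorithm>

/* Returns the number of worker threads to use; 0 means execute normally. */
static unsigned allocate_parallel_workers(unsigned parallel_max_running_threads,
                                          unsigned current_running_threads,
                                          unsigned parallel_worker_threads)
{
  unsigned available= parallel_max_running_threads > current_running_threads
                      ? parallel_max_running_threads - current_running_threads
                      : 0;
  unsigned workers= std::min(available, parallel_worker_threads);
  return workers <= 1 ? 0 : workers;
}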
Worker details:
- Each worker will have its own thread and THD.
- All used tables will be cloned. We have to ensure that all tables have the same row/transaction visibility as the original thread.
- Each worker will get its own clone/version of:
  - JOIN
  - JOIN_TAB
  - Items (expressions)
    - The first version will use clone (already works)
    - Future versions will change items to be re-entrant so that no clone is needed for items
  - Temporary tables
- Execution up to the final merge will work normally (no code changes for this part).
- Each worker should be visible in the process list.
- Killing one worker or the main thread will kill the whole parallel query.
Storage engine and related interfaces:
- Need a way to partition a table scan and range scan into # parts
- Need a way to share transaction context (changes needed in all engines). This should be a very small, easy-to-do extension (a hedged sketch follows).
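One way the shared-transaction-context extension could look: the engine exports its current read view as an opaque handle that each worker's handler adopts. All names here are invented; an engine such as InnoDB would map this onto its internal read view:

/* Hypothetical engine interface; nothing here is existing API. */
class ha_shared_read_view_ext
{
public:
  /* Export the current transaction's read view as an opaque handle. */
  virtual void *export_read_view()= 0;
  /* Make this handler see exactly the rows visible in 'view'. */
  virtual int adopt_read_view(void *view)= 0;
  virtual ~ha_shared_read_view_ext() {}
};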
Some optimizations that probably have to be disabled:
- Sorting the first table as part of ORDER BY
  - Optionally, sort the first table and use a sink to distribute rows.
  - In this case the final 'sort' can be a merge sort, as each worker thread will produce rows in the correct order.
Extension for filesort:
- Need to add support to sort multiple tables in one go.
- Sorting with merge sort is easy. Retrieval of the original row from the correct temporary table for output can be done by having the worker number (1-255) last in each sort key (see the sketch below).
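A small sketch of the proposed key layout: the worker number is appended as the last byte of each sort key, so after the merge the original row can be fetched from the right worker's temporary table. The buffer layout and function are illustrative only:

#include <cstddef>
#include <cstring>

/* Build a merge-sort key: the ordinary sort key followed by the
   worker number (1-255). */
static size_t make_parallel_sort_key(unsigned char *to,
                                     const unsigned char *sort_key,
                                     size_t sort_key_len,
                                     unsigned char worker_number)
{
  memcpy(to, sort_key, sort_key_len);
  to[sort_key_len]= worker_number;   /* 1-255; 0 is unused */
  return sort_key_len + 1;
}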
Other things:
- Cache for worker threads and worker THDs. This cache will have 'parallel_max_running_threads' entries (a sketch follows).
- Should have a developer start testing the previously agreed solution for re-entrant items.
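A minimal sketch of such a cache, bounded by 'parallel_max_running_threads'. Worker stands in for whatever object owns a worker thread and its THD; all names are illustrative:

#include <cstddef>
#include <mutex>
#include <vector>

struct Worker;   /* owns a thread and a THD in the real server */

/* get() returns nullptr when a new worker must be created; put()
   returns false when the cache is full and the worker should be
   destroyed instead of cached. */
class Worker_cache
{
  std::mutex lock;
  std::vector<Worker*> idle;
  size_t max_entries;              /* = parallel_max_running_threads */
public:
  explicit Worker_cache(size_t max_entries_arg) : max_entries(max_entries_arg) {}
  Worker *get()
  {
    std::lock_guard<std::mutex> g(lock);
    if (idle.empty())
      return nullptr;
    Worker *w= idle.back();
    idle.pop_back();
    return w;
  }
  bool put(Worker *w)
  {
    std::lock_guard<std::mutex> g(lock);
    if (idle.size() >= max_entries)
      return false;
    idle.push_back(w);
    return true;
  }
};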
== General execution ==
There is a lot of existing work on converting SQL into MapReduce jobs. Is it relevant to this task? The difference seems to be in the Map phase: we assume that the source data is equidistant from all worker threads.
== Evaluation ==
It would be nice to assess how much speedup we will get. To get an idea, we could break the query apart and run the parts manually. The merge step could also be done manually in some cases (by writing to, and reading from, temporary tables). In theory the maximum possible speedup is proportional to the number of threads. We need performance tests to find the overhead. The best guess (Monty) is 70% of the maximum.
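For example, with 8 worker threads the theoretical ceiling is an 8x speedup, so under that guess the expected speedup would be roughly 0.7 * 8 = 5.6x.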
Issue Links
- duplicates:
  - MDEV-18368 MySQL already can do parallel queries, when MariaDB (Closed)
  - MDEV-21291 Support Parallel Query Execution (Closed)
- relates to:
  - MCOL-2262 Design efficient methods for interaction b/w MDB and engines with parallel query execution (Closed)
  - MDEV-18705 Parallel index range scan (Open)
  - MDEV-26157 Prototype OpenMP in addressing parallel queries and other operations in code (Open)
  - MDEV-27717 Parallel execution on partitions in scans where multiple partitions are needed (Open)
  - MDEV-5004 Support parallel read transactions on the same snapshot (Open)
  - MDEV-33446 optimizer is wrong (Open)