Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
Developing parallel query execution in MariaDB would enable the engine to exploit modern multi-core hardware by breaking down large or complex queries into smaller sub-tasks that can be processed concurrently.
This approach promises to:
- Improve Performance and Scalability: By executing aggregates, joins, and sorts across multiple threads or nodes, query response times can improve with the number of available cores, approaching linear scaling in the best case. This is vital for handling ever-growing data volumes.
- Enhance Resource Utilization: Parallel execution reduces CPU idle time and makes better use of disk and memory bandwidth, giving higher throughput on mixed OLTP/analytical workloads.
- Strengthen Competitiveness: Competitors such as Oracle and PostgreSQL already provide parallel query processing, so integrating parallel query capabilities into the core server will be crucial for MariaDB to remain competitive and match the performance of other OLTP databases.
This ticket is to research how parallel query can be approached. The acceptance criterion is the creation of the stories needed to bring the feature into the server.
Previous ideas -
Some ideas about using multiple threads to run a query.
== Position at N% of table/index ==
Consider the queries
select sum(a) from tbl group by non_key_col
select sum(a) from tbl where key between C1 and C2 group by non_key_col
If we want to run these with N threads, we need to give 1/Nth of the table to each thread. (An alternative is to run one "reader" thread and distribute work to multiple compute threads. The problem with this is that reading from the table won't be parallel, which will put a cap on the performance.)
In order to do that, we will need storage engine calls that do
- "position at N% in the table"
- "position at N% in the index range between [C1 and C2]".
These calls would also let us build equi-height histograms based on sampling.
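To make the idea concrete, here is a minimal sketch in Python. It models an index as a sorted list of keys held in memory; `position_at_fraction`, `split_range` and `equi_height_bounds` are hypothetical names invented for illustration and do not correspond to any existing storage engine call. A real engine would have to estimate the position from the index structure rather than compute it exactly.

```python
from bisect import bisect_left, bisect_right

def position_at_fraction(index, frac, lo_key=None, hi_key=None):
    """Offset into the sorted list `index` lying `frac` (0.0..1.0) of the way
    through the key range [lo_key, hi_key]; the whole index if both are None.
    Hypothetical stand-in for a 'position at N%' storage engine call."""
    start = 0 if lo_key is None else bisect_left(index, lo_key)
    end = len(index) if hi_key is None else bisect_right(index, hi_key)
    return start + int((end - start) * frac)

def split_range(index, n_threads, lo_key=None, hi_key=None):
    """Half-open (begin, end) offset pairs, one per thread, covering the range."""
    bounds = [position_at_fraction(index, i / n_threads, lo_key, hi_key)
              for i in range(n_threads + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

def equi_height_bounds(index, n_buckets):
    """Bucket boundary keys of an equi-height histogram, obtained by probing
    the index at 1/n_buckets, 2/n_buckets, ... of its length."""
    last = len(index) - 1
    return [index[min(position_at_fraction(index, i / n_buckets), last)]
            for i in range(1, n_buckets)]

if __name__ == "__main__":
    keys = list(range(100))
    print(split_range(keys, 4))           # whole-table scan, 4 threads
    print(split_range(keys, 4, 10, 49))   # "key between 10 and 49", 4 threads
    print(equi_height_bounds(keys, 4))
```

Each thread would then scan only its own (begin, end) slice, so the read itself is parallel rather than funnelled through a single reader.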
== General execution ==
There is a large body of work on converting SQL into MapReduce jobs. Is it relevant to this task? The difference seems to be in the Map phase - we assume that source data is equidistant from all worker threads.
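As a rough illustration of the MapReduce analogy, the sketch below runs `sum(a) ... group by non_key_col` as a map phase (each worker aggregates its own chunk) followed by a reduce phase (partial sums are merged). All function names are made up for this example, and the table is just a list of `(non_key_col, a)` tuples. Standard CPython threads do not run Python bytecode in parallel, so this shows only the structure of the computation, not a speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_partial_sums(rows):
    """Map phase: aggregate sum(a) per group over one worker's chunk."""
    partial = defaultdict(int)
    for group, a in rows:
        partial[group] += a
    return partial

def reduce_partials(partials):
    """Reduce phase: merge the per-worker partial sums into the final result."""
    total = defaultdict(int)
    for partial in partials:
        for group, s in partial.items():
            total[group] += s
    return dict(total)

def parallel_group_sum(table, n_workers):
    """Split `table` into up to n_workers contiguous chunks and aggregate them."""
    chunk = max(1, -(-len(table) // n_workers))  # ceiling division, at least 1
    chunks = [table[i:i + chunk] for i in range(0, len(table), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_partial_sums, chunks))
    return reduce_partials(partials)

if __name__ == "__main__":
    tbl = [("x", 1), ("y", 2), ("x", 3), ("z", 4), ("y", 5)]
    print(parallel_group_sum(tbl, 2))
```

The merge is cheap here because `sum` is decomposable; aggregates that are not (for example a median) would need a different reduce step.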
== Evaluation ==
It would be nice to assess how much speedup we will get. In order to get an idea, we could break the query apart and run the parts manually. The merge step could also be done manually in some cases (by writing to, and reading from temporary tables).
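One way such a manual experiment could look is sketched below. It uses Python's built-in `sqlite3` module purely as a stand-in so the example is self-contained; it is not MariaDB, and the schema, the column name `key_col` (used instead of `key`), and the helper names are assumptions made for illustration. The range query is run once per key sub-range, the partial results go into a temporary table, and a final query merges them.

```python
import sqlite3

def make_demo_db():
    """Small in-memory table standing in for `tbl` (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table tbl (key_col integer primary key, non_key_col text, a integer)")
    conn.executemany(
        "insert into tbl values (?, ?, ?)",
        [(i, "g%d" % (i % 3), i) for i in range(1, 13)])
    return conn

def run_in_parts(conn, boundaries):
    """Run the group-by once per half-open key range [lo, hi), write each
    partial result to a temporary table, then merge the partial sums."""
    cur = conn.cursor()
    cur.execute("create temporary table partial (grp text, s integer)")
    for lo, hi in boundaries:
        cur.execute(
            "insert into partial "
            "select non_key_col, sum(a) from tbl "
            "where key_col >= ? and key_col < ? group by non_key_col",
            (lo, hi))
    merged = cur.execute(
        "select grp, sum(s) from partial group by grp order by grp").fetchall()
    cur.execute("drop table partial")
    return merged

if __name__ == "__main__":
    conn = make_demo_db()
    whole = conn.execute(
        "select non_key_col, sum(a) from tbl where key_col between 1 and 12 "
        "group by non_key_col order by non_key_col").fetchall()
    parts = run_in_parts(conn, [(1, 5), (5, 9), (9, 13)])
    print(whole == parts, parts)
```

Timing the slowest part plus the merge step, and comparing that against the single-query time, would give a first estimate of the achievable speedup.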
Issue Links
- duplicates
  - MDEV-18368 MySQL already can do parallel queries, when MariaDB (Closed)
  - MDEV-21291 Support Parallel Query Execution (Closed)
- relates to
  - MCOL-2262 Design efficient methods for interaction b/w MDB and engines with parallel query execution (Closed)
  - MDEV-18705 Parallel index range scan (Open)
  - MDEV-26157 Prototype OpenMP in addressing parallel queries and other operations in code (Open)
  - MDEV-27717 Parallel execution on partitions in scans where multiple partitions are needed (Open)
  - MDEV-5004 Support parallel read transactions on the same snapshot (Open)
  - MDEV-33446 optimizer is wrong (Open)
- links to