Currently MDB interacts inefficiently with engines that:
- use parallel execution internally
- can potentially return a huge number of records
First, the current behavior can add a significant amount of time to query execution. For example, CS takes 6 seconds to return the 15,000,000 records of a trivial INNER JOIN to the MDB plugin, but it takes approximately 50 seconds before MDB starts to return the actual result set to the client.
Second, the parallel execution engine itself could process queries that reference tables from other engines together with its own tables faster than MDB can. However, there is currently no way to push down the query's parameters (intermediate results from other engines, coming from either subqueries or JOIN expressions) to the parallel execution engine. For example, CS now uses a back-side mariadb client connection to fetch the intermediate result, which can add a significant amount of time to the query execution.
The goals, in descending order of importance:
1. There must be an efficient way to retrieve millions or billions of records from the engine and send them back to the client.
2. There must be a way to push down the query's parameters (intermediate results from other engines, coming from either subqueries or JOIN expressions).
There is already a way to push down a range of values used in filtering, using the MRR (Multi-Range Read) technique. MRR is single-threaded, though. One option is a mechanism that saves the intermediate result as an in-memory temporary table, or even assigns a separate thread to project the intermediate result, and then provides the parallel execution engine with access methods to it. There could be a number of such tables/threads so that the intermediate results are pushed down in a multi-threaded way. The tables could potentially be spilled to disk to cope with an intermediate result that contains a significant number of records.
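The multi-threaded pushdown idea above can be illustrated with a small standalone sketch. It is not MDB or CS code: `Row`, `BatchQueue` and `push_down_in_parallel` are hypothetical names, a row is reduced to a single integer key, and each consumer thread merely counts the rows it receives in place of a real engine worker. The producer runs in the calling thread for brevity, and spilling to disk is left out; the bounded queues only show how memory use could be capped.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical row type: one integer key stands in for a projected row.
using Row = std::int64_t;
using Batch = std::vector<Row>;

// Bounded queue of row batches; one queue per consumer partition.
class BatchQueue {
public:
  explicit BatchQueue(std::size_t max_batches) : max_batches_(max_batches) {}

  // Blocks while the queue is full, so buffered data stays bounded.
  void push(Batch batch) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return batches_.size() < max_batches_; });
    batches_.push_back(std::move(batch));
    not_empty_.notify_one();
  }

  // Signals end of data and wakes any waiting consumer.
  void close() {
    std::lock_guard<std::mutex> lock(mutex_);
    closed_ = true;
    not_empty_.notify_all();
  }

  // Returns false once the queue is closed and fully drained.
  bool pop(Batch &out) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !batches_.empty() || closed_; });
    if (batches_.empty()) return false;
    out = std::move(batches_.front());
    batches_.pop_front();
    not_full_.notify_one();
    return true;
  }

private:
  std::size_t max_batches_;
  std::deque<Batch> batches_;
  bool closed_ = false;
  std::mutex mutex_;
  std::condition_variable not_full_, not_empty_;
};

// Splits `rows` by key across `partitions` queues and lets one consumer
// thread per partition drain its queue. Returns the row count seen by each
// consumer (a stand-in for real processing inside the engine).
std::vector<std::size_t> push_down_in_parallel(const std::vector<Row> &rows,
                                               std::size_t partitions,
                                               std::size_t batch_size) {
  assert(partitions > 0);
  std::vector<std::unique_ptr<BatchQueue>> queues;
  for (std::size_t p = 0; p < partitions; ++p)
    queues.push_back(std::make_unique<BatchQueue>(4));

  std::vector<std::size_t> row_counts(partitions, 0);
  std::vector<std::thread> consumers;
  for (std::size_t p = 0; p < partitions; ++p)
    consumers.emplace_back([&queues, &row_counts, p] {
      Batch batch;
      while (queues[p]->pop(batch)) row_counts[p] += batch.size();
    });

  // Producer: accumulate rows per partition and hand over full batches.
  std::vector<Batch> pending(partitions);
  for (Row row : rows) {
    std::size_t p = static_cast<std::uint64_t>(row) % partitions;
    pending[p].push_back(row);
    if (pending[p].size() == batch_size) {
      queues[p]->push(std::move(pending[p]));
      pending[p].clear();  // reset the moved-from vector to a known state
    }
  }
  // Flush partial batches, then signal end of data to every consumer.
  for (std::size_t p = 0; p < partitions; ++p) {
    if (!pending[p].empty()) queues[p]->push(std::move(pending[p]));
    queues[p]->close();
  }
  for (auto &t : consumers) t.join();
  return row_counts;
}
```

Partitioning by key keeps each consumer independent of the others, and the blocking `push` is the point where a real implementation might choose to spill a batch to disk instead of waiting.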
Additional research is needed to outline how to cope with problem 2. The results of that research will be recorded here.
to be done