The last time I tested Spider, even though a table was broken into many pieces, query execution across the various Spider tables was still serial: one table on one server, followed by another table on another server, and finally a step to combine all the results. So, although there was a claim of parallelization, the queries did not actually run in parallel; they simply leveraged a different server for each piece.
I prefer the map-reduce approach, as it is the simplest way to decompose the queries into parts and act on them. Say we have queries like the two below (a partitioned example and a UNION example):
SELECT cluster, COUNT(*) AS jobs, MAX(cpuTime) AS maxCPU, SUM(memory) AS memory, AVG(memory) AS avgMemory
FROM partitioned_table
WHERE last_updated BETWEEN ? AND ?  -- partitioned by range on a timestamp. Yes yes, just wishful thinking
GROUP BY cluster
or
SELECT cluster, SUM(jobs) AS jobs, MAX(maxCPU) AS maxCPU, SUM(memory) AS memory, SUM(total_memory)/SUM(jobs) AS avgMemory
FROM (
SELECT cluster, COUNT(*) AS jobs, MAX(cpuTime) AS maxCPU, SUM(memory) AS memory, SUM(memory) AS total_memory
FROM unpartitioned_tableA
WHERE last_updated BETWEEN ? AND ?
GROUP BY cluster
UNION ALL
SELECT cluster, COUNT(*) AS jobs, MAX(cpuTime) AS maxCPU, SUM(memory) AS memory, SUM(memory) AS total_memory
FROM unpartitioned_tableB
WHERE last_updated BETWEEN ? AND ?
GROUP BY cluster
) AS rs
GROUP BY cluster;
You should notice how I achieved the same outcome in the UNION case by simply abstracting AVG() into its parts in the union sub-queries. The same abstraction can be done with other consolidation functions, and it also works if you build a UNION from per-partition queries using the PARTITION keyword and apply the same reduce to the resulting UNION. Consolidation functions like SUM(), MIN(), MAX() and AVG() are easy; others, not so much.
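To make the abstraction concrete, here is a minimal sketch (the branch_table and branch_partials names are made up for illustration) of why AVG() cannot be re-aggregated directly, while its SUM() and COUNT() parts can:

-- Map: each branch keeps the ingredients of AVG(), not AVG() itself.
SELECT cluster, SUM(memory) AS total_memory, COUNT(*) AS jobs
FROM branch_table  -- hypothetical: one UNION branch or one partition
WHERE last_updated BETWEEN ? AND ?
GROUP BY cluster;

-- Reduce: the true average is the sum of sums over the sum of counts,
-- not the average of the per-branch averages.
SELECT cluster, SUM(total_memory) / SUM(jobs) AS avgMemory
FROM branch_partials  -- hypothetical scratch table holding all branch results
GROUP BY cluster;

Averaging the per-branch averages would weight every branch equally regardless of how many rows it contributed, which is exactly why AVG() has to be carried as its SUM() and COUNT() parts until the final reduce.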
Each of these cases can be run in a map reduce fashion:
For the partitioned case:
1) Determine the partitions affected by the range query (on a timestamp, again just wishful thinking)
2) Use the appropriate abstraction functions to change the form of the query, using whatever abstraction logic is available
3) Launch X parallel queries and store the results in an internal temporary table by PARTITION.
4) Reduce the set from the temporary table, again using the abstracted functions (a rough SQL sketch follows this list)
5) Present the results to the user
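As a rough sketch of steps 3 and 4, assuming MariaDB's explicit partition selection syntax; the partition name p0 and the map_results temporary table are hypothetical, and one query like the INSERT below would be launched per affected partition:

CREATE TEMPORARY TABLE map_results (
  cluster VARCHAR(64), jobs BIGINT, maxCPU DOUBLE, memory DOUBLE, total_memory DOUBLE
);

-- Map: one of the X parallel queries, restricted to a single partition.
INSERT INTO map_results
SELECT cluster, COUNT(*) AS jobs, MAX(cpuTime) AS maxCPU, SUM(memory) AS memory, SUM(memory) AS total_memory
FROM partitioned_table PARTITION (p0)
WHERE last_updated BETWEEN ? AND ?
GROUP BY cluster;

-- Reduce: consolidate the per-partition partials using the abstracted functions.
SELECT cluster, SUM(jobs) AS jobs, MAX(maxCPU) AS maxCPU, SUM(memory) AS memory, SUM(total_memory)/SUM(jobs) AS avgMemory
FROM map_results
GROUP BY cluster;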
For the union case
1) Determine any abstractions that are not already defined (as in my case above) so that the unions become independent of one another
2) Move the WHERE and GROUP BY clauses under each specific UNION
3) Launch X parallel queries and store the results in an internal temporary table by UNION
4) Reduce the set from the temporary table, again using the abstracted functions (again, a short sketch follows this list)
5) Present the results to the user
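The UNION case reduces the same way. As a minimal sketch (union_results is a made-up scratch table with the same columns as the sub-queries), each former UNION branch becomes its own statement that can run on its own thread:

-- Map: one statement per former UNION branch, run concurrently.
INSERT INTO union_results
SELECT cluster, COUNT(*) AS jobs, MAX(cpuTime) AS maxCPU, SUM(memory) AS memory, SUM(memory) AS total_memory
FROM unpartitioned_tableA
WHERE last_updated BETWEEN ? AND ?
GROUP BY cluster;
-- ...and the same statement again against unpartitioned_tableB.

-- Reduce: identical in shape to the outer query of the original UNION form.
SELECT cluster, SUM(jobs) AS jobs, MAX(maxCPU) AS maxCPU, SUM(memory) AS memory, SUM(total_memory)/SUM(jobs) AS avgMemory
FROM union_results
GROUP BY cluster;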
In each of these cases, you can gain an X times speedup by simply using the consolidation function abstractions from a library of common abstractions and launching X threads. It's almost linear acceleration (not the kind I learned back in my Electrical Engineering days).
In a single-server case, you can simply bury this map-reduce algorithm in the query logic, have it search the query for opportunities to parallelize, and control the number of threads by configuration.
In a MaxScale case, you could do this before the queries are sent to the various servers and handle the parallelization right in MaxScale. Again, the number of threads could be handled by configuration.
I was planning on writing this in Cacti for certain table cases, such as log searches or network flow analysis, since it does not exist natively in either MySQL or MariaDB, but it keeps getting pushed down my priority list as there is so much more to do in Cacti. I continue to hope that this can be done internally by my preferred tool... MariaDB.
Be well.
Larry
All tasks that involve parallel execution imply the ability to clone structures or make them reentrant.
So it should include as a prerequisite reentrant Items (MDEV-6897) or a not-yet-existing MDEV about cloning arbitrary structures (at least Items).
Cloning is simpler, especially if we cheat and do it via the parser (and we already have bugs where it is needed), but reentrant Items are the more promising direction (they can solve more problems, especially for newer architectures such as ARM).