This issue covers the feature MCS lacks to pass query #2 from TPC-H test suit.
Here are some preliminary info.
Join in MCS can be run in two modes:
- centralized at UM
- distributed at PP
The next paragraph describes distributed processing at PP. The current Join processing model has two types of Join sides: small-side table and large-side table(it doesn't correlate with the cardinality of the tables). Small side tables are broadcasted to all PP to build a hashmap/-s and large side is processed row by row using small side hash maps.
Consider the query
This query's last WHERE expression filters on the maximum dim1.c3 value for the equi-JOIN condition f.c1 = dim1.c1 AND dim1.c2 = dim2.c2. The semantics of the the query is a equi-JOIN result where every record has dim1.c3 is the maximum for every combination f1.c1 = dim1.c1.
Let's consider more generic variant where max(dim1.c3) can be an expression of a single aggregate function of one of JOINED tables, e.g. min(dim1.c3)+1 or (if avg(dim1.c3) == 5 then X ELSE Y). Let's say conditions of a Correlated SubQuery are fully correlated thus they are the same as at one level above this CSQ, namely WHERE f.c1 = dim1.c1 AND dim1.c2 = dim2.c2. To find a correlated maximum of dim1.c3 for a given f.c1 = dim1.c1 MCS needs to have a GROUP BY operator results. Let's consider GROUP BY properties in the context of MCS code. GROUP BY works with dim1 contents that satisfies simple filters of the query(there is no such in this example). It is important to note that PP will have a full dim1 available at this point b/c all small side tables are broadcasted to all PP. The key column for this GROUP BY is dim1.c1 and the result of this aggregation is max(dim1.c3). In the context of MCS needs to construct UMRowAggregation for input RowGroup of dim1.c1 and dim1.c3. RowAggregation output RowGroup has a single column for max(dim1.c3). This RowAggregation can be constructed on the fly in parallel with PP constructing the hashmap for the top-level JOIN.
If PP has such UMRowAggregation instance available it will be able to look into UMRowAggregation using f1.c1 as a GROUP BY key value traversing large side table and applying an expression on top of UMRowAggregation result.
The comments will describe state of art implementations from the open source databases.