[MCOL-5195] Correlated subquery for equi/non-equi scalar filter and join condition (TPC-H q2, q17) Created: 2022-08-16 Updated: 2023-02-14 Resolved: 2023-01-24 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr, PrimProc |
| Affects Version/s: | 22.08.1 |
| Fix Version/s: | 23.02.1 |
| Type: | Task | Priority: | Major |
| Reporter: | Roman | Assignee: | Denis Khalikov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Sprint: | 10.0.20 | ||||||||||||
| Assigned for Review: | |
||||||||||||
| Assigned for Testing: | |
||||||||||||
| Description |
|
This issue covers the feature MCS lacks to pass query #2 from TPC-H test suit. Here are some preliminary info.
Consider the query
This query's last WHERE expression filters on the maximum dim1.c3 value for the equi-JOIN condition f.c1 = dim1.c1 AND dim1.c2 = dim2.c2. The semantics of the the query is a equi-JOIN result where every record has dim1.c3 is the maximum for every combination f1.c1 = dim1.c1. Let's consider more generic variant where max(dim1.c3) can be an expression of a single aggregate function of one of JOINED tables, e.g. min(dim1.c3)+1 or (if avg(dim1.c3) == 5 then X ELSE Y). Let's say conditions of a Correlated SubQuery are fully correlated thus they are the same as at one level above this CSQ, namely WHERE f.c1 = dim1.c1 AND dim1.c2 = dim2.c2. To find a correlated maximum of dim1.c3 for a given f.c1 = dim1.c1 MCS needs to have a GROUP BY operator results. Let's consider GROUP BY properties in the context of MCS code. GROUP BY works with dim1 contents that satisfies simple filters of the query(there is no such in this example). It is important to note that PP will have a full dim1 available at this point b/c all small side tables are broadcasted to all PP. The key column for this GROUP BY is dim1.c1 and the result of this aggregation is max(dim1.c3). In the context of MCS needs to construct UMRowAggregation for input RowGroup of dim1.c1 and dim1.c3. RowAggregation output RowGroup has a single column for max(dim1.c3). This RowAggregation can be constructed on the fly in parallel with PP constructing the hashmap for the top-level JOIN. The comments will describe state of art implementations from the open source databases. |
| Comments |
| Comment by Roman [ 2022-08-19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here is the Postgres 14 query plan for the original TPC-H q2 without indices.
As you can see Subplan 1 does an aggregation on top of number of hash joins with the top nested loop join. It is worth to mention that the plan will surely change with the data ingested. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roman [ 2022-08-19 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here is the plan for MariaDB 10.9.
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Lee (Inactive) [ 2023-01-24 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This ticket covers support for queries 2 and 17, which have been verified and closed |