[MDEV-5470] Cost-based subquery item pushdown must take into account semi-joins Created: 2013-12-19 Updated: 2014-04-09 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Sergei Petrunia | Assignee: | Sergei Petrunia |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Description |
|
Cost-based subquery item pushdown is limited by semi-joins. Pasting from an earlier email to maria-developers@: Hi! The below is an extension of the skype discussion. I wrote down the limitation <contents> == 1. SJ-Materialization == The general rule is: If the condition is originally attached inside an SJM nest, one should The reason for the rule is that field values for inner tables are only The only possible exception is when the condition depends only on columns that == 2. FirstMatch == Consider a query select ... and a join order ot1 FirstMatch(it1, it2) nt1 nt2 It is possible to move cond(otX) to table itX or ntX. There is a problem with moving condition attached to itX (will call it Consider join execution execution: +- it1.row2 – it2.row2 FirstMatch check is at point (F2). At that point, we know that ot1.row1 If SUBQ_COND is moved to table ntX, then there may be a situation where A suggestion by Igor: at point (F2) we should check the last return value It is actually more complicated: what if there is COND(ot1, nt1) such that == 3. Duplicate Elimination == Duplicate Elimination removes duplicate matches using a temporary table. Let's again use an example. select ... and a join order (DUPS-W-START) it1 ot1 it2 (DUPS-W-WEEDOUT) nt1 nt2 at DUPS-W-START, all rows are deleted from the temp. table. Basic idea: If a condition SUBQ_COND(itX, ...) is moved to the right across Example: Suppose, SUBQ_COND(it2) is moved to nt1, and join execution it1.row1 — ot1.row1 — it2.row1 – DUPS-W-WEEDOUT – nt1
(the same diagram with a few more steps) it1.row1 — ot1.row1 — it2.row1 – DUPS-W-WEEDOUT – nt1 it1.row2 — ot1.row1 — it2.row2 – DUPS-W-WEEDOUT
== 4. Loose Scan == Loose Scan is similar to First Match - it does short-cutting (but also makes Short-cutting imposes a limitation similar to FirstMatch - conditions that |
| Comments |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
On Dec, 19th, I have committed a patch that limits moving of predicates as described in the above spec. The patch only affects the execution part (where Item* objects are wrapped and positioned in join_tab[i]->select_cond. The optimizer part (which makes the join optimizer take into account cost of attached predicates) works as if there were no limits. |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
The second part of the task is to account for predicate moving limits in the join optimizer. The basic idea is: when the optimizer calculates join prefix cost, it should take into account the limits on predicate pushdown that are imposed by semi-join strategies. |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
The apparent solution would be to amend JOIN::static_pushdown_cost() so that it doesn't try to position expensive predicates where it is not allowed. A deficiency of this approach (and a possible problem): static_pushdown_cost() is called after the join order has been constructed, and after the choice has been made about which semi-join strategy to use. Suppose, there is a certain join order J_ORD1, for which there are two possible semi-join strategies, SJ1 and SJ2. Suppose, the optimizer picks SJ1 (without taking static pushdown cost into account). it will see that this choice is expensive, but there will be no way to return to consider {J_ORD1, SJ2}. |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
I am not sure if the above problem affects the primary example of MDEV-83. I suppose it might, because I remember a discussion with Timour about the example of MDEV-83, where the problem was that semi-join optimizer chose to use Materialization, and that was a particularly bad choice, because Materialization enumerates all possible matches that subquery could have, which causes more invocations of the expensive scalar-context subquery that was attached to one of the tables inside the SJ-Materialization nest. IIRC the problem was solved by taking into account the cost of expensive scalar-context subquery in the join optimizer. |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
Another possible angle (discussed with Timour, w/o answer): What does current MDEV-83 code do for SJ-Materialization nests? Does optimize_semijoin_nests() include the cost of attached subquery predicates? If it does, then SJ-Materialization is heavily penalized, because advance_sj_state() will compare SJ-Materialization costs which include expensive-predicate-cost, with e.g. FirstMatch costs which do not include expensive-predicate-cost. |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
Did some debugging to answer the question in the last comment: No, optimize_semijoin_nests() does not include costs of expensive predicates into costs of SJ-Materialization. This way, the concern of SJ-Materialization being unfairly penalized by advance_sj_state() is invalid. However, the concern about semi-join strategy choice remains. Current code will have this problem: |
| Comment by Sergei Petrunia [ 2013-12-27 ] |
|
A possible solution for this is to take into account expensive-predicate-cost Suppose there EXPENSIVE_COND(tX, tY, tZ). Then, the optimizer should:
This approach will let static-condition-pushdown to affect the choice of Possible issues:
|