Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Cost-based subquery item pushdown is limited by semi-joins. Pasting from an earlier email to maria-developers@:
Hi!
The below is an extension of the skype discussion. I wrote down the limitation
imposed by FirstMatch, and also covered other Semi-join strategies.
<contents>
1. SJ-Materialization
2. FirstMatch
3. Duplicate Elimination
4. Loose Scan
</contents>
== 1. SJ-Materialization ==
The general rule is:
If the condition is originally attached inside an SJM nest, one should
not move it outside. Conditions that are originally outside should not be
moved inisde the SJM nest.
The reason for the rule is that field values for inner tables are only
available inside the SJM nest. Correspondingly, values of fields of outer
tables are not available at materialization step.
The only possible exception is when the condition depends only on columns that
are in the IN-equality. Then, it could be moved. I don't think this is worth
handling for the first milestone.
== 2. FirstMatch ==
Consider a query
select ...
from ot1, nt1, nt2
where expr(ot1) in (select ... from it1, it2, ... where ...)
and a join order
ot1 FirstMatch(it1, it2) nt1 nt2
It is possible to move cond(otX) to table itX or ntX.
There is a problem with moving condition attached to itX (will call it
SUBQ_COND. this ocndition depends on table itX and maybe on ot1) to table ntX.
Consider join execution execution:
ot1.row0
ot1.row1 + it1.row1 – it2.row1
+- it1.row2 – it2.row2 > (F1)->- nt1.row1
(F2)-<-
FirstMatch check is at point (F2). At that point, we know that ot1.row1
has had a match in the subquery, and the execution returns to table ot1.
If SUBQ_COND is moved to table ntX, then there may be a situation where
we will get to point (F2) and assume that the subquery had a match, while
actually it hasn't.
A suggestion by Igor: at point (F2) we should check the last return value
of SUBQ_COND (actually, of all conditions that were moved like SUBQ_COND was)
It is actually more complicated: what if there is COND(ot1, nt1) such that
COND(ot1.ro1, nt1.row1) = FALSE? Then, we will never evaluate
SUBQ_COND(ot1.row1, ...).
What if never evaluate SUBQ_COND(ot1.row1, ...) but before we have evaluated
SUBQ_COND(ot1.row0, ...)? We will have to track what we have evaluated for
the current row of ot1.row1.
== 3. Duplicate Elimination ==
Duplicate Elimination removes duplicate matches using a temporary table.
In order to work correctly, filtering needs to be performed before the weedout
is done.
Let's again use an example.
select ...
from ot1, nt1, nt2
where expr(ot1) in (select ... from it1, it2, ... where ...)
and a join order
(DUPS-W-START) it1 ot1 it2 (DUPS-W-WEEDOUT) nt1 nt2
at DUPS-W-START, all rows are deleted from the temp. table.
at DUPS-W-WEEDOUT, we take ot1.rowid and attempt to write it into the temp
table.
Basic idea: If a condition SUBQ_COND(itX, ...) is moved to the right across
the (DUPS-W-WEEDOUT) point, wrong query results will occur.
Example: Suppose, SUBQ_COND(it2) is moved to nt1, and join execution
proceeds as follows:
it1.row1 — ot1.row1 — it2.row1 – DUPS-W-WEEDOUT – nt1
- We reach DUPS-W-WEEDOUT with (it1.row1, ot1.row1, it2.row1).
- The temp. table is empty, so rowid(ot1.row1) is written into temp.table,
and Weedout doesn't weed it out. - At nt1, SUBQ_COND(it2.row1) is evaluated and found to be false.
join execution returns back. - Suppose, tables it2 and ot1 have no more matching rows, so execution returns
to it1.
(the same diagram with a few more steps)
it1.row1 — ot1.row1 — it2.row1 – DUPS-W-WEEDOUT – nt1
it1.row2 — ot1.row1 — it2.row2 – DUPS-W-WEEDOUT
- We reach DUPS-W-WEEDOUT with (it1.row2, ot1.tow1, it2.rowX).
- The temptable already has rowid(ot1.row1). The record combination is filtered
out. if SUBQ_COND(it2.row2)=TRUE, then the result will miss a row.
== 4. Loose Scan ==
Loose Scan is similar to First Match - it does short-cutting (but also makes
forward jumps on the driving table).
Short-cutting imposes a limitation similar to FirstMatch - conditions that
depend on subquery's tables cannot be moved out of LooseScan's range.
Attachments
Issue Links
- relates to
-
MDEV-83 Cost-based choice for the pushdown of subqueries to joined tables
- Stalled