Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5470

Cost-based subquery item pushdown must take into account semi-joins

    XMLWordPrintable

    Details

    • Type: Task
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Cost-based subquery item pushdown is limited by semi-joins. Pasting from an earlier email to maria-developers@:

      Hi!

      The below is an extension of the skype discussion. I wrote down the limitation
      imposed by FirstMatch, and also covered other Semi-join strategies.

      <contents>
      1. SJ-Materialization
      2. FirstMatch
      3. Duplicate Elimination
      4. Loose Scan
      </contents>

      == 1. SJ-Materialization ==

      The general rule is:

      If the condition is originally attached inside an SJM nest, one should
      not move it outside. Conditions that are originally outside should not be
      moved inisde the SJM nest.

      The reason for the rule is that field values for inner tables are only
      available inside the SJM nest. Correspondingly, values of fields of outer
      tables are not available at materialization step.

      The only possible exception is when the condition depends only on columns that
      are in the IN-equality. Then, it could be moved. I don't think this is worth
      handling for the first milestone.

      == 2. FirstMatch ==

      Consider a query

      select ...
      from ot1, nt1, nt2
      where expr(ot1) in (select ... from it1, it2, ... where ...)

      and a join order

      ot1 FirstMatch(it1, it2) nt1 nt2

      It is possible to move cond(otX) to table itX or ntX.

      There is a problem with moving condition attached to itX (will call it
      SUBQ_COND. this ocndition depends on table itX and maybe on ot1) to table ntX.

      Consider join execution execution:
      ot1.row0
      ot1.row1 + it1.row1 – it2.row1

      +- it1.row2 – it2.row2 > (F1)->- nt1.row1
      (F2)-<-

      FirstMatch check is at point (F2). At that point, we know that ot1.row1
      has had a match in the subquery, and the execution returns to table ot1.

      If SUBQ_COND is moved to table ntX, then there may be a situation where
      we will get to point (F2) and assume that the subquery had a match, while
      actually it hasn't.

      A suggestion by Igor: at point (F2) we should check the last return value
      of SUBQ_COND (actually, of all conditions that were moved like SUBQ_COND was)

      It is actually more complicated: what if there is COND(ot1, nt1) such that
      COND(ot1.ro1, nt1.row1) = FALSE? Then, we will never evaluate
      SUBQ_COND(ot1.row1, ...).
      What if never evaluate SUBQ_COND(ot1.row1, ...) but before we have evaluated
      SUBQ_COND(ot1.row0, ...)? We will have to track what we have evaluated for
      the current row of ot1.row1.

      == 3. Duplicate Elimination ==

      Duplicate Elimination removes duplicate matches using a temporary table.
      In order to work correctly, filtering needs to be performed before the weedout
      is done.

      Let's again use an example.

      select ...
      from ot1, nt1, nt2
      where expr(ot1) in (select ... from it1, it2, ... where ...)

      and a join order

      (DUPS-W-START) it1 ot1 it2 (DUPS-W-WEEDOUT) nt1 nt2

      at DUPS-W-START, all rows are deleted from the temp. table.
      at DUPS-W-WEEDOUT, we take ot1.rowid and attempt to write it into the temp
      table.

      Basic idea: If a condition SUBQ_COND(itX, ...) is moved to the right across
      the (DUPS-W-WEEDOUT) point, wrong query results will occur.

      Example: Suppose, SUBQ_COND(it2) is moved to nt1, and join execution
      proceeds as follows:

      it1.row1 — ot1.row1 — it2.row1 – DUPS-W-WEEDOUT – nt1

      • We reach DUPS-W-WEEDOUT with (it1.row1, ot1.row1, it2.row1).
      • The temp. table is empty, so rowid(ot1.row1) is written into temp.table,
        and Weedout doesn't weed it out.
      • At nt1, SUBQ_COND(it2.row1) is evaluated and found to be false.
        join execution returns back.
      • Suppose, tables it2 and ot1 have no more matching rows, so execution returns
        to it1.

      (the same diagram with a few more steps)

      it1.row1 — ot1.row1 — it2.row1 – DUPS-W-WEEDOUT – nt1

      it1.row2 — ot1.row1 — it2.row2 – DUPS-W-WEEDOUT

      • We reach DUPS-W-WEEDOUT with (it1.row2, ot1.tow1, it2.rowX).
      • The temptable already has rowid(ot1.row1). The record combination is filtered
        out. if SUBQ_COND(it2.row2)=TRUE, then the result will miss a row.

      == 4. Loose Scan ==

      Loose Scan is similar to First Match - it does short-cutting (but also makes
      forward jumps on the driving table).

      Short-cutting imposes a limitation similar to FirstMatch - conditions that
      depend on subquery's tables cannot be moved out of LooseScan's range.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              psergey Sergei Petrunia
              Reporter:
              psergey Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated: