Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4170

Dynamic pushdown of subquery predicates

    XMLWordPrintable

Details

    • Task
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None

    Description

      Task setting

      MDEV-83 decides which is the best join_tab where a subquery should be
      pushed in order to reduce the number of times the subquery is evaluated.
      This is done statically, at optimization time. This approach however
      may result in performance regressions. When a subquery predicate is
      sufficiently selective, it may turn out that its high selectivity may
      reduce the partial join result sizes of subsequent joins. This reduction
      may outweigh the reduced cost of subquery executions.

      Since it is currently impossible to estimate the selectivity of a
      subquery predicate, it is not possible to pick at optimization time
      the right place in the join order such that it takes into account both
      the cost and the selectivity of the subquery predicate, and the number
      of times it will be evaluated.

      This task will implement a way to decide dynamically at query execution
      time where in the join plan to evaluate a subquery predicate. The decision
      will be based on measuring at runtime what is the predicate selectivity
      based on some sample (% of result rows).

      What exactly can be moved

      We can't move the subquery predicates themselves. For example, in Q20 the WHERE clause has form

      ps_availqty > (select sum(l_quantity) ...) AND ...

      Apparently, it is not possible to move the subquery, we can only move the whole "ps_availqty > (...)" condition.

      Making this condition formal: A non-trivial WHERE clause has form of

      cond1 AND cond2 AND ... AND condN

      Here, individual cond_i (we call them "AND-parts of WHERE") can be attached from one join_tab or another.

      Conclusion: we should wrap/move AND-parts of WHERE. AND-parts of WHERE can be arbitrary items (Item_func_xxx, etc), but we will consider moving only those items that have one or more subquery items within them.

      How to achieve movable conditions

      We will need to have certain conditions "jump" from one join_tab to another (possibly multiple times) in course of query execution. We intend to achieve it as follows:

      • The movable condition is wrapped inside a trigger condition (an Item_func_trig_cond object (or a similar object))
      • The same Item_func_trig_cond object is attached to each join_tab where we could potentially put it.

      Switching condition on/off is to be done as follows:

      • JOIN object will have a JOIN::curent_join_tab which is a pointer to the join_tab that we're currently evaluating condition for.
      • Movable predicate will have a "JOIN_TAB *active_join_tab" member which will say where the condition should be attached
      • Item_func_trig_cond (or its knock-off class) will do:

        if (active_join_tab == join->current_join_tab)
           return wrapped_item->val_int();
        else
           return 1;

      Thus, there will be no "moving" of predicates during execution.

      When to wrap with triggers dynamic pushdown conditions

      It can be done after make_join_select() call. The procedure is as follows:

      for ($T = each join_tab in query plan)
      {
        for ($I= each AND-part of $T->select_cond)  // note: we scan top-level AND only, w/o recursion
        {
          if ($I->with_subquery && $I->is_expensive && !$I->is_constant)
          {
             // Make $I movable
             $WI= Item_func_trig_cond2($I);
          }
        }
      }

      TODO: take care of semi-join nests and outer joins.

      When and under what conditions to perform dynamic pushdown

      • Predicate selectivity should be estimated based on some number of predicate
        executions S. This number S may be reached much before the query produced
        even one result row. Since some of the latest partial joins may not be executed,
        there may be no dynamic statistics for these partial joins. In this case we will
        use the partial join size estimates produced by the optimizer.
      • TODO:
        • What is the exact condition under which the dynamic pushdown condition is moved?
        • Care should be taken in the future when a query plan employes blocked join algorithms.
          If a subquery predicate is moved after a blocked join in the plan, and it turns out
          the move was based on incorrect selectivity estimates, the whole pass of filling the
          join buffers will not be able to use the moved condition. If in fact the condition
          was selective the join buffer will have to process a lot more rows that it would if
          the condition was attached earlier in the JOIN plan. With blocked joins, moving a
          condition in the plan has effect only for the subsequent block refill, but not for
          the current.

      The solution is to make blocked execution to cooperate with dynamic condition pushdown.
      Initially run the join with very small block sizes to collect statistics. Once there
      is sufficient statistics, move the predicate accordingly, and continue join execution
      with its normal block size.

      EXPLAIN output

      The dynamic pushdown triggers will be created before EXPLAIN prints the query plan.
      Based on static optimizer analysis, one of the triggers will be chosen as the best
      starting point. EXPLAIN will skip all other triggers except the active one.
      For the active one EXPLAIN EXTENDED will print instead of "Using where" - "Dynamic where".

      Future work will implement printout of the actual predicate pushdown state during query
      execution for SHOW EXPLAIN.

      Implementation details

      The selectivities and costs needed to decide where in the join plan to evaluate a subquery are:

      • Add the following predicate counters to the trigger condition that wraps the moveable AND-part:
        • the total number of subquery executions
        • the number of times subquery was TRUE
        • the number of times the whole pushdown predicate was executed for the JOIN_TAB it is currently attached
        • the number of times the whole pushdown predicate attached to a particular join returned TRUE
          The above counters allow to compute the selectivity of the subquery, and the whole pushdown predicate.
      • add counters to count the sizes of the intermediate join results (one counter per JOIN_TAB)
      • compute the actual number of times the subquery will be executed, thus its total cost

      The decision where to execute the subquery will be done based on the selectivity of the subquery, and the actual cardinality of the partial joins.

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            timour Timour Katchaounov (Inactive)
            Votes:
            2 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.