[MDEV-4170] Dynamic pushdown of subquery predicates Created: 2013-02-13  Updated: 2015-11-17

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Timour Katchaounov (Inactive) Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 2
Labels: optimizer


 Description   

Task setting

MDEV-83 decides which is the best join_tab where a subquery should be
pushed in order to reduce the number of times the subquery is evaluated.
This is done statically, at optimization time. This approach however
may result in performance regressions. When a subquery predicate is
sufficiently selective, it may turn out that its high selectivity may
reduce the partial join result sizes of subsequent joins. This reduction
may outweigh the reduced cost of subquery executions.

Since it is currently impossible to estimate the selectivity of a
subquery predicate, it is not possible to pick at optimization time
the right place in the join order such that it takes into account both
the cost and the selectivity of the subquery predicate, and the number
of times it will be evaluated.

This task will implement a way to decide dynamically at query execution
time where in the join plan to evaluate a subquery predicate. The decision
will be based on measuring at runtime what is the predicate selectivity
based on some sample (% of result rows).

What exactly can be moved

We can't move the subquery predicates themselves. For example, in Q20 the WHERE clause has form

ps_availqty > (select sum(l_quantity) ...) AND ...

Apparently, it is not possible to move the subquery, we can only move the whole "ps_availqty > (...)" condition.

Making this condition formal: A non-trivial WHERE clause has form of

cond1 AND cond2 AND ... AND condN

Here, individual cond_i (we call them "AND-parts of WHERE") can be attached from one join_tab or another.

Conclusion: we should wrap/move AND-parts of WHERE. AND-parts of WHERE can be arbitrary items (Item_func_xxx, etc), but we will consider moving only those items that have one or more subquery items within them.

How to achieve movable conditions

We will need to have certain conditions "jump" from one join_tab to another (possibly multiple times) in course of query execution. We intend to achieve it as follows:

  • The movable condition is wrapped inside a trigger condition (an Item_func_trig_cond object (or a similar object))
  • The same Item_func_trig_cond object is attached to each join_tab where we could potentially put it.

Switching condition on/off is to be done as follows:

  • JOIN object will have a JOIN::curent_join_tab which is a pointer to the join_tab that we're currently evaluating condition for.
  • Movable predicate will have a "JOIN_TAB *active_join_tab" member which will say where the condition should be attached
  • Item_func_trig_cond (or its knock-off class) will do:

  if (active_join_tab == join->current_join_tab)
     return wrapped_item->val_int();
  else
     return 1;

Thus, there will be no "moving" of predicates during execution.

When to wrap with triggers dynamic pushdown conditions

It can be done after make_join_select() call. The procedure is as follows:

for ($T = each join_tab in query plan)
{
  for ($I= each AND-part of $T->select_cond)  // note: we scan top-level AND only, w/o recursion
  {
    if ($I->with_subquery && $I->is_expensive && !$I->is_constant)
    {
       // Make $I movable
       $WI= Item_func_trig_cond2($I);
    }
  }
}

TODO: take care of semi-join nests and outer joins.

When and under what conditions to perform dynamic pushdown

  • Predicate selectivity should be estimated based on some number of predicate
    executions S. This number S may be reached much before the query produced
    even one result row. Since some of the latest partial joins may not be executed,
    there may be no dynamic statistics for these partial joins. In this case we will
    use the partial join size estimates produced by the optimizer.
  • TODO:
    • What is the exact condition under which the dynamic pushdown condition is moved?
    • Care should be taken in the future when a query plan employes blocked join algorithms.
      If a subquery predicate is moved after a blocked join in the plan, and it turns out
      the move was based on incorrect selectivity estimates, the whole pass of filling the
      join buffers will not be able to use the moved condition. If in fact the condition
      was selective the join buffer will have to process a lot more rows that it would if
      the condition was attached earlier in the JOIN plan. With blocked joins, moving a
      condition in the plan has effect only for the subsequent block refill, but not for
      the current.

The solution is to make blocked execution to cooperate with dynamic condition pushdown.
Initially run the join with very small block sizes to collect statistics. Once there
is sufficient statistics, move the predicate accordingly, and continue join execution
with its normal block size.

EXPLAIN output

The dynamic pushdown triggers will be created before EXPLAIN prints the query plan.
Based on static optimizer analysis, one of the triggers will be chosen as the best
starting point. EXPLAIN will skip all other triggers except the active one.
For the active one EXPLAIN EXTENDED will print instead of "Using where" - "Dynamic where".

Future work will implement printout of the actual predicate pushdown state during query
execution for SHOW EXPLAIN.

Implementation details

The selectivities and costs needed to decide where in the join plan to evaluate a subquery are:

  • Add the following predicate counters to the trigger condition that wraps the moveable AND-part:
    • the total number of subquery executions
    • the number of times subquery was TRUE
    • the number of times the whole pushdown predicate was executed for the JOIN_TAB it is currently attached
    • the number of times the whole pushdown predicate attached to a particular join returned TRUE
      The above counters allow to compute the selectivity of the subquery, and the whole pushdown predicate.
  • add counters to count the sizes of the intermediate join results (one counter per JOIN_TAB)
  • compute the actual number of times the subquery will be executed, thus its total cost

The decision where to execute the subquery will be done based on the selectivity of the subquery, and the actual cardinality of the partial joins.



 Comments   
Comment by Sergei Petrunia [ 2013-02-14 ]

It seems,

  • Predicate counters should be kept in Item_func_trig_cond

Another concerns are

  • EXPLAIN
  • user-visible tracking of execution
Comment by Timour Katchaounov (Inactive) [ 2013-03-05 ]

= Task description
 
Let QEP is some query plan for query Q with tables T_1 ... T_n. Let's assume
that among other predicates the query contains a subquery predicate subq_j.
 
Depending on the cost and selectivity of subq_j, the total cost of the plan
depends on the join operator where subq_j is pushed. Let plan QEP_1 has a
subquery subq_j pushed to table T_m, and QEP_2 is exactly the same, but the
subquery is pushed instead to table T_q.
 
The costs of these two plans may differ significantly. After some number of
subquery executions L it is possible to estimate the cost and selectivity of
the subquery. When this number is reached, this task will:
- Compute predicate costs and selectivities and join fanouts based on
  execution statistics,
- Estimate the cost of all equivalent QEP_i where subq_j is pushed to
  each possible join,
- Compare the costs of alternative QEP_i plans, and select the plan with
  sufficiently lower cost than the current placement. A plan QEP_i will be
  considered to be cheaper than QEP_j if the ratio of the plan costs is less
  than some configurable cost ratio "CR":
  (QEP_i / QEP_j < CR)
- Dynamically "push" subq_j as prescribed by the best plan.
 
 
= Cost model parameters:
 
* C_cond_i
The total cost of the conditions pushed down to JOIN_TAB i.
This is the condition evaluated for each join record after applying the
selection pushed inside the table access operation (e.g. index range scan).
** Optimize-time
   The cost of any non-subquery predicate is the constant "1/5", meaning that
   the cost of checking the where clause is 1/5 of the cost of reading one
   table record.
   MWL#83 adds the cost of each subquery to the total plan cost, but there is
   no separate logic to compute the cost of a condition that contains a subquery.
** Execution-time
   Same as at optimization time.
 
* C_subq_j
Cost of subquery 'j' which may appear in some AND-part of select_cond (pushed
to join_tab 'i').
** Optimize-time
   The cost is JOIN::best_read.
** Execution-time
   TODO: It is not clear how to compute the actual execution cost in a way
   compatible with the optimize-time cost. Should it be the number of examined
   rows?
 
* C_access_i - Cost of table access operator 'i'.
Implemented as JOIN_TAB::read_time.
 
* S_cond_i - Selectivity of select_cond, the condition pushed to join_tab 'i'
** Optimize-time: a combination of range and ref estimates produced by the
   optimizer.
** Execution-time: S_cond_i = number_of_true_results / number_of_executions
 
* S_subq_i - Selectivity of the subquery pushed to join_tab 'i'.
** Optimize-time: Not available.
** Execution-time: S_subq_i = number_of_true_results / number_of_executions
 
* Fj_i
The fanout of join 'i' in the join plan. The "fanout" is the average number of
records of table T_i that match one record of the intermediate join
J_[i-1]. The fanout also takes into account the selectivity of the access
method for table T_i.
 
* Fj_1
The fanout of the first table is 1 (this is not a join, just a selection).
 
 
= Cost formulas:
 
1. Partial join result size |J_k|
 
The fanouts are relative estimates, they are not related to the actual sizes
of the partial join results. The size of a partial join result can be computed
by multipluing the number of rows produced by the first plan operator J_1, and
all subsequent fanouts:
 
|J_k| = |J_1| * MULT(1, i , Fj_i)
 
The corresponding implementation is in JOIN::get_partial_cost_and_fanout():
      record_count *= tab->records_read;
 
2. Query plan cost
 
2.1 Plan cost assuming that all conditions are cheap, and the selectivity of
the JOIN conditions not used by the table access method is not known.
 
Each select_cond has the cost: C_cond_i = (1/TIME_FOR_COMPARE).
The selectivity of all conditions is taken into account inside the estimate
for each fanout Fj_i.
 
Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |J_i| * C_cond_i))
 
The corresponding implementation is in JOIN::get_partial_cost_and_fanout():
      read_time += tab->read_time + record_count / (double) TIME_FOR_COMPARE;
 
2.2 Plan cost that takes into account subquery predicate costs, and predicate
selectivities.
 
In this case the cost of each pushdown condition depends on the subqueries it
contains. Assume:
- conjunctive conditions only
- one subquery only
 
Let |J_i| is the cardinality of the partial join after applying the access
method for table T_i. This is the number of times the pushdown condition cond_i
will be evaluated.
 
Let |PJ_i| is the cardinality of the partial join J_i after applying all pushed
conditions. The two cardinalities are related as follows:
 
|PJ_i| = |J_i| * S_cond_i
 
The cost of the plan is:
 
Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |PJ_i| * C_cond_i)) =
                  = SUM(1, k, (C_access_i + |J_i| * S_cond_i * C_cond_i))
 
If pushdown condition i contains a subquery subq_j in one of its AND-parts,
then its cost and selectivity are:
 
C_cond_i = C_subc_i + C_subq_j
S_cond_i = S_subc_i * S_subq_j
 
Notice that the selectivity and cost of the subuery is independent on the table
it is attached to, so it is not indexed with the table index.
 
Where subc_i is the conjunction of all AND-parts that do not contain the
subquery, and subq_i is the AND-part with the subquery.
 
 
3. Comparing the costs of two plans with alternative placement of a subquery
 
Let plan QEP_1 has a subquery subq_j pushed to table T_m, and QEP_2 is exactly
the same, but the subquery is pushed to table T_q. The table access costs are
the same for both plans. Let the sum of the access costs be: C_access = SUM(1,
k, (C_access_i)).
 
Cost(QEP_1) = C_access +
              (|J_1| * S_cond_1 * C_cond_1) +
              .......
	      (|J_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) +
              .......
	      (|J_q| * S_cond_q * C_cond_q) +
              .......
	      (|J_k| * S_cond_k * C_cond_k)
 
Cost(QEP_2) = C_access +
              (|J_1| * S_cond_1 * C_cond_1) +
              .......
	      (|J_m| * S_cond_m * C_cond_m) +
              .......
	      (|J_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) +
              .......
	      (|J_k| * S_cond_k * C_cond_k)
 
The costs above depend on the partial join cardinalities which are absolute
values. Before query execution is complete these cardinalities cannot be known
exactly. What can be known is an estimate of the join fanouts based on the
current number of rows retrieved by each partial join.
 
In order to make the two costs comparable during execution, they have to be
expressed in terms of the join fanouts Fj_i. Predicate costs and selectivity
can be estimated based on collected execution statistics at any time during
execution.
 
From point (1) above it follows we can divide the costs of both plans by |J_1|,
where |J_1| is optimizer estimate of the size of the result of the first query
plan operator. These resulting "relative" costs are sufficient for this task,
because we don't need the absolute cost values, we only need to be able to
compare the ratio of the costs of alternative placements of an expensive predicate.
The relative costs are:
 
RelCost(QEP_1) = (C_access / |J_1|) + 
              (|Fj_1| * S_cond_1 * C_cond_1) +
              .......
	      (|Fj_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) +
              .......
	      (|Fj_q| * S_cond_q * C_cond_q) +
              .......
	      (|Fj_k| * S_cond_k * C_cond_k)
 
RelCost(QEP_2) = (C_access / |J_1|) + 
              (|Fj_1| * S_cond_1 * C_cond_1) +
              .......
	      (|Fj_m| * S_cond_m * C_cond_m) +
              .......
	      (|Fj_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) +
              .......
	      (|Fj_k| * S_cond_k * C_cond_k)
 
 
4. Estimating join fanouts and predicate cost/selectivity during execution
 
After some number of subquery executions we assume that the collected
cost/selectivity/fanout statistics is representative. Predicate costs and
selectivities are:
  (number_of_true_results / number_of_executions).
 
Let CPJ_i is the current partial join result size. Then:
  Fj_i = CPJ_i / CPJ_[i-1]
where i > 1.
  Fj_1 = 1.
 

Comment by Sergei Petrunia [ 2013-03-05 ]

Meeting results:

  • optimize subquery predicates independently (process each one separately, ignoring existence of any "sibling", neighbor predicates)
  • for subquery cost, use the values produced by the optimizer (i.e. don't recalculate on runtime)
  • for subquery predicate selectivity, use runtime information
  • use #define for threshold value to move the subquery

After the above is done:

  • re-calculate subquery cost on runtime using the same cost formula as join optimizer uses.
Comment by Timour Katchaounov (Inactive) [ 2013-03-11 ]

TODO:
Consider reordering dynamically the conjuncts in a pushed condition according to their dynamic selectivity.
For this it is necessary to record the selectivity of all conjuncts.

Comment by Timour Katchaounov (Inactive) [ 2013-08-16 ]

When/how to move dynamic conditions earlier in the plan:

= Suppose that:

  • a join plan is: <t1, ... t_i, ..., t_j, ... t_n>, and
  • a dynamic condition D is attached to t_j, while it can be attached the earliest to table t_i, (i < j)

= Nested loops join (no blocking):
When D is evaluated for the first partial join result <t1, ... t_i, ..., t_j>, its value will not change for any subsequent partial join results with the same prefix <t1, ... t_i>. We can remember the result of D for this prefix.

  • If the result is FALSE, join execution should restart from the next prefix <t1, ... t_i>, there is no need to evaluate any of the continuations of the current <t1, ... t_i>.
    Move D to its best new position before restarting the join with the next prefix <t1, ... t_i>.
  • If the result is TRUE,
    Move D to its new best position, because it will not affect any of the continuations of the current <t1, ... t_i>, it will be TRUE for any of them.

= Blocked joins:

  • While filling the buffer of table t_i sample D for a certain % of the partial rows in t_i's buffer.
  • While sampling D, collect statistics of D's selectivity and filter the sampled partial records.
  • If the condition is sufficiently selective (e.g. >50%), re-evaluate it for the whole join cache, and reduce the size of the cache and the whole partial join.
  • Now D can be moved freely to the left where needed, because D will be true for all continuations of all records in t_i's cache.
Comment by Timour Katchaounov (Inactive) [ 2013-08-22 ]

The system variables to control this feature are:
optimizer_switch = "expensive_pred_dynamic_pushdown=on/off";
(notice that his feature interacts with MDEV-83, which is controlled by optimizer_switch="expensive_pred_static_pushdown=on/off")

The number of partial result rows after which to calculate a new optimal position for a predicate:
dynamic_pushdown_max_evals = 0 ... inf;

The ratio by which the new position must improve the total plan cost in order to move the predicate:
dynamic_pushdown_cost_ratio = 10;

Generated at Thu Feb 08 06:54:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.