= Task description
|
|
Let QEP is some query plan for query Q with tables T_1 ... T_n. Let's assume
|
that among other predicates the query contains a subquery predicate subq_j.
|
|
Depending on the cost and selectivity of subq_j, the total cost of the plan
|
depends on the join operator where subq_j is pushed. Let plan QEP_1 has a
|
subquery subq_j pushed to table T_m, and QEP_2 is exactly the same, but the
|
subquery is pushed instead to table T_q.
|
|
The costs of these two plans may differ significantly. After some number of
|
subquery executions L it is possible to estimate the cost and selectivity of
|
the subquery. When this number is reached, this task will:
|
- Compute predicate costs and selectivities and join fanouts based on
|
execution statistics,
|
- Estimate the cost of all equivalent QEP_i where subq_j is pushed to
|
each possible join,
|
- Compare the costs of alternative QEP_i plans, and select the plan with
|
sufficiently lower cost than the current placement. A plan QEP_i will be
|
considered to be cheaper than QEP_j if the ratio of the plan costs is less
|
than some configurable cost ratio "CR":
|
(QEP_i / QEP_j < CR)
|
- Dynamically "push" subq_j as prescribed by the best plan.
|
|
|
= Cost model parameters:
|
|
* C_cond_i
|
The total cost of the conditions pushed down to JOIN_TAB i.
|
This is the condition evaluated for each join record after applying the
|
selection pushed inside the table access operation (e.g. index range scan).
|
** Optimize-time
|
The cost of any non-subquery predicate is the constant "1/5", meaning that
|
the cost of checking the where clause is 1/5 of the cost of reading one
|
table record.
|
MWL#83 adds the cost of each subquery to the total plan cost, but there is
|
no separate logic to compute the cost of a condition that contains a subquery.
|
** Execution-time
|
Same as at optimization time.
|
|
* C_subq_j
|
Cost of subquery 'j' which may appear in some AND-part of select_cond (pushed
|
to join_tab 'i').
|
** Optimize-time
|
The cost is JOIN::best_read.
|
** Execution-time
|
TODO: It is not clear how to compute the actual execution cost in a way
|
compatible with the optimize-time cost. Should it be the number of examined
|
rows?
|
|
* C_access_i - Cost of table access operator 'i'.
|
Implemented as JOIN_TAB::read_time.
|
|
* S_cond_i - Selectivity of select_cond, the condition pushed to join_tab 'i'
|
** Optimize-time: a combination of range and ref estimates produced by the
|
optimizer.
|
** Execution-time: S_cond_i = number_of_true_results / number_of_executions
|
|
* S_subq_i - Selectivity of the subquery pushed to join_tab 'i'.
|
** Optimize-time: Not available.
|
** Execution-time: S_subq_i = number_of_true_results / number_of_executions
|
|
* Fj_i
|
The fanout of join 'i' in the join plan. The "fanout" is the average number of
|
records of table T_i that match one record of the intermediate join
|
J_[i-1]. The fanout also takes into account the selectivity of the access
|
method for table T_i.
|
|
* Fj_1
|
The fanout of the first table is 1 (this is not a join, just a selection).
|
|
|
= Cost formulas:
|
|
1. Partial join result size |J_k|
|
|
The fanouts are relative estimates, they are not related to the actual sizes
|
of the partial join results. The size of a partial join result can be computed
|
by multipluing the number of rows produced by the first plan operator J_1, and
|
all subsequent fanouts:
|
|
|J_k| = |J_1| * MULT(1, i , Fj_i)
|
|
The corresponding implementation is in JOIN::get_partial_cost_and_fanout():
|
record_count *= tab->records_read;
|
|
2. Query plan cost
|
|
2.1 Plan cost assuming that all conditions are cheap, and the selectivity of
|
the JOIN conditions not used by the table access method is not known.
|
|
Each select_cond has the cost: C_cond_i = (1/TIME_FOR_COMPARE).
|
The selectivity of all conditions is taken into account inside the estimate
|
for each fanout Fj_i.
|
|
Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |J_i| * C_cond_i))
|
|
The corresponding implementation is in JOIN::get_partial_cost_and_fanout():
|
read_time += tab->read_time + record_count / (double) TIME_FOR_COMPARE;
|
|
2.2 Plan cost that takes into account subquery predicate costs, and predicate
|
selectivities.
|
|
In this case the cost of each pushdown condition depends on the subqueries it
|
contains. Assume:
|
- conjunctive conditions only
|
- one subquery only
|
|
Let |J_i| is the cardinality of the partial join after applying the access
|
method for table T_i. This is the number of times the pushdown condition cond_i
|
will be evaluated.
|
|
Let |PJ_i| is the cardinality of the partial join J_i after applying all pushed
|
conditions. The two cardinalities are related as follows:
|
|
|PJ_i| = |J_i| * S_cond_i
|
|
The cost of the plan is:
|
|
Cost(J_1,...,J_k) = SUM(1, k, (C_access_i + |PJ_i| * C_cond_i)) =
|
= SUM(1, k, (C_access_i + |J_i| * S_cond_i * C_cond_i))
|
|
If pushdown condition i contains a subquery subq_j in one of its AND-parts,
|
then its cost and selectivity are:
|
|
C_cond_i = C_subc_i + C_subq_j
|
S_cond_i = S_subc_i * S_subq_j
|
|
Notice that the selectivity and cost of the subuery is independent on the table
|
it is attached to, so it is not indexed with the table index.
|
|
Where subc_i is the conjunction of all AND-parts that do not contain the
|
subquery, and subq_i is the AND-part with the subquery.
|
|
|
3. Comparing the costs of two plans with alternative placement of a subquery
|
|
Let plan QEP_1 has a subquery subq_j pushed to table T_m, and QEP_2 is exactly
|
the same, but the subquery is pushed to table T_q. The table access costs are
|
the same for both plans. Let the sum of the access costs be: C_access = SUM(1,
|
k, (C_access_i)).
|
|
Cost(QEP_1) = C_access +
|
(|J_1| * S_cond_1 * C_cond_1) +
|
.......
|
(|J_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) +
|
.......
|
(|J_q| * S_cond_q * C_cond_q) +
|
.......
|
(|J_k| * S_cond_k * C_cond_k)
|
|
Cost(QEP_2) = C_access +
|
(|J_1| * S_cond_1 * C_cond_1) +
|
.......
|
(|J_m| * S_cond_m * C_cond_m) +
|
.......
|
(|J_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) +
|
.......
|
(|J_k| * S_cond_k * C_cond_k)
|
|
The costs above depend on the partial join cardinalities which are absolute
|
values. Before query execution is complete these cardinalities cannot be known
|
exactly. What can be known is an estimate of the join fanouts based on the
|
current number of rows retrieved by each partial join.
|
|
In order to make the two costs comparable during execution, they have to be
|
expressed in terms of the join fanouts Fj_i. Predicate costs and selectivity
|
can be estimated based on collected execution statistics at any time during
|
execution.
|
|
From point (1) above it follows we can divide the costs of both plans by |J_1|,
|
where |J_1| is optimizer estimate of the size of the result of the first query
|
plan operator. These resulting "relative" costs are sufficient for this task,
|
because we don't need the absolute cost values, we only need to be able to
|
compare the ratio of the costs of alternative placements of an expensive predicate.
|
The relative costs are:
|
|
RelCost(QEP_1) = (C_access / |J_1|) +
|
(|Fj_1| * S_cond_1 * C_cond_1) +
|
.......
|
(|Fj_m| * (S_subc_m * S_subq_j) * (C_subc_m + C_subq_j)) +
|
.......
|
(|Fj_q| * S_cond_q * C_cond_q) +
|
.......
|
(|Fj_k| * S_cond_k * C_cond_k)
|
|
RelCost(QEP_2) = (C_access / |J_1|) +
|
(|Fj_1| * S_cond_1 * C_cond_1) +
|
.......
|
(|Fj_m| * S_cond_m * C_cond_m) +
|
.......
|
(|Fj_q| * (S_subc_q * S_subq_j) * (C_subc_q + C_subq_j)) +
|
.......
|
(|Fj_k| * S_cond_k * C_cond_k)
|
|
|
4. Estimating join fanouts and predicate cost/selectivity during execution
|
|
After some number of subquery executions we assume that the collected
|
cost/selectivity/fanout statistics is representative. Predicate costs and
|
selectivities are:
|
(number_of_true_results / number_of_executions).
|
|
Let CPJ_i is the current partial join result size. Then:
|
Fj_i = CPJ_i / CPJ_[i-1]
|
where i > 1.
|
Fj_1 = 1.
|
|