Details
-
Task
-
Status: Stalled (View Workflow)
-
Minor
-
Resolution: Unresolved
-
None
-
None
Description
Contents:
- Problem description
- High-level design
- Main assumptions in this task
- General idea
- Detailed design
- Low-level design
- Testing
Problem description
If the evaluation of a dependent single-row subquery is performed always after accessing the last table the subquery depends on the total query execution time may happen to be suboptimal. We can see it in the report for LP bug #914569 .
With default settings (optimizer_switch='semijoin=on,materialization=on') for Q20 of the DBT-3
select sql_calc_found_rows |
s_name, s_address
|
from supplier, nation |
where s_suppkey in (select ps_suppkey from partsupp |
where ps_partkey in (select p_partkey from part |
where p_name like 'forest%') |
and ps_availqty > |
(select 0.5 * sum(l_quantity) |
from lineitem |
where l_partkey = ps_partkey |
and l_suppkey = ps_suppkey |
and l_shipdate >= date('1994-01-01') |
and l_shipdate < date('1994-01-01') + |
interval '1' year )) |
and s_nationkey = n_nationkey |
and n_name = 'CANADA' |
order by s_name |
limit 10;
|
the 5.3 optimizer chooses the following plan for the database dbt3x10_myisam
(a myisam DBT-3 database of scale factor 10):
+----+--------------------+----------+--------+--------------------------------------+---------------+---------+------------------------------------+------+----------------------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+----+--------------------+----------+--------+--------------------------------------+---------------+---------+------------------------------------+------+----------------------------------------------+
|
| 1 | PRIMARY | nation | ALL | PRIMARY | NULL | NULL | NULL | 25 | Using where; Using temporary; Using filesort |
|
| 1 | PRIMARY | supplier | ref | PRIMARY,i_s_nationkey | i_s_nationkey | 5 | dbt3x10_myisam.nation.n_nationkey | 4000 | |
|
| 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey,i_ps_suppkey | i_ps_suppkey | 4 | dbt3x10_myisam.supplier.s_suppkey | 80 | Using where |
|
| 1 | PRIMARY | part | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.partsupp.ps_partkey | 1 | Using where; FirstMatch(supplier) |
|
| 4 | DEPENDENT SUBQUERY | lineitem | ref | i_l_shipdate,i_l_partkey,i_l_suppkey | i_l_partkey | 5 | dbt3x10_myisam.partsupp.ps_partkey | 30 | Using where |
|
+----+--------------------+----------+--------+--------------------------------------+---------------+---------+------------------------------------+------+--------------------------------------------+
|
The execution by this plan takes significantly more time ( almost 4 times more) then the execution by the plan:
+----+--------------------+----------+-----------------+--------------------------------------+--------------+---------+-------------------------------------+--------+-----------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+----+--------------------+----------+-----------------+--------------------------------------+--------------+---------+-------------------------------------+--------+-----------------------------+
|
| 1 | PRIMARY | supplier | ALL | i_s_nationkey | NULL | NULL | NULL | 100000 | Using where; Using filesort |
|
| 1 | PRIMARY | nation | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.supplier.s_nationkey | 1 | Using where |
|
| 2 | DEPENDENT SUBQUERY | partsupp | index_subquery | i_ps_suppkey | i_ps_suppkey | 4 | func | 80 | Using where |
|
| 4 | DEPENDENT SUBQUERY | lineitem | ref | i_l_shipdate,i_l_partkey,i_l_suppkey | i_l_partkey | 5 | dbt3x10_myisam.partsupp.ps_partkey | 30 | Using where |
|
| 3 | DEPENDENT SUBQUERY | part | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
|
+----+--------------------+----------+-----------------+--------------------------------------+--------------+---------+-------------------------------------+--------+-----------------------------+
|
that is chosen with the settings optimizer_switch='semijoin=off,materialization=off'.
The main difference in these two plans is that with the first execution plan the dependent subquery:
(select 0.5 * sum(l_quantity) |
from lineitem |
where l_partkey = ps_partkey |
and l_suppkey = ps_suppkey |
and l_shipdate >= date('1994-01-01') |
and l_shipdate < date('1994-01-01') + interval '1' year )) |
is evaluated before the table part is accessed while many rows where (p_name like 'forest%') is not true are filtered out and evaluation of the dependent subquery is not needed for them.
The main goal of this task is to implement a cost based decision whether the subquery predicate should be pushed to the table partsupp or to the table part.
High-level design
Main assumptions in this task
- It is not possible to estimate the selectivity of a subquery
predicate during optimization.
- It is possible to estimate the cost of a subquery (mdev-402).
- While currently we can estimate only the selectivity of predicates for which
there exists suitable indexes, the task will assume that it will be possible
to estimate selectivity of most predicates based on the work in mwl#248.
Until mwl#248 can be used, predicate selectivity is estimated through
quick_condition_rows when there is an index (mdev-446).
General idea
The goal of this task is to minimize the number of executions of subqueries in
a join plan. Since join conditions are pushed to the earliest possible
table in the join plan, an expensive condition may be re-attached to any
table later in the plan (but not earlier) in order to minimize the number of
times it is executed.
- The pushdown of subquery predicates is decided for each complete join plan
during join optimization. - For each complete join plan, we find the partial join with the least
cardinality. - Depending on table dependencies, a subquery can be pushed to a join
no earlier than a certain position in the plan. We search the partial
join with the least cardinality after the first position where the
subquery can be pushed. The subquery is marked with the number of this
optimal join position. - The total cost of the plan is updated taking into account that size of
the partial join result, and the subquery cost. - After join optimization, but before the predicate pushdown phase, each
subquery is set to depend on the table that is accessed by the partial
join with least cardinality. - The predicate pushdown phase automatically pushes the subquery to the
right partial join, because it is already marked to depend on it.
Attachments
Issue Links
- includes
-
MDEV-387 Move the optimization of subqueries earlier, before make_join_select()
- Closed
-
MDEV-4612 SQ pushdown: Server crashes in make_join_statistics with materialization+semijoin, IN subqueries, constant table, impossible condition
- Closed
-
MDEV-5524 Fix "Subqueries: n" in EXPLAIN for condition pushdown
- Open
- is blocked by
-
MDEV-4145 Take into account the selectivity of single-table range predicates on non-indexed columns when searching for the best execution plan
- Closed
-
MDEV-5123 Remove duplicated conditions pushed both to join_tab->select_cond and join_tab->cache_select->cond for blocked joins.
- Closed
- relates to
-
MDEV-383 Evaluate subquery predicates earlier or later depending on their SELECTIVITY
- Open
-
MDEV-4612 SQ pushdown: Server crashes in make_join_statistics with materialization+semijoin, IN subqueries, constant table, impossible condition
- Closed
-
MDEV-4648 SQ pushdown: Wrong result (missing rows) with materialization+semijoin, IN and ALL subqueries, UNION
- Closed
-
MDEV-4657 SQ pushdown: Valgrind warnings (Conditional jump or move depends on uninitialised value) in compare_items_by_cost
- Closed
-
MDEV-4659 SQ pushdown: Valgrind warnings (Conditional jump or move depends on uninitialised value) in best_extension_by_limited_search / greedy_search with expensive_pred_static_pushdown=on
- Closed
-
MDEV-5178 Valgrind warnings (Conditional jump or move depends on uninitialised value) with static_cond_pushdown=on, FROM subquery
- Closed
-
MDEV-5201 Assertion `!(*exec_tab) || (*exec_tab >= first_tab && *exec_tab < last_tab)' fails in Item_func_dynamic_cond::val_int() with static_cond_pushdown=on
- Closed
-
MDEV-5203 Different results with record_cond_statistics=on and record_cond_statistics=off with SOME subquery
- Closed
-
MDEV-5470 Cost-based subquery item pushdown must take into account semi-joins
- Open