Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-83

Cost-based choice for the pushdown of subqueries to joined tables

    XMLWordPrintable

    Details

    • Type: Task
    • Status: Stalled (View Workflow)
    • Priority: Minor
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:

      Description

      Contents:

      1. Problem description
      2. High-level design
        1. Main assumptions in this task
        2. General idea
        3. Detailed design
      3. Low-level design
      4. Testing

      Problem description

      If the evaluation of a dependent single-row subquery is performed always after accessing the last table the subquery depends on the total query execution time may happen to be suboptimal. We can see it in the report for LP bug #914569 .

      With default settings (optimizer_switch='semijoin=on,materialization=on') for Q20 of the DBT-3

      select sql_calc_found_rows
             s_name, s_address
      from supplier, nation
      where s_suppkey in (select ps_suppkey from partsupp
                          where ps_partkey in (select p_partkey from part
                                               where p_name like 'forest%')
                                and ps_availqty >
                                    (select 0.5 * sum(l_quantity)
                                     from lineitem
                                     where l_partkey = ps_partkey
                                           and l_suppkey = ps_suppkey
                                           and l_shipdate >= date('1994-01-01')
                                           and l_shipdate < date('1994-01-01') +
                                               interval '1' year ))
      and s_nationkey = n_nationkey
      and n_name = 'CANADA'
      order by s_name
      limit 10;

      the 5.3 optimizer chooses the following plan for the database dbt3x10_myisam
      (a myisam DBT-3 database of scale factor 10):

      +----+--------------------+----------+--------+--------------------------------------+---------------+---------+------------------------------------+------+----------------------------------------------+
      | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
      +----+--------------------+----------+--------+--------------------------------------+---------------+---------+------------------------------------+------+----------------------------------------------+
      | 1 | PRIMARY | nation | ALL | PRIMARY | NULL | NULL | NULL | 25 | Using where; Using temporary; Using filesort |
      | 1 | PRIMARY | supplier | ref | PRIMARY,i_s_nationkey | i_s_nationkey | 5 | dbt3x10_myisam.nation.n_nationkey | 4000 | |
      | 1 | PRIMARY | partsupp | ref | PRIMARY,i_ps_partkey,i_ps_suppkey | i_ps_suppkey | 4 | dbt3x10_myisam.supplier.s_suppkey | 80 | Using where |
      | 1 | PRIMARY | part | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.partsupp.ps_partkey | 1 | Using where; FirstMatch(supplier) |
      | 4 | DEPENDENT SUBQUERY | lineitem | ref | i_l_shipdate,i_l_partkey,i_l_suppkey | i_l_partkey | 5 | dbt3x10_myisam.partsupp.ps_partkey | 30 | Using where |
      +----+--------------------+----------+--------+--------------------------------------+---------------+---------+------------------------------------+------+--------------------------------------------+

      The execution by this plan takes significantly more time ( almost 4 times more) then the execution by the plan:

      +----+--------------------+----------+-----------------+--------------------------------------+--------------+---------+-------------------------------------+--------+-----------------------------+
      | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
      +----+--------------------+----------+-----------------+--------------------------------------+--------------+---------+-------------------------------------+--------+-----------------------------+
      | 1 | PRIMARY | supplier | ALL | i_s_nationkey | NULL | NULL | NULL | 100000 | Using where; Using filesort |
      | 1 | PRIMARY | nation | eq_ref | PRIMARY | PRIMARY | 4 | dbt3x10_myisam.supplier.s_nationkey | 1 | Using where |
      | 2 | DEPENDENT SUBQUERY | partsupp | index_subquery | i_ps_suppkey | i_ps_suppkey | 4 | func | 80 | Using where |
      | 4 | DEPENDENT SUBQUERY | lineitem | ref | i_l_shipdate,i_l_partkey,i_l_suppkey | i_l_partkey | 5 | dbt3x10_myisam.partsupp.ps_partkey | 30 | Using where |
      | 3 | DEPENDENT SUBQUERY | part | unique_subquery | PRIMARY | PRIMARY | 4 | func | 1 | Using where |
      +----+--------------------+----------+-----------------+--------------------------------------+--------------+---------+-------------------------------------+--------+-----------------------------+

      that is chosen with the settings optimizer_switch='semijoin=off,materialization=off'.

      The main difference in these two plans is that with the first execution plan the dependent subquery:

      (select 0.5 * sum(l_quantity)
       from lineitem
       where l_partkey = ps_partkey
         and l_suppkey = ps_suppkey
         and l_shipdate >= date('1994-01-01')
         and l_shipdate < date('1994-01-01') + interval '1' year ))

      is evaluated before the table part is accessed while many rows where (p_name like 'forest%') is not true are filtered out and evaluation of the dependent subquery is not needed for them.

      The main goal of this task is to implement a cost based decision whether the subquery predicate should be pushed to the table partsupp or to the table part.

      High-level design

      Main assumptions in this task

      • It is not possible to estimate the selectivity of a subquery
        predicate during optimization.
      • It is possible to estimate the cost of a subquery (mdev-402).
      • While currently we can estimate only the selectivity of predicates for which
        there exists suitable indexes, the task will assume that it will be possible
        to estimate selectivity of most predicates based on the work in mwl#248.
        Until mwl#248 can be used, predicate selectivity is estimated through
        quick_condition_rows when there is an index (mdev-446).

      General idea

      The goal of this task is to minimize the number of executions of subqueries in
      a join plan. Since join conditions are pushed to the earliest possible
      table in the join plan, an expensive condition may be re-attached to any
      table later in the plan (but not earlier) in order to minimize the number of
      times it is executed.

      • The pushdown of subquery predicates is decided for each complete join plan
        during join optimization.
      • For each complete join plan, we find the partial join with the least
        cardinality.
      • Depending on table dependencies, a subquery can be pushed to a join
        no earlier than a certain position in the plan. We search the partial
        join with the least cardinality after the first position where the
        subquery can be pushed. The subquery is marked with the number of this
        optimal join position.
      • The total cost of the plan is updated taking into account that size of
        the partial join result, and the subquery cost.
      • After join optimization, but before the predicate pushdown phase, each
        subquery is set to depend on the table that is accessed by the partial
        join with least cardinality.
      • The predicate pushdown phase automatically pushes the subquery to the
        right partial join, because it is already marked to depend on it.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Unassigned
              Reporter:
              igor Igor Babaev
              Votes:
              2 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated: