Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8306

Complete cost-based optimization for ORDER BY with LIMIT

Details

    • Server 12.1 dev sprint

    Description

      A long standing (and informally known) issue:

      Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

      Example:

      select * from
        t_fact
          join dim1 on t_fact.dim1_id= dim1.dim1_id
          join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      |    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
      |    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      

      This uses filesort and takes ~8 sec.
      Now, let's force the right join order:

      select * from
        t_fact
          straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
          straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
      |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      

      This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

      Dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t_fact
      (
        fact_id int not null,
        dim1_id int not null,
        dim2_id int not null,
        col1    int not null,
        primary key(fact_id),
        key(dim1_id),
        key(dim2_id),
        key(col1)
      );
       
      insert into t_fact
      select
        A.a+1000*B.a+1000*1000*C.a,
        A.a,
        B.a,
        A.a+1000*B.a+1000*1000*C.a
      from
        one_k A ,
        one_k B,
        ten C
      where
       A.a<500 and B.a<500
      ;
       
       
      create table dim1
      (
        dim1_id int not null primary key,
        col1 int
      );
       
      insert into dim1
      select a,a from one_k where a<500;
       
      create table dim2
      (
        dim2_id int not null primary key,
        col1 int
      );
      insert into dim2
      select a,a from one_k where a<500;
      

      Attachments

        Issue Links

          Activity

            Before MDEV-8306, best_extension_by_limited_search had this code:

                    if (join->sort_by_table &&
                        join->sort_by_table !=
                        join->positions[join->const_tables].table->table)
                      /*
                         We may have to make a temp table, note that this is only a
                         heuristic since we cannot know for sure at this point.
                         Hence it may be wrong.
                      */
                      current_read_time= COST_ADD(current_read_time, current_record_count);
            

            Now, I see only the comment is left:

                        /*
                          We may have to make a temp table, note that this is only a
                          heuristic since we cannot know for sure at this point.
                          Hence it may be wrong.
                        */
                        cost= current_record_count;
            

            Does this mean we can't provide the old behavior? (Yes I know it is ugly and incorrect) Still, being able to provide old behavior has value and we should discuss if we could do it, here.

            psergei Sergei Petrunia added a comment - Before MDEV-8306 , best_extension_by_limited_search had this code: if (join->sort_by_table && join->sort_by_table != join->positions[join->const_tables].table->table) /* We may have to make a temp table, note that this is only a heuristic since we cannot know for sure at this point. Hence it may be wrong. */ current_read_time= COST_ADD(current_read_time, current_record_count); Now, I see only the comment is left: /* We may have to make a temp table, note that this is only a heuristic since we cannot know for sure at this point. Hence it may be wrong. */ cost= current_record_count; Does this mean we can't provide the old behavior? (Yes I know it is ugly and incorrect) Still, being able to provide old behavior has value and we should discuss if we could do it, here.

            The entire snippet is

                  else
                  {
                    /*
                      'join' is either the best partial QEP with 'search_depth' relations,
                      or the best complete QEP so far, whichever is smaller.
                    */
                    if (!nest_created &&
                        ((join->sort_by_table &&
                          join->sort_by_table !=
                          join->positions[join->const_tables].table->table) ||
                          join->sort_nest_possible))
                    {
                      double cost;
                      if (join->sort_nest_possible)
                      {
                        ulong rec_len= cache_record_length_for_nest(join, idx);
                        cost= join->sort_nest_oper_cost(partial_join_cardinality, idx,
                                                        rec_len);
                        trace_one_table.add("cost_of_sorting", cost);
                      }
                      else
                      {
                        /*
                          We may have to make a temp table, note that this is only a
                          heuristic since we cannot know for sure at this point.
                          Hence it may be wrong.
                        */
                        cost= current_record_count;
                      }
             
                      current_read_time= COST_ADD(current_read_time, cost);
                    }
            

            So when the ORDER BY LIMIT optimization is not enabled we enter the branch

            else
                      {
                        /*
                          We may have to make a temp table, note that this is only a
                          heuristic since we cannot know for sure at this point.
                          Hence it may be wrong.
                        */
                        cost= current_record_count;
                      }
            

            And then this cost of sorting is added to the cost of the current plan

             
            current_read_time= COST_ADD(current_read_time, cost);
            

            So we still provide the same old behaviour when the ORDER BY LIMIT optimization is not enabled.

            varun Varun Gupta (Inactive) added a comment - The entire snippet is else { /* 'join' is either the best partial QEP with 'search_depth' relations, or the best complete QEP so far, whichever is smaller. */ if (!nest_created && ((join->sort_by_table && join->sort_by_table != join->positions[join->const_tables].table->table) || join->sort_nest_possible)) { double cost; if (join->sort_nest_possible) { ulong rec_len= cache_record_length_for_nest(join, idx); cost= join->sort_nest_oper_cost(partial_join_cardinality, idx, rec_len); trace_one_table.add( "cost_of_sorting" , cost); } else { /* We may have to make a temp table, note that this is only a heuristic since we cannot know for sure at this point. Hence it may be wrong. */ cost= current_record_count; }   current_read_time= COST_ADD(current_read_time, cost); } So when the ORDER BY LIMIT optimization is not enabled we enter the branch else { /* We may have to make a temp table, note that this is only a heuristic since we cannot know for sure at this point. Hence it may be wrong. */ cost= current_record_count; } And then this cost of sorting is added to the cost of the current plan current_read_time= COST_ADD(current_read_time, cost); So we still provide the same old behaviour when the ORDER BY LIMIT optimization is not enabled.

            The rebased code on top of 10.5 is in the branch 10.5-mdev8306-2

            varun Varun Gupta (Inactive) added a comment - The rebased code on top of 10.5 is in the branch 10.5-mdev8306-2

            A related bug in MySQL to look at : https://bugs.mysql.com/bug.php?id=97001

            psergei Sergei Petrunia added a comment - A related bug in MySQL to look at : https://bugs.mysql.com/bug.php?id=97001

            The updated branch is on 10.6-order_by_limt
            Also a push to buildbot has been made
            http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-varun

            varun Varun Gupta (Inactive) added a comment - The updated branch is on 10.6-order_by_limt Also a push to buildbot has been made http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-varun

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              23 Vote for this issue
              Watchers:
              29 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.