Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8306

Complete cost-based optimization for ORDER BY with LIMIT

    Details

      Description

      A long standing (and informally known) issue:

      Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

      Example:

      select * from 
        t_fact 
          join dim1 on t_fact.dim1_id= dim1.dim1_id
          join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      |    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
      |    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+

      This uses filesort and takes ~8 sec.
      Now, let's force the right join order:

      select * from 
        t_fact 
          straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
          straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
      |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+

      This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

      Dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t_fact
      (
        fact_id int not null,
        dim1_id int not null,
        dim2_id int not null,
        col1    int not null,
        primary key(fact_id),
        key(dim1_id),
        key(dim2_id),
        key(col1)
      );
       
      insert into t_fact 
      select
        A.a+1000*B.a+1000*1000*C.a,
        A.a,
        B.a,
        A.a+1000*B.a+1000*1000*C.a
      from 
        one_k A ,
        one_k B,
        ten C
      where
       A.a<500 and B.a<500
      ;
       
       
      create table dim1 
      (
        dim1_id int not null primary key,
        col1 int
      );
       
      insert into dim1 
      select a,a from one_k where a<500;
       
      create table dim2
      (
        dim2_id int not null primary key,
        col1 int
      );
      insert into dim2 
      select a,a from one_k where a<500;

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                igor Igor Babaev
                Reporter:
                psergey Sergei Petrunia
              • Votes:
                19 Vote for this issue
                Watchers:
                20 Start watching this issue

                Dates

                • Created:
                  Updated: