Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8306

Complete cost-based optimization for ORDER BY with LIMIT

    XMLWordPrintable

Details

    Description

      A long standing (and informally known) issue:

      Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

      Example:

      select * from
        t_fact
          join dim1 on t_fact.dim1_id= dim1.dim1_id
          join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      |    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
      |    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      

      This uses filesort and takes ~8 sec.
      Now, let's force the right join order:

      select * from
        t_fact
          straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
          straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
      |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      

      This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

      Dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t_fact
      (
        fact_id int not null,
        dim1_id int not null,
        dim2_id int not null,
        col1    int not null,
        primary key(fact_id),
        key(dim1_id),
        key(dim2_id),
        key(col1)
      );
       
      insert into t_fact
      select
        A.a+1000*B.a+1000*1000*C.a,
        A.a,
        B.a,
        A.a+1000*B.a+1000*1000*C.a
      from
        one_k A ,
        one_k B,
        ten C
      where
       A.a<500 and B.a<500
      ;
       
       
      create table dim1
      (
        dim1_id int not null primary key,
        col1 int
      );
       
      insert into dim1
      select a,a from one_k where a<500;
       
      create table dim2
      (
        dim2_id int not null primary key,
        col1 int
      );
      insert into dim2
      select a,a from one_k where a<500;
      

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              23 Vote for this issue
              Watchers:
              28 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.