Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-8306

Complete cost-based optimization for ORDER BY with LIMIT

Details

    Description

      A long standing (and informally known) issue:

      Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

      Example:

      select * from
        t_fact
          join dim1 on t_fact.dim1_id= dim1.dim1_id
          join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      |    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
      |    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      

      This uses filesort and takes ~8 sec.
      Now, let's force the right join order:

      select * from
        t_fact
          straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
          straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
      |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      

      This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

      Dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t_fact
      (
        fact_id int not null,
        dim1_id int not null,
        dim2_id int not null,
        col1    int not null,
        primary key(fact_id),
        key(dim1_id),
        key(dim2_id),
        key(col1)
      );
       
      insert into t_fact
      select
        A.a+1000*B.a+1000*1000*C.a,
        A.a,
        B.a,
        A.a+1000*B.a+1000*1000*C.a
      from
        one_k A ,
        one_k B,
        ten C
      where
       A.a<500 and B.a<500
      ;
       
       
      create table dim1
      (
        dim1_id int not null primary key,
        col1 int
      );
       
      insert into dim1
      select a,a from one_k where a<500;
       
      create table dim2
      (
        dim2_id int not null primary key,
        col1 int
      );
      insert into dim2
      select a,a from one_k where a<500;
      

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            Description A long standing (and informally known) issue:

            Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:
            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}
            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:
            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}
            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.
            A long standing (and informally known) issue:

            Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:
            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}
            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:
            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}
            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.


            Dataset:
            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            psergei Sergei Petrunia made changes -
            Labels optimizer order-by-optimization
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.1 [ 16100 ]
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.1 [ 16100 ]
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            psergei Sergei Petrunia made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.0.19 [ 19200 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            psergei Sergei Petrunia made changes -
            alice Alice Sherepa made changes -
            julien.fritsch Julien Fritsch made changes -
            Support case ID 9377 not-9377
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4 [ 22408 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            NRE Projects RM_105_CANDIDATE
            ralf.gebhardt Ralf Gebhardt made changes -
            NRE Projects RM_105_CANDIDATE RM_105_CANDIDATE RM_105_OPTIMIZER
            varun Varun Gupta (Inactive) made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]
            varun Varun Gupta (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Fix Version/s 10.5 [ 23123 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Igor Babaev [ igor ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Summary Poor optimization of JOIN and ORDER BY ... LIMIT Complete cost-based optimization for ORDER BY with LIMIT
            bradjorgensen Brad Jorgensen made changes -
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            igor Igor Babaev (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            igor Igor Babaev (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Varun Gupta [ varun ]
            igor Igor Babaev (Inactive) made changes -
            Comment [ Review is not needed as it is a header mdev now.
            ]
            igor Igor Babaev (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Igor Babaev [ igor ]
            igor Igor Babaev (Inactive) made changes -
            Status Stalled [ 10000 ] Open [ 1 ]
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            igor Igor Babaev (Inactive) made changes -
            Status Confirmed [ 10101 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            psergei Sergei Petrunia made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Varun Gupta [ varun ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.5 [ 23123 ]
            varun Varun Gupta (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            varun Varun Gupta (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Rank Ranked higher
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.6 [ 24028 ]
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels optimizer order-by-optimization ServiceNow optimizer order-by-optimization
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow optimizer order-by-optimization 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization optimizer order-by-optimization
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.7 [ 24805 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 69896 ] MariaDB v4 [ 131761 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.8 [ 26121 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.9 [ 26905 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.10 [ 27530 ]
            AirFocus AirFocus made changes -
            Description A long standing (and informally known) issue:

            Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:
            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}
            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:
            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}
            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.


            Dataset:
            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            A long standing (and informally known) issue:

            Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:

            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:

            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}

            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

            Dataset:

            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            julien.fritsch Julien Fritsch made changes -
            Description A long standing (and informally known) issue:

            Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:

            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:

            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}

            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

            Dataset:

            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            A long standing (and informally known) issue:-

            Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:

            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:

            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}

            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

            Dataset:

            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            julien.fritsch Julien Fritsch made changes -
            Description A long standing (and informally known) issue:-

            Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:

            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:

            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}

            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

            Dataset:

            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            A long standing (and informally known) issue:

            Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

            Example:

            {noformat}
            select * from
              t_fact
                join dim1 on t_fact.dim1_id= dim1.dim1_id
                join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
            | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
            {noformat}

            This uses filesort and takes ~8 sec.
            Now, let's force the right join order:

            {noformat}
            select * from
              t_fact
                straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
                straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
            order by
               t_fact.col1
            limit 1000;
            {noformat}

            {noformat}
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
            | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
            | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
            +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
            {noformat}

            This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

            Dataset:

            {noformat}
            create table ten(a int);
            insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

            create table one_k(a int);
            insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

            create table t_fact
            (
              fact_id int not null,
              dim1_id int not null,
              dim2_id int not null,
              col1 int not null,
              primary key(fact_id),
              key(dim1_id),
              key(dim2_id),
              key(col1)
            );

            insert into t_fact
            select
              A.a+1000*B.a+1000*1000*C.a,
              A.a,
              B.a,
              A.a+1000*B.a+1000*1000*C.a
            from
              one_k A ,
              one_k B,
              ten C
            where
             A.a<500 and B.a<500
            ;


            create table dim1
            (
              dim1_id int not null primary key,
              col1 int
            );

            insert into dim1
            select a,a from one_k where a<500;

            create table dim2
            (
              dim2_id int not null primary key,
              col1 int
            );
            insert into dim2
            select a,a from one_k where a<500;
            {noformat}
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.12 [ 28320 ]
            Fix Version/s 10.11 [ 27614 ]
            serg Sergei Golubchik made changes -
            psergei Sergei Petrunia made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.0 [ 28320 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.1 [ 28549 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.3 [ 28565 ]
            psergei Sergei Petrunia made changes -
            Labels optimizer order-by-optimization optimizer optimizer-feature order-by-optimization
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            Roel Roel Van de Paar made changes -
            Roel Roel Van de Paar made changes -
            Roel Roel Van de Paar made changes -
            Roel Roel Van de Paar made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.5 [ 29506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.7 [ 29815 ]
            Fix Version/s 11.6 [ 29515 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 202060 159278
            Zendesk active tickets 201658
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]
            Fix Version/s 11.7 [ 29815 ]
            serg Sergei Golubchik made changes -
            alice Alice Sherepa made changes -
            alice Alice Sherepa made changes -
            serg Sergei Golubchik made changes -
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.9 [ 29945 ]
            Fix Version/s 11.8 [ 29921 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 12.1 [ 29992 ]
            Fix Version/s 12.0 [ 29945 ]
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            julien.fritsch Julien Fritsch made changes -
            Sprint Server 12.1 dev sprint [ 793 ]
            julien.fritsch Julien Fritsch made changes -
            Sprint Server 12.1 dev sprint [ 793 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              23 Vote for this issue
              Watchers:
              29 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.