[MDEV-8306] Complete cost-based optimization for ORDER BY with LIMIT - Jira

Sergei Petrunia created issue - 2015-06-11 21:35

Sergei Petrunia made changes - 2015-06-11 21:36

Field	Original Value	New Value
Description	A long standing (and informally known) issue: Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| Extra \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ \| 1 \| SIMPLE \| dim1 \| ALL \| PRIMARY \| NULL \| NULL \| NULL \| 500 \| Using temporary; Using filesort \| \| 1 \| SIMPLE \| t_fact \| ref \| dim1_id,dim2_id \| dim1_id \| 4 \| j3.dim1.dim1_id \| 1 \| \| \| 1 \| SIMPLE \| dim2 \| eq_ref \| PRIMARY \| PRIMARY \| 4 \| j3.t_fact.dim2_id \| 1 \| \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. 
Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| Extra \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ \| 1 \| SIMPLE \| t_fact \| index \| dim1_id,dim2_id \| col1 \| 4 \| NULL \| 1000 \| \| \| 1 \| SIMPLE \| dim1 \| eq_ref \| PRIMARY \| PRIMARY \| 4 \| j3.t_fact.dim1_id \| 1 \| \| \| 1 \| SIMPLE \| dim2 \| eq_ref \| PRIMARY \| PRIMARY \| 4 \| j3.t_fact.dim2_id \| 1 \| \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.	A long standing (and informally known) issue: Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. 
Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| Extra \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ \| 1 \| SIMPLE \| dim1 \| ALL \| PRIMARY \| NULL \| NULL \| NULL \| 500 \| Using temporary; Using filesort \| \| 1 \| SIMPLE \| t_fact \| ref \| dim1_id,dim2_id \| dim1_id \| 4 \| j3.dim1.dim1_id \| 1 \| \| \| 1 \| SIMPLE \| dim2 \| eq_ref \| PRIMARY \| PRIMARY \| 4 \| j3.t_fact.dim2_id \| 1 \| \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. 
Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ \| id \| select_type \| table \| type \| possible_keys \| key \| key_len \| ref \| rows \| Extra \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ \| 1 \| SIMPLE \| t_fact \| index \| dim1_id,dim2_id \| col1 \| 4 \| NULL \| 1000 \| \| \| 1 \| SIMPLE \| dim1 \| eq_ref \| PRIMARY \| PRIMARY \| 4 \| j3.t_fact.dim1_id \| 1 \| \| \| 1 \| SIMPLE \| dim2 \| eq_ref \| PRIMARY \| PRIMARY \| 4 \| j3.t_fact.dim2_id \| 1 \| \| +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat}

Sergei Petrunia made changes - 2015-06-11 21:37

Labels

optimizer order-by-optimization

Sergei Petrunia made changes - 2015-06-11 22:09

Fix Version/s

10.1 [ 16100 ]

Sergei Petrunia made changes - 2015-11-09 11:30

Link

This issue relates to MDEV-8880 [ MDEV-8880 ]

Sergei Petrunia made changes - 2015-11-26 20:56

Fix Version/s		10.2 [ 14601 ]
Fix Version/s	10.1 [ 16100 ]

Sergei Petrunia made changes - 2017-08-05 17:39

Status

Open [ 1 ]

In Progress [ 3 ]

Sergei Petrunia made changes - 2017-08-05 17:39

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Sergei Golubchik made changes - 2017-08-05 18:09

Fix Version/s		10.4 [ 22408 ]
Fix Version/s	10.2 [ 14601 ]

Sergei Golubchik made changes - 2017-08-05 18:09

Affects Version/s	10.0.19 [ 19200 ]
Issue Type	Bug [ 1 ]	Task [ 3 ]

Sergei Petrunia made changes - 2018-01-19 10:08

Link

This issue relates to ~~MDEV-14621~~ [ ~~MDEV-14621~~ ]

Alice Sherepa made changes - 2018-03-05 10:46

Link

This issue is duplicated by ~~MDEV-14569~~ [ ~~MDEV-14569~~ ]

Julien Fritsch made changes - 2019-02-08 08:14

Support case ID

9377

not-9377

Ralf Gebhardt made changes - 2019-04-03 15:56

Fix Version/s

10.4 [ 22408 ]

Ralf Gebhardt made changes - 2019-04-03 15:56

NRE Projects

RM_105_CANDIDATE

Ralf Gebhardt made changes - 2019-04-23 17:45

NRE Projects

RM_105_CANDIDATE

RM_105_CANDIDATE RM_105_OPTIMIZER

Varun Gupta (Inactive) made changes - 2019-04-24 14:00

Assignee

Sergei Petrunia [ psergey ]

Varun Gupta [ varun ]

Varun Gupta (Inactive) made changes - 2019-04-29 06:28

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Varun Gupta (Inactive) made changes - 2019-04-30 19:12

Link

This issue relates to ~~MDEV-16214~~ [ ~~MDEV-16214~~ ]

Varun Gupta (Inactive) made changes - 2019-05-28 07:19

Fix Version/s

10.5 [ 23123 ]

Varun Gupta (Inactive) made changes - 2019-07-23 11:58

Link

This issue relates to MDEV-20129 [ MDEV-20129 ]

Varun Gupta (Inactive) made changes - 2019-07-28 06:53

Link

This issue relates to ~~MDEV-13694~~ [ ~~MDEV-13694~~ ]

Varun Gupta (Inactive) made changes - 2019-07-30 08:58

Link

This issue relates to ~~MDEV-20209~~ [ ~~MDEV-20209~~ ]

Varun Gupta (Inactive) made changes - 2019-08-01 14:20

Link

This issue relates to MDEV-13275 [ MDEV-13275 ]

Sergei Golubchik made changes - 2019-08-19 13:53

Priority

Major [ 3 ]

Critical [ 2 ]

Varun Gupta (Inactive) made changes - 2019-08-31 08:42

Link

This issue relates to MDEV-20459 [ MDEV-20459 ]

Varun Gupta (Inactive) made changes - 2019-09-04 08:09

Assignee	Varun Gupta [ varun ]	Igor Babaev [ igor ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Varun Gupta (Inactive) made changes - 2019-09-10 09:36

Link

This issue relates to ~~MDEV-18094~~ [ ~~MDEV-18094~~ ]

Varun Gupta (Inactive) made changes - 2019-09-23 15:22

Summary

Poor optimization of JOIN and ORDER BY ... LIMIT

Complete cost-based optimization for ORDER BY with LIMIT

Brad Jorgensen made changes - 2019-09-27 16:09

Link

This issue relates to MDEV-19808 [ MDEV-19808 ]

Varun Gupta (Inactive) made changes - 2020-01-03 04:00

Link

This issue duplicates MDEV-21408 [ MDEV-21408 ]

Varun Gupta (Inactive) made changes - 2020-01-07 09:07

Link

This issue duplicates MDEV-21408 [ MDEV-21408 ]

Varun Gupta (Inactive) made changes - 2020-01-07 09:07

Link

This issue relates to MDEV-21408 [ MDEV-21408 ]

Varun Gupta (Inactive) made changes - 2020-01-29 18:30

Link

This issue includes MDEV-8002 [ MDEV-8002 ]

Igor Babaev (Inactive) made changes - 2020-01-29 20:45

Status

In Review [ 10002 ]

Stalled [ 10000 ]

Igor Babaev (Inactive) made changes - 2020-01-29 20:47

Assignee

Igor Babaev [ igor ]

Varun Gupta [ varun ]

Igor Babaev (Inactive) made changes - 2020-01-29 20:47

Comment

[ Review is not needed as it is a header mdev now. ]

Igor Babaev (Inactive) made changes - 2020-01-29 20:48

Assignee

Varun Gupta [ varun ]

Igor Babaev [ igor ]

Igor Babaev (Inactive) made changes - 2020-01-29 20:48

Status

Stalled [ 10000 ]

Open [ 1 ]

Varun Gupta (Inactive) made changes - 2020-01-29 20:49

Status

Open [ 1 ]

Confirmed [ 10101 ]

Igor Babaev (Inactive) made changes - 2020-01-29 20:50

Status

Confirmed [ 10101 ]

In Review [ 10002 ]

Sergei Golubchik made changes - 2020-02-03 16:18

Link

This issue relates to MDEV-21643 [ MDEV-21643 ]

Sergei Petrunia made changes - 2020-02-12 13:19

Link

This issue includes MDEV-21713 [ MDEV-21713 ]

Varun Gupta (Inactive) made changes - 2020-02-18 15:45

Assignee

Igor Babaev [ igor ]

Varun Gupta [ varun ]

Ralf Gebhardt made changes - 2020-03-27 17:05

Fix Version/s		10.6 [ 24028 ]
Fix Version/s	10.5 [ 23123 ]

Varun Gupta (Inactive) made changes - 2020-04-14 09:49

Status

In Review [ 10002 ]

Stalled [ 10000 ]

Varun Gupta (Inactive) made changes - 2020-04-24 09:00

Link

This issue includes MDEV-22360 [ MDEV-22360 ]

Sergei Golubchik made changes - 2020-08-16 20:03

Rank

Ranked higher

Varun Gupta (Inactive) made changes - 2021-02-09 11:53

Assignee	Varun Gupta [ varun ]	Sergei Petrunia [ psergey ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Sergei Petrunia made changes - 2021-04-29 12:18

Fix Version/s		10.7 [ 24805 ]
Fix Version/s	10.6 [ 24028 ]

Rob Schwyzer (Inactive) made changes - 2021-05-27 18:59

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31251 ]

Rob Schwyzer (Inactive) made changes - 2021-06-01 22:55

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31310 ]

Rob Schwyzer (Inactive) made changes - 2021-06-03 00:23

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31321 ]

Rob Schwyzer (Inactive) made changes - 2021-06-07 20:48

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31339 ]

Rob Schwyzer (Inactive) made changes - 2021-06-10 16:59

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31351 ]

Rob Schwyzer (Inactive) made changes - 2021-06-15 23:07

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31371 ]

Rob Schwyzer (Inactive) made changes - 2021-06-17 18:26

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31382 ]

Rob Schwyzer (Inactive) made changes - 2021-06-17 20:35

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31382 ]

Rob Schwyzer (Inactive) made changes - 2021-06-23 00:37

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31396 ]

Rob Schwyzer (Inactive) made changes - 2021-06-25 17:21

Labels

optimizer order-by-optimization

ServiceNow optimizer order-by-optimization

Rob Schwyzer (Inactive) made changes - 2021-06-29 18:43

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31418 ]

Rob Schwyzer (Inactive) made changes - 2021-07-02 20:01

Labels

ServiceNow optimizer order-by-optimization

76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization

Rob Schwyzer (Inactive) made changes - 2021-07-07 17:35

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31444 ]

Rob Schwyzer (Inactive) made changes - 2021-07-14 15:03

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31473 ]

Rob Schwyzer (Inactive) made changes - 2021-07-27 23:02

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31510 ]

Rob Schwyzer (Inactive) made changes - 2021-08-04 15:25

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31534 ]

Rob Schwyzer (Inactive) made changes - 2021-08-10 19:12

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31563 ]

Sergei Golubchik made changes - 2021-08-13 22:34

Labels

76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization

optimizer order-by-optimization

Rob Schwyzer (Inactive) made changes - 2021-08-18 15:04

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31599 ]

Rob Schwyzer (Inactive) made changes - 2021-08-18 21:56

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31609 ]

Rob Schwyzer (Inactive) made changes - 2021-08-18 22:10

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31622 ]

Rob Schwyzer (Inactive) made changes - 2021-08-25 20:37

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31644 ]

Rob Schwyzer (Inactive) made changes - 2021-08-31 22:06

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31710 ]

Rob Schwyzer (Inactive) made changes - 2021-09-02 16:42

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31720 ]

Sergei Golubchik made changes - 2021-09-04 12:22

Priority

Critical [ 2 ]

Major [ 3 ]

Rob Schwyzer (Inactive) made changes - 2021-09-09 16:53

Remote Link

This issue links to "Page (Confluence)" [ 31737 ]

Rob Schwyzer (Inactive) made changes - 2021-09-16 16:48

Remote Link

This issue links to "Page (Confluence)" [ 31759 ]

Rob Schwyzer (Inactive) made changes - 2021-09-16 17:31

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31609 ]

Ralf Gebhardt made changes - 2021-10-14 13:11

Fix Version/s		10.8 [ 26121 ]
Fix Version/s	10.7 [ 24805 ]

Sergei Golubchik made changes - 2021-12-06 21:22

Workflow

MariaDB v3 [ 69896 ]

MariaDB v4 [ 131761 ]

Sergei Golubchik made changes - 2022-02-01 11:35

Fix Version/s		10.9 [ 26905 ]
Fix Version/s	10.8 [ 26121 ]

Ralf Gebhardt made changes - 2022-02-22 09:45

Fix Version/s		10.10 [ 27530 ]
Fix Version/s	10.9 [ 26905 ]

Sergei Golubchik made changes - 2022-06-15 13:36

Fix Version/s		10.11 [ 27614 ]
Fix Version/s	10.10 [ 27530 ]

AirFocus made changes - 2022-08-09 16:11

Description

A long standing (and informally known) issue:

Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:
{noformat}
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}
{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
| 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
{noformat}

This uses filesort and takes ~8 sec.
Now, let's force the right join order:
{noformat}
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
| 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
{noformat}
This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

Dataset:
{noformat}
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t_fact
(
  fact_id int not null,
  dim1_id int not null,
  dim2_id int not null,
  col1 int not null,
  primary key(fact_id),
  key(dim1_id),
  key(dim2_id),
  key(col1)
);

insert into t_fact
select
  A.a+1000*B.a+1000*1000*C.a,
  A.a,
  B.a,
  A.a+1000*B.a+1000*1000*C.a
from
  one_k A ,
  one_k B,
  ten C
where
A.a<500 and B.a<500
;

create table dim1
(
  dim1_id int not null primary key,
  col1 int
);

insert into dim1
select a,a from one_k where a<500;

create table dim2
(
  dim2_id int not null primary key,
  col1 int
);
insert into dim2
select a,a from one_k where a<500;
{noformat}

A long standing (and informally known) issue:

Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:

{noformat}
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
| 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
{noformat}

This uses filesort and takes ~8 sec.
Now, let's force the right join order:

{noformat}
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
| 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
{noformat}

This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

Dataset:

{noformat}
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t_fact
(
  fact_id int not null,
  dim1_id int not null,
  dim2_id int not null,
  col1 int not null,
  primary key(fact_id),
  key(dim1_id),
  key(dim2_id),
  key(col1)
);

insert into t_fact
select
  A.a+1000*B.a+1000*1000*C.a,
  A.a,
  B.a,
  A.a+1000*B.a+1000*1000*C.a
from
  one_k A ,
  one_k B,
  ten C
where
A.a<500 and B.a<500
;

create table dim1
(
  dim1_id int not null primary key,
  col1 int
);

insert into dim1
select a,a from one_k where a<500;

create table dim2
(
  dim2_id int not null primary key,
  col1 int
);
insert into dim2
select a,a from one_k where a<500;
{noformat}

Julien Fritsch made changes - 2022-08-10 08:03

Description

A long standing (and informally known) issue:

Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:

{noformat}
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
| 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
{noformat}

This uses filesort and takes ~8 sec.
Now, let's force the right join order:

{noformat}
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
| 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
{noformat}

This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

Dataset:

{noformat}
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t_fact
(
  fact_id int not null,
  dim1_id int not null,
  dim2_id int not null,
  col1 int not null,
  primary key(fact_id),
  key(dim1_id),
  key(dim2_id),
  key(col1)
);

insert into t_fact
select
  A.a+1000*B.a+1000*1000*C.a,
  A.a,
  B.a,
  A.a+1000*B.a+1000*1000*C.a
from
  one_k A ,
  one_k B,
  ten C
where
A.a<500 and B.a<500
;

create table dim1
(
  dim1_id int not null primary key,
  col1 int
);

insert into dim1
select a,a from one_k where a<500;

create table dim2
(
  dim2_id int not null primary key,
  col1 int
);
insert into dim2
select a,a from one_k where a<500;
{noformat}

A long standing (and informally known) issue:-

Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:

{noformat}
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
| 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
{noformat}

This uses filesort and takes ~8 sec.
Now, let's force the right join order:

{noformat}
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
| 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
{noformat}

This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

Dataset:

{noformat}
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t_fact
(
  fact_id int not null,
  dim1_id int not null,
  dim2_id int not null,
  col1 int not null,
  primary key(fact_id),
  key(dim1_id),
  key(dim2_id),
  key(col1)
);

insert into t_fact
select
  A.a+1000*B.a+1000*1000*C.a,
  A.a,
  B.a,
  A.a+1000*B.a+1000*1000*C.a
from
  one_k A ,
  one_k B,
  ten C
where
A.a<500 and B.a<500
;

create table dim1
(
  dim1_id int not null primary key,
  col1 int
);

insert into dim1
select a,a from one_k where a<500;

create table dim2
(
  dim2_id int not null primary key,
  col1 int
);
insert into dim2
select a,a from one_k where a<500;
{noformat}

Julien Fritsch made changes - 2022-08-10 08:04

Description

A long standing (and informally known) issue:-

Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:

{noformat}
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
| 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
{noformat}

This uses filesort and takes ~8 sec.
Now, let's force the right join order:

{noformat}
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
| 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
{noformat}

This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.

Dataset:

{noformat}
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t_fact
(
  fact_id int not null,
  dim1_id int not null,
  dim2_id int not null,
  col1 int not null,
  primary key(fact_id),
  key(dim1_id),
  key(dim2_id),
  key(col1)
);

insert into t_fact
select
  A.a+1000*B.a+1000*1000*C.a,
  A.a,
  B.a,
  A.a+1000*B.a+1000*1000*C.a
from
  one_k A ,
  one_k B,
  ten C
where
A.a<500 and B.a<500
;

create table dim1
(
  dim1_id int not null primary key,
  col1 int
);

insert into dim1
select a,a from one_k where a<500;

create table dim2
(
  dim2_id int not null primary key,
  col1 int
);
insert into dim2
select a,a from one_k where a<500;
{noformat}

A long standing (and informally known) issue:

Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.

Example:

{noformat}
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
| id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
|    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
|    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
|    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
{noformat}

This uses a temporary table plus filesort and takes ~8 sec.

Now, let's force the right join order:

{noformat}
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}

{noformat}
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
| id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
|    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
|    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
|    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
{noformat}

This uses the col1 index to resolve the ORDER BY ... LIMIT, and the select takes 0.01 sec to execute.
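
One way to see where the time goes in each plan is MariaDB's ANALYZE statement, which executes the query and reports the observed r_rows next to the optimizer's rows estimate. A minimal sketch, assuming the tables from this report are loaded; the output is omitted because it was not captured in the original report:

{noformat}
-- Default join order: compare "rows" (estimate) with "r_rows" (actual)
analyze
select * from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;

-- Forced join order: t_fact is read first, in col1 index order
analyze
select * from
  t_fact
    straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
    straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
order by
   t_fact.col1
limit 1000;
{noformat}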

Dataset:

{noformat}
create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

create table one_k(a int);
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;

create table t_fact
(
  fact_id int not null,
  dim1_id int not null,
  dim2_id int not null,
  col1 int not null,
  primary key(fact_id),
  key(dim1_id),
  key(dim2_id),
  key(col1)
);

insert into t_fact
select
  A.a+1000*B.a+1000*1000*C.a,
  A.a,
  B.a,
  A.a+1000*B.a+1000*1000*C.a
from
  one_k A ,
  one_k B,
  ten C
where
A.a<500 and B.a<500
;

create table dim1
(
  dim1_id int not null primary key,
  col1 int
);

insert into dim1
select a,a from one_k where a<500;

create table dim2
(
  dim2_id int not null primary key,
  col1 int
);
insert into dim2
select a,a from one_k where a<500;
{noformat}
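
A quick sanity check of the dataset sizes, as a sketch that assumes the statements above ran unmodified. The expected counts in the comments follow from the generator: one_k holds 0..999, each a<500 filter keeps 500 values, and ten contributes 10 rows.

{noformat}
select count(*) from t_fact;  -- 500 * 500 * 10 = 2500000
select count(*) from dim1;    -- 500
select count(*) from dim2;    -- 500

-- Every t_fact row matches exactly one dim1 row and one dim2 row,
-- so the join without LIMIT also yields 2500000 rows.
select count(*) from
  t_fact
    join dim1 on t_fact.dim1_id= dim1.dim1_id
    join dim2 on t_fact.dim2_id= dim2.dim2_id;
{noformat}

These sizes show why the join order matters here: a plan that sorts the join output has to produce all 2.5M rows before LIMIT 1000 can apply, while a plan that reads t_fact in col1 order can stop after 1000 rows.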

Julien Fritsch made changes - 2022-08-11 10:19

Priority	Major [ 3 ]	Critical [ 2 ]

Sergei Petrunia made changes - 2022-08-11 17:50

Fix Version/s		10.12 [ 28320 ]
Fix Version/s	10.11 [ 27614 ]

Sergei Golubchik made changes - 2022-11-08 15:40

Link

This issue is duplicated by ~~MDEV-6813~~ [ ~~MDEV-6813~~ ]

Sergei Petrunia made changes - 2022-12-09 10:46

Status	In Review [ 10002 ]	Stalled [ 10000 ]

Sergei Petrunia made changes - 2022-12-09 10:46

Priority	Critical [ 2 ]	Major [ 3 ]

Sergei Petrunia made changes - 2023-03-24 16:23

Fix Version/s		11.1 [ 28549 ]
Fix Version/s	11.0 [ 28320 ]

Sergei Petrunia made changes - 2023-03-24 16:24

Fix Version/s		11.2 [ 28603 ]
Fix Version/s	11.1 [ 28549 ]

Ralf Gebhardt made changes - 2023-08-03 07:45

Fix Version/s		11.3 [ 28565 ]
Fix Version/s	11.2 [ 28603 ]

Sergei Petrunia made changes - 2023-08-15 13:30

Fix Version/s		11.4 [ 29301 ]
Fix Version/s	11.3 [ 28565 ]

Sergei Petrunia made changes - 2023-11-03 08:06

Labels	optimizer order-by-optimization	optimizer optimizer-feature order-by-optimization

Julien Fritsch made changes - 2023-11-30 16:29

Issue Type	Task [ 3 ]	New Feature [ 2 ]

Sergei Petrunia made changes - 2023-12-12 15:16

Fix Version/s		11.5 [ 29506 ]
Fix Version/s	11.4 [ 29301 ]

Roel Van de Paar made changes - 2023-12-16 02:15

Link

This issue is blocked by MDEV-21643 [ MDEV-21643 ]

Roel Van de Paar made changes - 2023-12-16 02:15

Link

This issue is blocked by MDEV-21643 [ MDEV-21643 ]

Roel Van de Paar made changes - 2023-12-16 02:16

Link

This issue is blocked by MDEV-21643 [ MDEV-21643 ]

Roel Van de Paar made changes - 2023-12-16 02:16

Link

This issue relates to MDEV-21643 [ MDEV-21643 ]

Sergei Golubchik made changes - 2024-03-19 18:31

Fix Version/s		11.6 [ 29515 ]
Fix Version/s	11.5 [ 29506 ]

Ralf Gebhardt made changes - 2024-06-21 09:21

Fix Version/s		11.7 [ 29815 ]
Fix Version/s	11.6 [ 29515 ]

Jira Automation (IT) made changes - 2024-07-04 08:53

Zendesk Related Tickets		201658 202060 159278
Zendesk active tickets		201658

Sergei Golubchik made changes - 2024-09-24 13:54

Fix Version/s		11.8 [ 29921 ]
Fix Version/s	11.7 [ 29815 ]

Sergei Golubchik made changes - 2024-10-24 14:56

Link

This issue relates to ~~MDEV-35246~~ [ ~~MDEV-35246~~ ]

Alice Sherepa made changes - 2024-11-05 13:36

Link

This issue relates to MDEV-35280 [ MDEV-35280 ]

Alice Sherepa made changes - 2024-11-05 13:45

Link

This issue relates to MDEV-18079 [ MDEV-18079 ]

Sergei Golubchik made changes - 2024-11-16 12:12

Link

This issue blocks MDEV-33412 [ MDEV-33412 ]

Sergei Petrunia made changes - 2024-11-26 16:12

Fix Version/s		11.9 [ 29945 ]
Fix Version/s	11.8 [ 29921 ]

Sergei Petrunia made changes - 2025-02-26 08:40

Fix Version/s		12.1 [ 29992 ]
Fix Version/s	12.0 [ 29945 ]

Sergei Petrunia made changes - 2025-02-27 13:39

Link

This issue is blocked by MDEV-21643 [ MDEV-21643 ]

Sergei Petrunia made changes - 2025-02-27 13:40

Link

This issue includes MDEV-21643 [ MDEV-21643 ]

Sergei Petrunia made changes - 2025-02-27 13:52

Link

This issue relates to ~~MDEV-34720~~ [ ~~MDEV-34720~~ ]

Sergei Petrunia made changes - 2025-03-11 08:53

Link

This issue relates to ~~MDEV-25480~~ [ ~~MDEV-25480~~ ]

Julien Fritsch made changes - 2025-04-02 17:34

Sprint

Server 12.1 dev sprint [ 793 ]

Julien Fritsch made changes - 1 week ago

Sprint

Server 12.1 dev sprint [ 793 ]
