Details
-
New Feature
-
Status: Stalled (View Workflow)
-
Major
-
Resolution: Unresolved
Description
A long standing (and informally known) issue:
Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan.
Example:
select * from
|
t_fact
|
join dim1 on t_fact.dim1_id= dim1.dim1_id
|
join dim2 on t_fact.dim2_id= dim2.dim2_id
|
order by
|
t_fact.col1
|
limit 1000;
|
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
|
| 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort |
|
| 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | |
|
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
|
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
|
This uses filesort and takes ~8 sec.
Now, let's force the right join order:
select * from
|
t_fact
|
straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
|
straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
|
order by
|
t_fact.col1
|
limit 1000;
|
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
|
| 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | |
|
| 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | |
|
| 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | |
|
+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
|
This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute.
Dataset:
create table ten(a int);
|
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
|
|
create table one_k(a int);
|
insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
|
|
create table t_fact
|
(
|
fact_id int not null,
|
dim1_id int not null,
|
dim2_id int not null,
|
col1 int not null,
|
primary key(fact_id),
|
key(dim1_id),
|
key(dim2_id),
|
key(col1)
|
);
|
|
insert into t_fact
|
select
|
A.a+1000*B.a+1000*1000*C.a,
|
A.a,
|
B.a,
|
A.a+1000*B.a+1000*1000*C.a
|
from
|
one_k A ,
|
one_k B,
|
ten C
|
where
|
A.a<500 and B.a<500
|
;
|
|
|
create table dim1
|
(
|
dim1_id int not null primary key,
|
col1 int
|
);
|
|
insert into dim1
|
select a,a from one_k where a<500;
|
|
create table dim2
|
(
|
dim2_id int not null primary key,
|
col1 int
|
);
|
insert into dim2
|
select a,a from one_k where a<500;
|
Attachments
Issue Links
- blocks
-
MDEV-33412 cost-based optimizer choice for k-NN indexes
-
- Open
-
- includes
-
MDEV-8002 Ability to use index for order-by and covering index changes with type of join to small table
-
- Open
-
-
MDEV-21643 test correctness of cost_based_order_by_limit
-
- Stalled
-
-
MDEV-21713 LIMIT optimization and selectivity: pessimistic estimates cause optimistic plans
-
- Open
-
-
MDEV-22360 Sufficient conditions for accurate calculation of join cardinality
-
- Stalled
-
- is duplicated by
-
MDEV-6813 ORDER BY limit optimizer doesn't take condition selectivity into account
-
- Closed
-
-
MDEV-14569 Query runs slower using a UNIQUE KEY vs. a KEY
-
- Closed
-
- relates to
-
MDEV-18079 Opportunistic optimization for ORDER BY LIMIT N queries
-
- Open
-
-
MDEV-35280 Performance Issue on TPC-H Query 2
-
- Open
-
-
MDEV-8880 Very suboptimal join order is generated for a simple query even if MariaDB query planner knows the other is better in every sense
-
- Stalled
-
-
MDEV-13275 Query optimizer not picking optimal ORDER BY PRIMARY in case of INNER JOIN on PRIMARY like it does for similar WHERE condition
-
- Stalled
-
-
MDEV-13694 Wrong result upon GROUP BY with orderby_uses_equalities=on
-
- Closed
-
-
MDEV-14621 Query very slow compared to Mysql
-
- Closed
-
-
MDEV-16214 Incorrect plan taken by the optimizer , uses INDEX instead of ref access with ORDER BY
-
- Closed
-
-
MDEV-18094 Query with order by limit picking index scan over filesort
-
- Closed
-
-
MDEV-19808 Add Optimizer Switch for Filesort with Small LIMIT Optimization
-
- Open
-
-
MDEV-20129 Equality propagation for ORDER BY items do not work with expressions
-
- Stalled
-
-
MDEV-20209 Using different index with same range gives different number of records
-
- Closed
-
-
MDEV-20459 Selectivity of equality condition in ref access not discounted if range access on same index involved a non-equality condition
-
- Stalled
-
-
MDEV-21408 Performances testing for ORDER BY with LIMIT optimization
-
- Open
-
-
MDEV-25480 Optimizer uses wrong index
-
- Closed
-
-
MDEV-34720 Poor plan choice for large JOIN with ORDER BY and small LIMIT
-
- Closed
-
-
MDEV-35246 Vector search skips a row in the table
-
- Closed
-
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
Activity
Field | Original Value | New Value |
---|---|---|
Description |
A long standing (and informally known) issue: Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. |
A long standing (and informally known) issue: Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
Labels | optimizer order-by-optimization |
Fix Version/s | 10.1 [ 16100 ] |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.1 [ 16100 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.2 [ 14601 ] |
Affects Version/s | 10.0.19 [ 19200 ] | |
Issue Type | Bug [ 1 ] | Task [ 3 ] |
Link |
This issue relates to |
Link |
This issue is duplicated by |
Support case ID | 9377 | not-9377 |
Fix Version/s | 10.4 [ 22408 ] |
NRE Projects | RM_105_CANDIDATE |
NRE Projects | RM_105_CANDIDATE | RM_105_CANDIDATE RM_105_OPTIMIZER |
Assignee | Sergei Petrunia [ psergey ] | Varun Gupta [ varun ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Link |
This issue relates to |
Fix Version/s | 10.5 [ 23123 ] |
Link | This issue relates to MDEV-20129 [ MDEV-20129 ] |
Link |
This issue relates to |
Link |
This issue relates to |
Link | This issue relates to MDEV-13275 [ MDEV-13275 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link | This issue relates to MDEV-20459 [ MDEV-20459 ] |
Assignee | Varun Gupta [ varun ] | Igor Babaev [ igor ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link |
This issue relates to |
Summary | Poor optimization of JOIN and ORDER BY ... LIMIT | Complete cost-based optimization for ORDER BY with LIMIT |
Link | This issue relates to MDEV-19808 [ MDEV-19808 ] |
Link | This issue duplicates MDEV-21408 [ MDEV-21408 ] |
Link | This issue duplicates MDEV-21408 [ MDEV-21408 ] |
Link | This issue relates to MDEV-21408 [ MDEV-21408 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Assignee | Igor Babaev [ igor ] | Varun Gupta [ varun ] |
Comment |
[ Review is not needed as it is a header mdev now.
] |
Assignee | Varun Gupta [ varun ] | Igor Babaev [ igor ] |
Status | Stalled [ 10000 ] | Open [ 1 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Status | Confirmed [ 10101 ] | In Review [ 10002 ] |
Link | This issue relates to MDEV-21643 [ MDEV-21643 ] |
Link | This issue includes MDEV-21713 [ MDEV-21713 ] |
Assignee | Igor Babaev [ igor ] | Varun Gupta [ varun ] |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.5 [ 23123 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link | This issue includes MDEV-22360 [ MDEV-22360 ] |
Rank | Ranked higher |
Assignee | Varun Gupta [ varun ] | Sergei Petrunia [ psergey ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.6 [ 24028 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31251 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31310 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31321 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31339 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31351 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31371 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31382 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31382 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31396 ] |
Labels | optimizer order-by-optimization | ServiceNow optimizer order-by-optimization |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31418 ] |
Labels | ServiceNow optimizer order-by-optimization | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31444 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31473 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31510 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31534 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31563 ] |
Labels | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization | optimizer order-by-optimization |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31599 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31609 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31622 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31644 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31710 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31720 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Remote Link | This issue links to "Page (Confluence)" [ 31737 ] |
Remote Link | This issue links to "Page (Confluence)" [ 31759 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31609 ] |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.7 [ 24805 ] |
Workflow | MariaDB v3 [ 69896 ] | MariaDB v4 [ 131761 ] |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.8 [ 26121 ] |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.9 [ 26905 ] |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 10.10 [ 27530 ] |
Description |
A long standing (and informally known) issue: Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
A long standing (and informally known) issue: Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
Description |
A long standing (and informally known) issue: Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
A long standing (and informally known) issue:-
Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
Description |
A long standing (and informally known) issue:-
Join optimizer makes its choices \[almost\] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
A long standing (and informally known) issue:
Join optimizer makes its choices [almost] without regard for ORDER BY ... LIMIT clause. ORDER BY ... LIMIT optimizer is invoked when the join order is already fixed. If the picked join order doesn't allow to resolve ORDER BY ... LIMIT efficiently... then we end up with a very poor query plan. Example: {noformat} select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ | 1 | SIMPLE | dim1 | ALL | PRIMARY | NULL | NULL | NULL | 500 | Using temporary; Using filesort | | 1 | SIMPLE | t_fact | ref | dim1_id,dim2_id | dim1_id | 4 | j3.dim1.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+ {noformat} This uses filesort and takes ~8 sec. Now, let's force the right join order: {noformat} select * from t_fact straight_join dim1 on t_fact.dim1_id= dim1.dim1_id straight_join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; {noformat} {noformat} +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 1000 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim1_id | 1 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | j3.t_fact.dim2_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+ {noformat} This uses index to resolve the ORDER BY ... LIMIT and the select takes 0.01 sec to execute. Dataset: {noformat} create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); create table one_k(a int); insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C; create table t_fact ( fact_id int not null, dim1_id int not null, dim2_id int not null, col1 int not null, primary key(fact_id), key(dim1_id), key(dim2_id), key(col1) ); insert into t_fact select A.a+1000*B.a+1000*1000*C.a, A.a, B.a, A.a+1000*B.a+1000*1000*C.a from one_k A , one_k B, ten C where A.a<500 and B.a<500 ; create table dim1 ( dim1_id int not null primary key, col1 int ); insert into dim1 select a,a from one_k where a<500; create table dim2 ( dim2_id int not null primary key, col1 int ); insert into dim2 select a,a from one_k where a<500; {noformat} |
Priority | Major [ 3 ] | Critical [ 2 ] |
Fix Version/s | 10.12 [ 28320 ] | |
Fix Version/s | 10.11 [ 27614 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Fix Version/s | 11.1 [ 28549 ] | |
Fix Version/s | 11.0 [ 28320 ] |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.1 [ 28549 ] |
Fix Version/s | 11.3 [ 28565 ] | |
Fix Version/s | 11.2 [ 28603 ] |
Fix Version/s | 11.4 [ 29301 ] | |
Fix Version/s | 11.3 [ 28565 ] |
Labels | optimizer order-by-optimization | optimizer optimizer-feature order-by-optimization |
Issue Type | Task [ 3 ] | New Feature [ 2 ] |
Fix Version/s | 11.5 [ 29506 ] | |
Fix Version/s | 11.4 [ 29301 ] |
Link | This issue is blocked by MDEV-21643 [ MDEV-21643 ] |
Link | This issue is blocked by MDEV-21643 [ MDEV-21643 ] |
Link | This issue is blocked by MDEV-21643 [ MDEV-21643 ] |
Link | This issue relates to MDEV-21643 [ MDEV-21643 ] |
Fix Version/s | 11.6 [ 29515 ] | |
Fix Version/s | 11.5 [ 29506 ] |
Fix Version/s | 11.7 [ 29815 ] | |
Fix Version/s | 11.6 [ 29515 ] |
Zendesk Related Tickets | 201658 202060 159278 | |
Zendesk active tickets | 201658 |
Fix Version/s | 11.8 [ 29921 ] | |
Fix Version/s | 11.7 [ 29815 ] |
Link |
This issue relates to |
Link | This issue relates to MDEV-35280 [ MDEV-35280 ] |
Link | This issue relates to MDEV-18079 [ MDEV-18079 ] |
Link | This issue blocks MDEV-33412 [ MDEV-33412 ] |
Fix Version/s | 11.9 [ 29945 ] | |
Fix Version/s | 11.8 [ 29921 ] |
Fix Version/s | 12.1 [ 29992 ] | |
Fix Version/s | 12.0 [ 29945 ] |
Link | This issue is blocked by MDEV-21643 [ MDEV-21643 ] |
Link | This issue includes MDEV-21643 [ MDEV-21643 ] |
Link |
This issue relates to |
Link |
This issue relates to |
Sprint | Server 12.1 dev sprint [ 793 ] |
Sprint | Server 12.1 dev sprint [ 793 ] |
This can be solved by making the optimizer do this:
1. Run the join optimizer (as it is currently done)
2. check if #join_output_rows < select_limit
3. If yes, re-run the join optimizer in a special way:
3.1 The first table must produce the required ordering
3.2 Join buffering must not be used
3.3 Costs/#rows must be adjusted to take into account
that we are going to produce at most select_limit
rows.
Unclear parts:
3.1 may also need to include the case where one runs
filesort(.., limit=none, ...) for the first table
and then joins the sorted sequence with the rest (applying the LIMIT)
3.3 is the most complex and needs to be elaborated on.