  MariaDB Server / MDEV-8306

Complete cost-based optimization for ORDER BY with LIMIT

Details

    Description

      A long-standing (and informally known) issue:

      The join optimizer makes its choices [almost] without regard for the ORDER BY ... LIMIT clause. The ORDER BY ... LIMIT optimizer is invoked only after the join order is already fixed. If the picked join order does not allow the ORDER BY ... LIMIT to be resolved efficiently, we end up with a very poor query plan.

      Example:

      select * from
        t_fact
          join dim1 on t_fact.dim1_id= dim1.dim1_id
          join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      |    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
      |    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      

      This uses filesort and takes ~8 sec.
      Now, let's force the right join order:

      select * from
        t_fact
          straight_join dim1 on t_fact.dim1_id= dim1.dim1_id
          straight_join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;
      

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
      |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      

      This uses an index to resolve the ORDER BY ... LIMIT, and the select takes 0.01 sec to execute.

      Dataset:

      create table ten(a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      create table one_k(a int);
      insert into one_k select A.a + B.a* 10 + C.a * 100 from ten A, ten B, ten C;
       
      create table t_fact
      (
        fact_id int not null,
        dim1_id int not null,
        dim2_id int not null,
        col1    int not null,
        primary key(fact_id),
        key(dim1_id),
        key(dim2_id),
        key(col1)
      );
       
      insert into t_fact
      select
        A.a+1000*B.a+1000*1000*C.a,
        A.a,
        B.a,
        A.a+1000*B.a+1000*1000*C.a
      from
        one_k A ,
        one_k B,
        ten C
      where
       A.a<500 and B.a<500
      ;
       
       
      create table dim1
      (
        dim1_id int not null primary key,
        col1 int
      );
       
      insert into dim1
      select a,a from one_k where a<500;
       
      create table dim2
      (
        dim2_id int not null primary key,
        col1 int
      );
      insert into dim2
      select a,a from one_k where a<500;
      

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            psergei Sergei Petrunia made changes -
            Labels optimizer order-by-optimization

            psergei Sergei Petrunia added a comment -

            This can be solved by making the optimizer do this:

            1. Run the join optimizer (as it is currently done)
            2. Check if #join_output_rows < select_limit
            3. If yes, re-run the join optimizer in a special way:
               3.1 The first table must produce the required ordering
               3.2 Join buffering must not be used
               3.3 Costs/#rows must be adjusted to take into account that we are going to produce at most select_limit rows

            Unclear parts:

            3.1 may also need to include the case where one runs filesort(.., limit=none, ...) for the first table and then joins the sorted sequence with the rest (applying the LIMIT).

            3.3 is the most complex and needs to be elaborated on.

            psergei Sergei Petrunia made changes -
            Fix Version/s 10.1 [ 16100 ]
            psergei Sergei Petrunia made changes -

            psergei Sergei Petrunia added a comment -

            Another problematic issue is the WHERE clause. It is easy to decide to switch to index-based resolution of ORDER BY ... LIMIT N when you assume that the first N rows you enumerate will match the WHERE clause. If the WHERE condition is very selective, one may end up enumerating many more rows before finding the first N rows that match the WHERE.

            EITS gives the optimizer information about condition selectivity, but does it give enough? The way EITS works currently, it is difficult to tell whether EITS plus the join optimizer "took into account" the selectivities of all parts of the WHERE clause, or whether some conditions were left unhandled.
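
            For illustration, here is a toy model of that risk (standalone code, not anything from the server; the table size, LIMIT and selectivity values are invented, and matches are assumed to be spread uniformly over the index order): an ordered index scan is expected to read about limit/selectivity rows before it finds the first N matches.

            #include <cstdio>
            #include <initializer_list>

            // Toy model: an ordered index scan that stops after finding `limit` matching rows.
            // Under a uniform spread of matches, the expected number of rows read is roughly
            // limit / selectivity, capped by the table size.
            int main() {
              const double table_rows = 5000000;   // hypothetical table size
              const long   limit      = 1000;

              for (double selectivity : {0.5, 0.05, 0.001}) {
                double expected_scanned = limit / selectivity;
                if (expected_scanned > table_rows)
                  expected_scanned = table_rows;
                std::printf("selectivity=%.3f -> expected rows scanned for LIMIT %ld: %.0f\n",
                            selectivity, limit, expected_scanned);
              }
              // At selectivity 0.001 the "walk the ordering index" plan reads ~1,000,000 rows,
              // far more than a decision made without regard for the WHERE would suggest.
              return 0;
            }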

            psergei Sergei Petrunia made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.1 [ 16100 ]
            psergei Sergei Petrunia added a comment - See also: http://bugs.mysql.com/bug.php?id=83323
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            psergei Sergei Petrunia made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.0.19 [ 19200 ]
            Issue Type Bug [ 1 ] Task [ 3 ]

            serg Sergei Golubchik added a comment -

            Two ideas about properly taking ORDER BY (but not LIMIT) into account in the join optimizer.

            1. The join optimizer does a depth-first search. First it builds some complete join order, then it tries other [partial] orders and compares costs. As soon as the very first join order has been built, the optimizer has an estimate of the number of rows in the result. At that point it can estimate the cost of a filesort. Later, when comparing plans (or partial plans), it should add the cost of a filesort to those plans that will end up using filesort. This way it can do a fair comparison between a plan-with-filesort and a plan that uses an index and needs no filesort.

            One of the weak points of this idea is that it assumes the optimizer will always get the same (or a similar) estimate for the number of rows in the result, independently of the join order.
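
            A minimal sketch of the comparison this idea implies (a toy calculation with invented numbers and a simplistic n*log2(n) filesort model, not the server's cost formulas): a join order that is cheaper to execute but needs a filesort should win only if it is still cheaper after the sort cost is added.

            #include <cmath>
            #include <cstdio>

            // Toy filesort cost: proportional to n * log2(n) comparisons.
            static double filesort_cost(double rows) {
              return rows > 1 ? rows * std::log2(rows) : 0.0;
            }

            int main() {
              const double est_result_rows = 250000;  // row estimate from the first complete join order

              // Hypothetical candidate plans for the same join.
              const double cost_plan_needs_sort = 90000;   // cheaper join, but ORDER BY needs a filesort
              const double cost_plan_ordered    = 140000;  // pricier join, rows already come out ordered

              const double total_needs_sort = cost_plan_needs_sort + filesort_cost(est_result_rows);

              std::printf("join+filesort: %.0f  ordered join: %.0f  -> keep the %s plan\n",
                          total_needs_sort, cost_plan_ordered,
                          total_needs_sort < cost_plan_ordered ? "join+filesort" : "ordered");
              return 0;
            }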


            serg Sergei Golubchik added a comment -

            2. Another idea: calculate three plans in parallel. That is, currently, best_access_path() does

                  if (tmp + 0.0001 < best_time - records/(double) TIME_FOR_COMPARE)
                  {
                    best_time= tmp + records/(double) TIME_FOR_COMPARE;
                    best= tmp;
                    best_records= records;
                    best_key= start_key;
                    best_max_key_part= max_key_part;
                    best_ref_depends_map= found_ref;
                  }
            

            Instead it can do the following:

                  enum { NO_FILESORT, TMPTABLE_FILESORT, FIRST_TABLE_FILESORT } plan_type;
                  if (tmp + 0.0001 < best_time[plan_type] - records/(double) TIME_FOR_COMPARE)
                  {
                    best_time[plan_type]= tmp + records/(double) TIME_FOR_COMPARE;
                    best[plan_type]= tmp;
                    best_records[plan_type]= records;
                    best_key[plan_type]= start_key;
                    best_max_key_part[plan_type]= max_key_part;
                    best_ref_depends_map[plan_type]= found_ref;
                  }
            

            Basically, it finds the three best plans: one that uses no filesort, one that does a filesort after the first table, and one that uses a temporary table and filesort at the end. When the join optimizer has finished, we select the best of the three plans, taking the filesort cost into account.

            psergei Sergei Petrunia made changes -
            alice Alice Sherepa made changes -
            julien.fritsch Julien Fritsch made changes -
            Support case ID 9377 not-9377
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4 [ 22408 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            NRE Projects RM_105_CANDIDATE
            ralf.gebhardt Ralf Gebhardt made changes -
            NRE Projects RM_105_CANDIDATE RM_105_CANDIDATE RM_105_OPTIMIZER
            varun Varun Gupta (Inactive) made changes -
            Assignee Sergei Petrunia [ psergey ] Varun Gupta [ varun ]
            varun Varun Gupta (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            varun Varun Gupta (Inactive) made changes -

            psergei Sergei Petrunia added a comment -

            Takeaways from browsing through some more-or-less related papers:

            "Interesting orders"

            A traditional query optimizer relies on the concept of "interesting orders"; ORDER BY and GROUP BY specify interesting orders, for example.

            The optimizer tries to keep (i.e. not prune away) the query plan [fragments] that produce rows in interesting orders.

            For example, if there is a query plan [fragment] QP1 that produces an interesting order, and a query plan QP2 (or a fragment for the same part of the query) which is cheaper but does not produce that interesting order, then QP1 is not pruned in favor of QP2.

            Peculiar property of the LIMIT-n construct

            Consider the process of constructing a regular join plan for "t1 JOIN t2 JOIN t3".

            We start with "t1 JOIN t2" and compute its cost.
            Then we add t3 as the third table. The plan for "t1 JOIN t2 JOIN t3" still includes the plan for "t1 JOIN t2" inside it.

            Now, consider "t1 JOIN t2 JOIN t3 ORDER BY ... LIMIT N".

            Let's assume that short-cutting is possible (execution stops after producing N matches) and try to pick the first table. Suppose it's t1.

            What value of LIMIT should we use for it? It depends on the fanout of t2, t3 (which we do not know yet).

            We could get an estimate of the fanout by running the join optimizer and checking how many rows are expected to be in the join output. @serg's comment above talks about this as well.
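
            To make the dependency concrete, a toy calculation (invented fanout values, not the optimizer's code): if joining the remaining tables turns each t1 row into fanout_rest output rows on average, then producing N output rows needs only about N / fanout_rest rows from t1.

            #include <cmath>
            #include <cstdio>

            int main() {
              const double limit_n = 1000;    // the N in ORDER BY ... LIMIT N

              // Hypothetical average fanouts of the tables joined after t1
              // (unknown at the time the first table is being chosen).
              const double fanout_t2 = 0.8;   // e.g. a selective eq_ref lookup
              const double fanout_t3 = 5.0;   // e.g. a 1:many ref lookup

              const double fanout_rest = fanout_t2 * fanout_t3;
              const double rows_needed_from_t1 = std::ceil(limit_n / fanout_rest);

              std::printf("effective limit for t1: read ~%.0f rows to get %.0f join matches\n",
                          rows_needed_from_t1, limit_n);
              return 0;
            }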

            Simplified approach - satisfy ORDER BY, ignore the LIMIT

            Can we just assume that any query plan that produces the matching order is good enough?


            psergei Sergei Petrunia added a comment -

            Notes from the optimizer call:
            Igor suggests also considering query plans where the first N tables are written into a temporary table, then passed to filesort, and the sorted output is then joined with the rest of the tables. That is, bushy plans with a "nest" at the front. TODO: elaborate on this proposal.

            varun Varun Gupta (Inactive) made changes -
            Fix Version/s 10.5 [ 23123 ]

            varun Varun Gupta (Inactive) added a comment -

            Details about the approach to be taken, termed "ORDER BY LIMIT (bushy plan)"

            General description

            • No need for this optimization when we have grouping (DISTINCT, GROUP BY, window functions or aggregate functions)
            • We can also disallow this bushy plan if we have a HAVING clause that is not pushed into the WHERE clause
            • For the bushy plan we will not need anything different until we get a partial join order that supports the ORDER BY clause

            So the approach for the bushy plan (if chosen as the best plan) would be:

            1. Compute the partial join (that covers the ordering)
            2. Store the result of the partial join in a temp table
            3. Sort the temp table
            4. Join the temp table with the remaining tables

            Note: the bush strategy still allows the LIMIT to shortcut execution early (see the sketch after this list):

            • The part of the join that is inside the bush will need to be computed fully (and passed to filesort)
            • After that, we can short-cut the join execution as soon as we’ve found #LIMIT matches.
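
            A rough sketch of the execution shape this describes (standalone code, not the server's executor; the row counts and the join predicate are invented): the nest is materialized and sorted in full, and only the join loop that follows it can stop early.

            #include <algorithm>
            #include <cstdio>
            #include <vector>

            int main() {
              const std::size_t limit = 1000;

              // Steps 1+2: the part of the join inside the bush is computed fully and sorted.
              std::vector<int> nest;                    // stands in for the temporary table
              for (int i = 100000; i > 0; --i)
                nest.push_back(i);
              std::sort(nest.begin(), nest.end());      // stands in for the filesort

              // Step 3: join the sorted nest with the remaining tables, short-cutting on LIMIT.
              std::size_t produced = 0, nest_rows_read = 0;
              for (int sort_key : nest) {
                ++nest_rows_read;
                const bool matches_remaining_tables = (sort_key % 2 == 0);  // invented predicate
                if (matches_remaining_tables && ++produced == limit)
                  break;                                // the LIMIT shortcut applies only here
              }
              std::printf("nest rows materialized: %zu, read after sorting: %zu, rows produced: %zu\n",
                          nest.size(), nest_rows_read, produced);
              return 0;
            }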
            varun Varun Gupta (Inactive) added a comment - - edited

            Optimizer part

            Take the LIMIT part of ORDER BY LIMIT into account during join optimization

            The biggest difference in query execution for ORDER BY … LIMIT plans comes from the ability (or inability) to terminate query execution early, as soon as #LIMIT matches have been found.

            Two issues for the optimizer here:

            • Data distribution: it’s difficult to predict whether an index scan will find the matching rows early or late (TODO rephrase). Perhaps we should add a penalty for plans that take such risks.
            • Partial plans and LIMIT: the whole plan has to produce N rows. How many rows should a partial plan produce for that? This depends on the fanout of the rest of the plan.

            Suggested answer for the second: run the join optimizer and compute the total join output cardinality. Then we can see which fraction of the join output we will need, and assume that we will need to do the same fraction of the work for each table (TODO: our optimizer typically heavily over-estimates cardinality due to the unknown selectivity of predicates. Will this cause overly-optimistic estimates? See the sketch below.)

            (TODO2: another suggestion is to
            1. Force the construction of a query plan which produces rows in a suitable order (that is: don’t prune it away even if there are cheaper query plans which produce rows in a non-suitable order)
            2. Take the LIMIT shortcut into account only after the whole query plan has been constructed)
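
            A tiny numeric illustration of the fraction-of-output suggestion above and of the over-estimation concern (invented numbers, not the server's cost model): the fraction of work charged to a partial plan is limit / estimated_join_output, so a 10x cardinality over-estimate makes the short-cutting plan look 10x cheaper than it really is.

            #include <cstdio>
            #include <initializer_list>

            int main() {
              const double limit_n        = 1000;
              const double full_plan_cost = 2.0e6;   // hypothetical cost of computing the whole join

              // Compare a correct output estimate with a 10x over-estimate.
              for (double est_join_output : {50000.0, 500000.0}) {
                double fraction = limit_n / est_join_output;
                if (fraction > 1.0)
                  fraction = 1.0;
                std::printf("estimated output %.0f -> fraction %.4f -> scaled plan cost %.0f\n",
                            est_join_output, fraction, full_plan_cost * fraction);
              }
              // The 10x over-estimate shrinks the fraction 10x, so the plan is costed at 4000
              // instead of 40000: an overly-optimistic estimate, as the TODO above suspects.
              return 0;
            }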

            Add a sort operation when the join prefix allows it

            We can add a sorting operation if all of the columns that are required to compute the sort key are in the join prefix. Note that the check should take into account equality propagation (See MDEV-8989).

            Example:

            WHERE
            t2.a= t3.a and t3.a=t4.a
            ORDER BY t1.a, t2.a

            For each reference to tblX.columnY in the ORDER BY list, we get either:
            • a single table_map referring to tblX, or
            • if tblX is part of a multiple equality, a bitmap of the tables whose fields participate in that multiple equality.
            For this example, we would get:

              table_map_1= {t1}, 
              table_map_2= {t2, t3, t4}
            

            Then, when we have a join prefix, we can check if it covers all the table_map values that we have.
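
            A minimal sketch of that coverage check (table_map here is just a plain bitmask; this is illustrative code, not the server's implementation): the prefix can support the ordering if, for every ORDER BY element, at least one table that can provide its value is inside the prefix.

            #include <cstdint>
            #include <cstdio>
            #include <vector>

            using table_map = std::uint64_t;   // one bit per table, as in the example above

            // True if every ORDER BY element has at least one of its candidate tables
            // (the direct table or a member of its multiple equality) inside the join prefix.
            static bool prefix_covers_ordering(table_map prefix,
                                               const std::vector<table_map>& order_by_maps) {
              for (table_map m : order_by_maps)
                if ((m & prefix) == 0)
                  return false;
              return true;
            }

            int main() {
              const table_map T1 = 1, T2 = 2, T3 = 4, T4 = 8;

              // ORDER BY t1.a, t2.a with WHERE t2.a=t3.a AND t3.a=t4.a:
              const std::vector<table_map> order_by_maps = { T1, T2 | T3 | T4 };

              std::printf("{t1}     covers ordering: %d\n", prefix_covers_ordering(T1, order_by_maps));
              std::printf("{t1,t3}  covers ordering: %d\n", prefix_covers_ordering(T1 | T3, order_by_maps));
              std::printf("{t2,t4}  covers ordering: %d\n", prefix_covers_ordering(T2 | T4, order_by_maps));
              return 0;
            }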

            1.3 Cost of sorting
            It is cheaper to sort fewer rows, so the sort operation should be done at the point in the query plan where the output cardinality is minimal. Add the cost of sorting as soon as the join prefix covers the ordering columns.

            No join buffering in ordered-join mode

            After the sorting nest has been added, the rest of the join is performed in ordered-join mode (the first “table” is sorted, so the join output is sorted too). Join buffering and BKA must be disabled here.

            Representation of the sort operation in the JOIN prefix
            TODO: Should the sort operation be represented as a member of a POSITION object, or as a separate POSITION object?

            1.6 Enumeration of different sorting options
            If we construct a join prefix of t1, t2 and then add a sort operation, we get:

              sort {t1, t2}
            

            If we then consider adding table t3, we add it after the sort:

              sort {t1, t2}, t3
            

            No overlaps with semi/outer joins

            It’s difficult to handle the situation where the sorting operation comes in the middle of outer join or semi-join processing.
            Example:

            SELECT * FROM t1 LEFT JOIN (t2, t3) ON …

            It is hard to make a sort bush of

            {t1, t2}

            here. The suggestion is to not allow this.
            It’s similar for semi-joins: Putting a sort operation after a join prefix that has [partial] duplicate row combinations will add a lot of complexity. Possible variants of condition under which sort operation can be added:


            varun Varun Gupta (Inactive) added a comment -

            Plan refinement part

            No linked join buffers across the bush boundary

            The join buffering code should be adjusted to not do this.

            Create temp. table

            Figure out the set of fields that need to be saved from the tables inside the sort nest, and create a temporary table with those fields.

            Access to tables inside the sort nest from post-sorting context

            Query execution with sort nest:
            Step 1: Compute a join of tables inside the sort nest; store record combinations in the temporary table;
            Step 2: Use filesort on the temporary table
            Step 3: Read filesort result and join it with the tables outside the sort nest
            Step 4: Return the query result (to the client or elsewhere)

            When we are at Steps 3 and 4, we may need to read columns of the tables inside the sort nest.
            This requires some extra arrangement, as the “current” values are now in the temporary table.

            Possible solutions:

            The “Unpack” solution (aka copy-back)

            After we read a row from the temporary table, put the values back into the record buffers of the original tables.
            This is used by:
            • SJ-Materialization-Scan (see rr_sequential_and_unpack())
            • the join buffering code (see JOIN_CACHE::read_all_record_fields())
            • reading the result of filesort’ing a single table

            Advantages: no need to modify any Item objects.
            Disadvantages: Extra overhead to copy the values back.

            The “refer-to-temptable like GROUP BY does” solution

            This is used, for example, by the GROUP BY code. When tableX.fieldY is accessed from a post-grouping context (e.g. when computing the HAVING clause), it is done as follows:

            • The Item_field referring to tableX.fieldY is wrapped inside an Item_ref object.
            • Item_ref::val_int() translates the call to val_int_result() (the same happens for the other datatypes, too).
            • Item_field has two pointers to table fields: “field” refers to the Field in the original table, while “result_field” refers to its corresponding field in the temporary table.
            • Item_field::val_int_result() returns the value from “result_field”, that is, from the temporary table.

            Advantages: no need to copy
            Disadvantages: need to have Item_ref objects (can we use them too, or can only the GROUP BY code?)

            The “refer to temptable, but not like GROUP BY does it” solution

            Is there anything that prevents us from changing the Item_field objects to point to the temptable directly? (This change would need to be undone for SP re-execution.)
            (Note: there is a class Item_temptable_field, but it is the same as Item_field for most purposes.)

            Advantages: no need to copy
            Disadvantages: need to walk Item trees and make substitutions (which then will need to be undone if running as PS or in SP).



            varun Varun Gupta (Inactive) added a comment - The branch is 10.5-mdev8306


            varun Varun Gupta (Inactive) added a comment - While testing for the optimizer part where we enumerate the plans, I found two issues, opened sub-tasks MDEV-19853 and MDEV-19854 .
            varun Varun Gupta (Inactive) added a comment - - edited

            Optimizer part

            Part 1 of the project is done: the planner now also enumerates plans that add a nest to a partial join that satisfies the ORDER BY clause.
            For such plans I have also added the cost of filling and sorting the temporary table that stores the result of the partial join.

            I have also made some progress on the second part:
            1) Representing the order-nest in the top-level select, just as we do for the semi-join materialization strategies.
            Consider a case of 4 tables where the join order is t1, t2, t3, t4 and the order nest is over tables (t1, t2).
            The top-level select would then have 3 tables: <order-nest>, t3, t4.

            This would look like this:

                             <order-nest> ---- t3 ---- t4 
                                   |
                                    +----t1---t2
            

            2) Created the temp table that stores the result of the partial join.

            varun Varun Gupta (Inactive) added a comment - - edited

            More updates

            • Instead of the order-nest, for now I kept a separate JOIN_TAB for the temp table containing the result of the partial join order. This is done because we need to adjust how the JOIN_TABs are counted and allocated (will make a separate patch for this).
            • The iteration over the JOIN_TABs looked a bit complex; we have WITH_BUSH_ROOT and WITHOUT_BUSH_ROOT, which are used to include and exclude the root of the SJ-nest, and we can use them for the order nest too.
            • Also introduced a struct NEST_INFO to store some of the contents of the nest.

            The commits are:
            https://github.com/MariaDB/server/commit/ec395d289533317e30b156d9e3b17afd4b50982e
            https://github.com/MariaDB/server/commit/d3d66daccbc51cf87a96a8c4cea130c0fa404056

            varun Varun Gupta (Inactive) added a comment - - edited

            Updates

            • Making EXPLAIN work for queries that use ORDER BY ... LIMIT; had to add special behaviour for the order nest:
              https://github.com/MariaDB/server/commit/c4139bdb5a87949371f98ec0321d22788dae7814
              This would look like:

            MariaDB [test]> explain select t1.b from t1,t2 where t1.a=t2.a  order by t2.b limit 2;
            +------+-------------+--------------+------+---------------+------+---------+------+------+----------------+
            | id   | select_type | table        | type | possible_keys | key  | key_len | ref  | rows | Extra          |
            +------+-------------+--------------+------+---------------+------+---------+------+------+----------------+
            |    1 | SIMPLE      | t2           | ALL  | NULL          | NULL | NULL    | NULL | 100  |                |
            |    1 | SIMPLE      | <order-nest> | ALL  | NULL          | NULL | NULL    | NULL | 100  | Using filesort |
            |    1 | SIMPLE      | t1           | ALL  | NULL          | NULL | NULL    | NULL | 100  | Using where    |
            +------+-------------+--------------+------+---------------+------+---------+------+------+----------------+

            • Substitution of ORDER BY items with the nest table items; this is needed because the filesort is attached to the temporary table (sort-nest):
              https://github.com/MariaDB/server/commit/47fb57cfef7eb3bae82c6b55af66a144ef3fef5c
            • Introduced a function end_nest_materialization that stores the result of the partial join of the tables inside the sort nest:
              https://github.com/MariaDB/server/commit/03debf755f1238efe48114744f04c73440b73056
              https://github.com/MariaDB/server/commit/a1f06e0346a003f48308145e6db4fe7a55f32a94
            • Substitution of the base items with the nest items; this is currently done for the WHERE clause and the items in the select list, so that we can evaluate the expressions in post-ORDER BY mode:
              https://github.com/MariaDB/server/commit/1e61de2299216800d016188df45ba7af09b111c8
              https://github.com/MariaDB/server/commit/ca15e96f188d5420e32eeda50af9f5f9540d8ce5
            • A simple test case:
              https://github.com/MariaDB/server/commit/82b4674bfdc18ef98f41c88f072259ec9f16f29f
            varun Varun Gupta (Inactive) made changes -

            varun Varun Gupta (Inactive) added a comment -

            Updates

            • Made substitution for ref items, so that they point to the sort-nest fields instead of the base table items:
              https://github.com/MariaDB/server/commit/aa34be302818b974d0e41336baf89d3552559d4d
            • Extracted the conditions from the WHERE clause that are internal to the nest and attached them to the inner tables of the nest:
              https://github.com/MariaDB/server/commit/3e1c17fedd5f4a603e61e3d1c9cc091f7bd2d1a2
            • For the remaining condition, substituted the base items of the inner tables with the sort-nest items:
              https://github.com/MariaDB/server/commit/5e3110531ce6ce04f023787de3a0f566c2d20120
            • Tests added for the above extraction and substitution:
              https://github.com/MariaDB/server/commit/e279814250e1a99bb9ff486b331c9cd9f22e0f46
            • No need to create a sort-nest when there is only one table to sort; this case is already handled in the server:
              https://github.com/MariaDB/server/commit/cb70533b44eaa8a2fa6a136cb4bc894f6c238364
            • Removing constants from the ORDER BY clause before picking the join order:
              https://github.com/MariaDB/server/commit/c3331cd8598c9e3a903ee70674148a8aec51c2d4
            • Equality propagation for the ORDER BY items (including expressions):
              https://github.com/MariaDB/server/commit/c7d0dc62c24d8a6f49ff1d328bfc7be2f98292cf
            • Substitution of ORDER BY items with the first field in the multiple equality, if found:
              https://github.com/MariaDB/server/commit/b9e9679e53c6426f6652527d3c205fb221b15e37
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) added a comment - - edited

            Tasks left:

            1. Need to get a bush for the order-nest (this was the suggestion that I discarded earlier as it was taking me more time to do it that way)
            2. ON expressions and add test cases
            3. MDEV-19854: Using indexes that can achieve ordering
            4. Transform function for subqueries
            5. Test if Item_ref needs some special handling
            6. Adding more tests for subqueries and outer joins
            7. Make sure that the entire regression test suite passes
            8. MDEV-20129 should be pushed along with MDEV-8306 (this is done, needs review)
            9. There is some work to handle the case when the first non-const table satisfies the ordering, e.g. enabling join buffering if the index used achieves the ordering for the first non-const table
            10. Use cache_record_length to get a more precise rec_length for the temp table instead of using AVG_REC_LEN
            11. Improve ANALYZE and EXPLAIN format=json for the sort-nest (MDEV-20146)
            12. Handling for blobs (need to disable the sort-nest if sorting cannot be done with them)
            13. Addressing review comments and adding proper code comments
            14. Also enable the sort-nest for multi-table UPDATE and DELETE queries
            varun Varun Gupta (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            varun Varun Gupta (Inactive) added a comment - - edited

            Issue with DEPENDENT SUBQUERY

            Case 1: when the subquery is attached to the tables inside the nest

            Query:

            SELECT abs(t1.a) as x, t2.a,t3.a FROM t1,t2,t3
            WHERE t1.b = t2.b and t3.b=t1.b and
            EXISTS (select 1 from t4 where t4.b=t1.c and t4.b < 4 group by t4.c having max(t4.a) = x)
            ORDER BY t2.a desc, t1.a desc limit 5;
            

            x refers to the item abs(t1.a) as x in the parent select, so the subquery has a reference to a select-list element.
            So when I substitute the select-list items with sort-nest items, I also (unintentionally) change the outer reference to x.

            The idea to fix this would be:

            • From the select list, filter out the items that are internal to the sort-nest
            • Create temporary table fields for all the items found above; this means that abs(t1.a) would also be in the sort-nest table
            • Then, in the select list, substitute abs(t1.a) as x with sort-nest(abs(t1.a)) as x
            • This also requires a change to the ref_pointer_array, as its entries should point to the newly substituted items in the select list

            varun Varun Gupta (Inactive) added a comment -

            There was some more progress on taking into consideration an index that achieves the ordering for the first non-const table (MDEV-19854).
            The case in the description now takes the limit into consideration and we get the desired plan.

            Here is the output of the explain:
            https://jira.mariadb.org/browse/MDEV-19854?focusedCommentId=132928&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-132928

            varun Varun Gupta (Inactive) added a comment - - edited

            While testing with dbt3, I found an example where the sort-nest optimization does not pick the best access methods for the tables:

            MariaDB [dbt3]> EXPLAIN SELECT   c_nationkey, c_name, o_orderDate, o_totalprice, c_acctbal, n_name FROM   orders, customer, nation  WHERE   c_custkey= o_custkey AND c_nationkey=n_nationkey AND   o_orderDATE >='1997-07-30' and o_orderDATE <= '1998-07-30'  ORDER BY o_orderDATE, o_totalprice DESC LIMIT 10;
            +------+-------------+----------+--------+---------------------------+---------------+---------+-----------------------+------+----------------------------------------------------+
            | id   | select_type | table    | type   | possible_keys             | key           | key_len | ref                   | rows | Extra                                              |
            +------+-------------+----------+--------+---------------------------+---------------+---------+-----------------------+------+----------------------------------------------------+
            |    1 | SIMPLE      | orders   | range  | i_o_orderdate,i_o_custkey | i_o_orderdate | 4       | NULL                  | 217  | Using index condition; Using where; Using filesort |
            |    1 | SIMPLE      | customer | eq_ref | PRIMARY,i_c_nationkey     | PRIMARY       | 4       | dbt3.orders.o_custkey | 1    |                                                    |
            |    1 | SIMPLE      | nation   | ALL    | PRIMARY                   | NULL          | NULL    | NULL                  | 25   | Using where                                        |
            +------+-------------+----------+--------+---------------------------+---------------+---------+-----------------------+------+----------------------------------------------------+
            3 rows in set (0.005 sec)
             
            MariaDB [dbt3]> set use_sort_nest=0;
            Query OK, 0 rows affected (0.000 sec)
             
            MariaDB [dbt3]> EXPLAIN SELECT   c_nationkey, c_name, o_orderDate, o_totalprice, c_acctbal, n_name FROM   orders, customer, nation  WHERE   c_custkey= o_custkey AND c_nationkey=n_nationkey AND   o_orderDATE >='1997-07-30' and o_orderDATE <= '1998-07-30'  ORDER BY o_orderDATE, o_totalprice DESC LIMIT 10;
            +------+-------------+----------+--------+---------------------------+---------------+---------+---------------------------+------+----------------------------------------------------+
            | id   | select_type | table    | type   | possible_keys             | key           | key_len | ref                       | rows | Extra                                              |
            +------+-------------+----------+--------+---------------------------+---------------+---------+---------------------------+------+----------------------------------------------------+
            |    1 | SIMPLE      | orders   | range  | i_o_orderdate,i_o_custkey | i_o_orderdate | 4       | NULL                      | 217  | Using index condition; Using where; Using filesort |
            |    1 | SIMPLE      | customer | eq_ref | PRIMARY,i_c_nationkey     | PRIMARY       | 4       | dbt3.orders.o_custkey     | 1    | Using where                                        |
            |    1 | SIMPLE      | nation   | eq_ref | PRIMARY                   | PRIMARY       | 4       | dbt3.customer.c_nationkey | 1    |                                                    |
            +------+-------------+----------+--------+---------------------------+---------------+---------+---------------------------+------+----------------------------------------------------+
            3 rows in set (0.005 sec)
            
            

            So with use_sort_nest=1 we use a table scan on table nation while eq_ref access could have been used; somehow the optimizer is not picking it.


            varun Varun Gupta (Inactive) added a comment -

            The optimizer trace shows that the table scan was preferred:

                                "rest_of_plan": [
                                  {
                                    "plan_prefix": ["orders", "customer"],
                                    "table": "nation",
                                    "sort-nest": ["orders"],
                                    "best_access_path": {
                                      "considered_access_paths": [
                                        {
                                          "access_type": "eq_ref",
                                          "index": "PRIMARY",
                                          "rows": 1,
                                          "cost": 217,
                                          "chosen": true
                                        },
                                        {
                                          "access_type": "scan",
                                          "resulting_rows": 25,
                                          "cost": 29,
                                          "chosen": true
                                        }
                                      ]
            

            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Igor Babaev [ igor ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            varun Varun Gupta (Inactive) made changes -
            varun Varun Gupta (Inactive) made changes -
            Summary Poor optimization of JOIN and ORDER BY ... LIMIT Complete cost-based optimization for ORDER BY with LIMIT

            varun Varun Gupta (Inactive) added a comment -

            Implementation details

            Finding whether the query can use the cost-based approach for ORDER BY with LIMIT

            • sort_nest_allowed()
            • is_order_by_expensive()

            Equality propagation for ORDER BY items

            Propagate the multiple equalities to all the items in the ORDER BY list. This
            lets us consider other join orders whose prefix can resolve the ORDER BY
            clause (a hypothetical example follows the list below).

            • propagate_equal_field_for_orderby()
            • remove_const_from_order_by()
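
            A hypothetical illustration (table and column names are made up, not taken
            from this task):

                select t1.a, t2.b
                from t1, t2
                where t1.a = t2.a
                order by t1.a
                limit 10;

                -- After propagating the equality t1.a = t2.a into the ORDER BY list,
                -- ORDER BY t1.a can also be treated as ORDER BY t2.a, so a join order
                -- whose prefix starts with t2 read through an index on t2.a can
                -- resolve the ordering without a filesort.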

            Find indexes that can resolve the ORDER BY clause

            Walk through all the usable indexes of a table and build a map of the keys
            that can resolve the ORDER BY clause.

            • find_keys_that_can_achieve_ordering()

            Get the cardinality estimate for the join

            Run the join planner to get the estimate of the cardinality of the join.
            This is needed as we want to apply the limit selectivity later while calculating
            the cost of the best plan.

            • estimate_cardinality_for_join()

            Run the join planner to get the best plan that will consider using a sort nest

            optimizer_prune_level=0
            This means no heuristic pruning of partial plans; partial plans are pruned
            only against the cost of the best plan found so far.

            For each prefix, if the ORDER BY clause can be resolved by the tables
            in the prefix, then:

            a) We consider adding a nest around the prefix. A separate call to
            best_extension_by_limited_search is made.

            • consider_adding_sort_nest()

            b) Add the cost of sorting.

            • sort_nest_oper_cost()

            c) Apply the limit selectivity. Once the nest is in place we can short-cut
            the execution: we only charge for the fraction of rows that we would read
            from the nest to produce the rows in the output. This is how the cost of
            the shortcut is taken into account (a rough worked example follows the
            bullet below).

            • set_fraction_output_for_nest
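
            A rough worked example (the numbers are invented; this is only a sketch of
            the idea): if the estimated cardinality of the whole join is 100,000 rows
            and the query has LIMIT 10, the fraction of the sort-nest output we expect
            to read is roughly

                fraction = LIMIT / estimated_join_cardinality = 10 / 100000 = 0.0001

            so the cost of the tables joined after the nest is scaled by this fraction
            instead of being charged in full.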

            d) The special case where an index can be used to resolve the ORDER BY
            clause is also considered here. We calculate the cost of accessing an index
            on the first non-const table that can resolve the ORDER BY clause; this can
            help in short-cutting the join execution (a rough cost sketch follows the
            bullets below).

            • get_best_index_for_order_by_limit()
            • find_cost_of_index_with_ordering()
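
            A hedged sketch of the cost idea (the exact formula used by
            find_cost_of_index_with_ordering() may differ): if an index on the first
            non-const table returns rows in the required order, and only a fraction of
            those rows survive the rest of the join and the WHERE clause, then we
            expect to read roughly

                rows_to_read ≈ LIMIT / fraction_of_rows_surviving_the_join
                cost        ≈ rows_to_read * per_row_index_read_cost

            rows from that index before LIMIT output rows are produced, which can be
            far cheaper than reading the whole table and sorting it.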

            After the best plan is picked, we create the sort-nest info structure if the best plan uses a sort-nest.

            • check_if_sort_nest_present()
            • create_sort_nest_info()

            For the first non-const table that uses an index to resolve the ordering, set up a range/index scan.

            • setup_index_use_for_ordering()
            • setup_range_scan()

            Creating the temp table for the sort-nest

            We create a list of the field items of the tables inside the nest that need
            to be stored in the temporary table.
            Then we also create a list of items for the fields of the temporary table.

            • make_sort_nest()

            Join buffering is disabled as it can break ordering

            • is_join_buffering_allowed()

            Substitution of items belonging to tables inside the nest with the items of the temp table of the nest

            Here we substitute the items of the tables inside the sort-nest with items
            of the temporary table of the sort-nest.
            The clauses where this substitution happens are:

            1. SELECT LIST
            2. ON-EXPR
            3. REF ACCESS items [for SJM lookup also]
            4. WHERE clause
            5. ORDER BY clause
            • substitute_base_with_nest_field_items()
              transform() is called with Item::replace_with_nest_items passed as the callback function

            Materializing the result inside the temp table of the sort-nest

            At the execution phase we materialize the result of the inner tables inside
            the sort-nest and then continue the execution (a conceptual SQL sketch
            follows the bullet below).

            • end_nest_materialization()
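
            To make the execution model concrete, here is a hypothetical SQL sketch
            based on the dbt3 example above, assuming the sort nest contains only the
            orders table (the sort_nest table name and its column list are illustrative,
            chosen to cover the select list, the ORDER BY and the join key to customer):

                create temporary table sort_nest as
                select o_orderDATE, o_totalprice, o_custkey
                from orders
                where o_orderDATE >= '1997-07-30' and o_orderDATE <= '1998-07-30'
                order by o_orderDATE, o_totalprice desc;

                -- Execution then joins sort_nest to customer and nation in this
                -- already-sorted order and stops as soon as LIMIT 10 rows are produced.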
            bradjorgensen Brad Jorgensen made changes -
            igor Igor Babaev (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            igor Igor Babaev (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Varun Gupta [ varun ]
            igor Igor Babaev (Inactive) made changes -
            Comment [ Review is not needed as it is a header mdev now. ]
            igor Igor Babaev (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Igor Babaev [ igor ]
            igor Igor Babaev (Inactive) made changes -
            Status Stalled [ 10000 ] Open [ 1 ]
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            igor Igor Babaev (Inactive) made changes -
            Status Confirmed [ 10101 ] In Review [ 10002 ]

            psergei Sergei Petrunia added a comment -

            Before MDEV-8306, best_extension_by_limited_search had this code:

                    if (join->sort_by_table &&
                        join->sort_by_table !=
                        join->positions[join->const_tables].table->table)
                      /*
                         We may have to make a temp table, note that this is only a
                         heuristic since we cannot know for sure at this point.
                         Hence it may be wrong.
                      */
                      current_read_time= COST_ADD(current_read_time, current_record_count);
            

            Now, I see only the comment is left:

                        /*
                          We may have to make a temp table, note that this is only a
                          heuristic since we cannot know for sure at this point.
                          Hence it may be wrong.
                        */
                        cost= current_record_count;
            

            Does this mean we can't provide the old behavior? (Yes, I know it is ugly and incorrect.) Still, being able to provide the old behavior has value, and we should discuss here whether we could do it.


            varun Varun Gupta (Inactive) added a comment -

            The entire snippet is:

                  else
                  {
                    /*
                      'join' is either the best partial QEP with 'search_depth' relations,
                      or the best complete QEP so far, whichever is smaller.
                    */
                    if (!nest_created &&
                        ((join->sort_by_table &&
                          join->sort_by_table !=
                          join->positions[join->const_tables].table->table) ||
                          join->sort_nest_possible))
                    {
                      double cost;
                      if (join->sort_nest_possible)
                      {
                        ulong rec_len= cache_record_length_for_nest(join, idx);
                        cost= join->sort_nest_oper_cost(partial_join_cardinality, idx,
                                                        rec_len);
                        trace_one_table.add("cost_of_sorting", cost);
                      }
                      else
                      {
                        /*
                          We may have to make a temp table, note that this is only a
                          heuristic since we cannot know for sure at this point.
                          Hence it may be wrong.
                        */
                        cost= current_record_count;
                      }
             
                      current_read_time= COST_ADD(current_read_time, cost);
                    }
            

            So when the ORDER BY LIMIT optimization is not enabled, we enter this branch:

            else
                      {
                        /*
                          We may have to make a temp table, note that this is only a
                          heuristic since we cannot know for sure at this point.
                          Hence it may be wrong.
                        */
                        cost= current_record_count;
                      }
            

            And then this cost of sorting is added to the cost of the current plan

             
            current_read_time= COST_ADD(current_read_time, cost);
            

            So we still provide the same old behaviour when the ORDER BY LIMIT optimization is not enabled.
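
            For reference, a small worked example of that old heuristic (the numbers are
            invented): if a partial plan is expected to produce current_record_count =
            50,000 rows and its first table is not the table that could provide the
            ORDER BY order, the optimizer in effect does

                current_read_time= COST_ADD(current_read_time, 50000);

            i.e. it adds one cost unit per row as a stand-in for the cost of writing
            those rows to a temporary table and sorting them.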



            varun Varun Gupta (Inactive) added a comment - The rebased code on top of 10.5 is in the branch 10.5-mdev8306-2
            serg Sergei Golubchik made changes -
            psergei Sergei Petrunia made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Varun Gupta [ varun ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.5 [ 23123 ]
            varun Varun Gupta (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            varun Varun Gupta (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Rank Ranked higher


            psergei Sergei Petrunia added a comment - A related bug in MySQL to look at: https://bugs.mysql.com/bug.php?id=97001
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Sergei Petrunia [ psergey ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            varun Varun Gupta (Inactive) added a comment -

            The updated branch is 10.6-order_by_limt.
            A push to buildbot has also been made:
            http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-varun

            psergei Sergei Petrunia made changes -
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.6 [ 24028 ]
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels optimizer order-by-optimization ServiceNow optimizer order-by-optimization
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow optimizer order-by-optimization 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z optimizer order-by-optimization optimizer order-by-optimization
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.7 [ 24805 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 69896 ] MariaDB v4 [ 131761 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.8 [ 26121 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.9 [ 26905 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.10 [ 27530 ]
            AirFocus AirFocus made changes -
            Description
            julien.fritsch Julien Fritsch made changes -
            Description
            julien.fritsch Julien Fritsch made changes -
            Description
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.12 [ 28320 ]
            Fix Version/s 10.11 [ 27614 ]
            serg Sergei Golubchik made changes -
            psergei Sergei Petrunia made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            psergei Sergei Petrunia made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.0 [ 28320 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.1 [ 28549 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.3 [ 28565 ]
            psergei Sergei Petrunia made changes -
            Labels optimizer order-by-optimization optimizer optimizer-feature order-by-optimization
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.5 [ 29506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.7 [ 29815 ]
            Fix Version/s 11.6 [ 29515 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 202060 159278
            Zendesk active tickets 201658
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]
            Fix Version/s 11.7 [ 29815 ]
            serg Sergei Golubchik made changes -
            alice Alice Sherepa made changes -
            alice Alice Sherepa made changes -
            serg Sergei Golubchik made changes -
            psergei Sergei Petrunia made changes -
            Fix Version/s 11.9 [ 29945 ]
            Fix Version/s 11.8 [ 29921 ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 12.1 [ 29992 ]
            Fix Version/s 12.0 [ 29945 ]
            julien.fritsch Julien Fritsch made changes -
            Sprint Server 12.1 dev sprint [ 793 ]
            julien.fritsch Julien Fritsch made changes -
            Sprint Server 12.1 dev sprint [ 793 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes: 23
              Watchers: 29