Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6
Description
Task setting
This is a subset of MDEV-8306 that hits the customer. They have
{code:sql}
select ...
from big_table
join small_table1 on small_table1.key=big_table.key
join small_table2 on small_table2.key=big_table.key
... and so forth ...
order by big_table.key limit N
{code}
for a very small value of N.
The optimizer sets
{code:cpp}
join->sort_by_table = (big_table);
{code}
so join optimization adds extra cost to query plans that do not start from big_table (denote this extra cost TMP-TABLE-COST).
But this doesn't help in all cases. The join orders
{code}
big_table, small_table1, small_table2, ...
{code}
and
{code}
small_table1, big_table (using ref access), small_table2, ...
{code}
have close costs. And indeed, the execution times for the two join orders (with ORDER BY ... LIMIT removed) are close.
TMP-TABLE-COST is a small fraction of the total join cost and doesn't make a difference (this seems to match reality, too).
What does make a difference is short-cutting of execution due to LIMIT, but this is not taken into account.
Doing it in the general case requires everything from MDEV-8306.
This MDEV is for a join optimizer fix that adjusts join order costs for cases with a very small LIMIT.
The new behavior is to be controlled by a setting.
If one runs OLTP-mostly workloads, enabling the new behavior is an obvious choice.
Implementation
First, identify sort_by_table - the table whose ordering allows us to use LIMIT short-cutting.
(Due to multiple equalities there can be several candidates, but we can assume it's just one table.)
Adjust cost/cardinality only for full join orders
Let's assume we've built a complete join order (ignoring ORDER BY ... LIMIT).
Having done that, we can compare join_output_size with the LIMIT number, see what "fraction" of the join output we need, and adjust the cost accordingly, as sketched below.
(An alternative: try to account for ORDER BY ... LIMIT earlier, while building join prefixes. We don't try to do this.)
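As a minimal standalone sketch of this fraction computation (JoinOrder, limit_fraction, and the field names are this sketch's assumptions, not MariaDB's actual structures):
{code:cpp}
// Simplified stand-in for the numbers the optimizer has for one
// complete join order; not the real MariaDB data structures.
struct JoinOrder
{
  double cost;         // estimated cost of executing the full join order
  double output_rows;  // estimated join output size (join_output_size)
};

// Fraction of the join output that ORDER BY ... LIMIT actually needs.
// If the join produces no more rows than LIMIT, there is nothing to cut.
double limit_fraction(const JoinOrder &jo, double limit_rows)
{
  if (jo.output_rows <= limit_rows)
    return 1.0;
  return limit_rows / jo.output_rows;
}
{code}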
Adjusting join order cost and cardinality
Do it as follows:
1: If the join output is already produced in the required order, we can assume that $FRACT of join output will take $FRACT of the cost to produce. (Note: this is incorrect when using e.g. semi-join materialization or materialized derived tables. We ignore this fact for now).
Otherwise, we have two options:
2A: Change the access method for sort_by_table so that we do get rows in the desired order. After that, we can take a fraction of the cost as described above.
2B: Pass sort_by_table to filesort. We will need to spend the entire cost of reading sort_by_table, but then we will be able to short-cut joining with the subsequent tables. (All three cases are sketched after this list.)
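A rough sketch of the three adjustments as plain cost arithmetic; the parameter names, and the split of plan cost into sort_by_table cost plus rest-of-join cost, are assumptions of this sketch, not the patch's actual code:
{code:cpp}
// Case 1: rows already come out in the required order. Producing
// $FRACT of the join output is assumed to take $FRACT of the cost.
double cost_case_1(double plan_cost, double fract)
{
  return plan_cost * fract;
}

// Case 2A: sort_by_table switched to an access method that returns
// rows in the desired order (a possibly more expensive plan), after
// which we take a fraction of that plan's cost, as in case 1.
double cost_case_2a(double ordered_plan_cost, double fract)
{
  return ordered_plan_cost * fract;
}

// Case 2B: read and filesort sort_by_table in full, then short-cut
// the joining with the subsequent tables.
double cost_case_2b(double sort_by_table_read_cost, double filesort_cost,
                    double rest_of_join_cost, double fract)
{
  return sort_by_table_read_cost + filesort_cost +
         rest_of_join_cost * fract;
}
{code}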
Interplay with join optimizer pruning
So, we will adjust the costs only for complete join orders. The adjustment reduces both the cost and the output row count.
Join pruning compares the cost of a prefix with the cost of the best full join order found so far. Comparing un-adjusted costs with adjusted ones can produce wrong results:
- we can prune an un-adjusted prefix whose completed join order would have been adjusted. For example, a prefix costing 100 may be pruned against an adjusted full plan costing 50, even though completing that prefix and adjusting for LIMIT could have yielded a plan costing 10.
Hooking into the join optimizer - variant 1
First, run the join optimization as usual. This produces QUERY_PLAN1 and a tight upper bound on how many records the join would produce. (If that bound is smaller than LIMIT, short-cutting will probably not be beneficial.)
Save join->best_read and join->best_positions.
Then, start the optimization again:
Set join->best_read= DBL_MAX.
Run the join optimization with first table = sort_by_table.
This produces QUERY_PLAN2, a query plan that can take advantage of short-cutting.
Normally it's more expensive than QUERY_PLAN1 (note that we haven't accounted for short-cutting yet).
Now, account for LIMIT short-cutting as specified in the "Adjusting join order cost and cardinality" section. This produces the cost of QUERY_PLAN2-WITH-SHORTCUTTING, which we can compare with the cost of QUERY_PLAN1.
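A compilable sketch of this two-pass flow. optimize_join() is a hypothetical stub standing in for the real search (with made-up numbers so the sketch is self-contained), and join->best_read / join->best_positions are represented by plain locals:
{code:cpp}
#include <cfloat>     // DBL_MAX
#include <algorithm>  // std::min

struct PlanNumbers { double cost; double output_rows; };

// Hypothetical stub for running the join optimizer, optionally
// forcing sort_by_table to be the first table.
static PlanNumbers optimize_join(bool force_sort_by_table_first)
{
  return force_sort_by_table_first ? PlanNumbers{150.0, 10000.0}
                                   : PlanNumbers{100.0, 10000.0};
}

double variant1(double limit_rows)
{
  // Pass 1: ordinary optimization -> QUERY_PLAN1 plus a tight upper
  // bound on the join output size. (The real code also saves
  // join->best_positions here.)
  PlanNumbers plan1= optimize_join(false);
  if (plan1.output_rows <= limit_rows)
    return plan1.cost;  // output bound below LIMIT: no short-cutting

  // Pass 2: restart with best_read= DBL_MAX, first table= sort_by_table.
  double best_read= DBL_MAX;
  PlanNumbers plan2= optimize_join(true);  // QUERY_PLAN2
  best_read= plan2.cost;

  // Account for LIMIT short-cutting: QUERY_PLAN2-WITH-SHORTCUTTING.
  double fract= limit_rows / plan2.output_rows;
  double plan2_shortcut= best_read * fract;

  return std::min(plan1.cost, plan2_shortcut);
}
{code}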
Hooking into the join optimizer - variant 2
Make the join optimizer start with sort_by_table as the first table.
Once we have constructed a full join order, perform #rows/cost adjustments for LIMIT.
We can set the adjusted cost as join->best_read.
Then, proceed to run the join optimizer as usual.
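Variant 2 runs the same two steps in the opposite order. A sketch reusing the PlanNumbers / optimize_join() stand-ins from the variant-1 sketch above:
{code:cpp}
double variant2(double limit_rows)
{
  // Step 1: build a full join order starting from sort_by_table and
  // apply the #rows/cost adjustment for LIMIT right away.
  PlanNumbers plan2= optimize_join(true);
  double fract= std::min(1.0, limit_rows / plan2.output_rows);
  double adjusted= plan2.cost * fract;

  // Step 2: seed the usual search with the adjusted cost as the bound
  // to beat (join->best_read in the real code), then run the join
  // optimizer as usual; it wins only if it beats the adjusted cost.
  double best_read= adjusted;
  PlanNumbers plan1= optimize_join(false);
  return std::min(best_read, plan1.cost);
}
{code}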
Account for risks
QUERY_PLAN2-WITH-SHORTCUTTING is inherently more "risky". For example, if the join has a condition with selectivity=0.5 that the optimizer is not aware of, the actual cost of producing #LIMIT rows will be 1/0.5=2 times higher.
To account for this, we will add a multiplier, e.g. use QUERY_PLAN2-WITH-SHORTCUTTING only if it promises a 100x speedup over QUERY_PLAN1.
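A sketch of that guard as a simple ratio check; the function and parameter names here are made up, though the linked follow-up issues show this knob shipped as optimizer_join_limit_pref_ratio:
{code:cpp}
// Use the short-cutting plan only if it promises at least a
// pref_ratio-times (e.g. 100x) speedup over the ordinary plan.
bool prefer_shortcut_plan(double plan1_cost, double plan2_shortcut_cost,
                          double pref_ratio)
{
  return plan2_shortcut_cost * pref_ratio < plan1_cost;
}
{code}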
Attachments
psergey-mdev34720-1.diff
Issue Links
causes:
- MDEV-35072 Assertion with optimizer_join_limit_pref_ratio and 1-table select (Closed)
- MDEV-35164 optimizer_join_limit_pref_ratio: assertion when the ORDER BY table becomes constant (Closed)
relates to:
- MDEV-8306 Complete cost-based optimization for ORDER BY with LIMIT (Stalled)
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Sergei Petrunia [ psergey ] |
Affects Version/s | 10.6 [ 24028 ] |
Fix Version/s | 10.6 [ 24028 ] |
Labels | order-by-optimization |
Link | This issue relates to TODO-4807 [ TODO-4807 ] |
Summary | Poor plan choice for large JOIN and small LIMIT. | Poor plan choice for large JOIN with ORDER BY and small LIMIT |
Attachment | psergey-mdev34720-1.diff [ 73919 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue is part of TODO-4816 [ TODO-4816 ] |
Link | This issue blocks TODO-4816 [ TODO-4816 ] |
Link | This issue is part of TODO-4816 [ TODO-4816 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Assignee | Sergei Petrunia [ psergey ] | Michael Widenius [ monty ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link | This issue blocks MENT-2124 [ MENT-2124 ] |
Assignee | Michael Widenius [ monty ] | Sergei Petrunia [ psergey ] |
Fix Version/s | 10.6.20 [ 29903 ] | |
Fix Version/s | 10.11.10 [ 29904 ] | |
Fix Version/s | 11.2.6 [ 29906 ] | |
Fix Version/s | 11.4.4 [ 29907 ] | |
Fix Version/s | 11.6.2 [ 29908 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link | This issue causes MDEV-35072 [ MDEV-35072 ] |
Link | This issue causes MDEV-35164 [ MDEV-35164 ] |
bb-10.6-mdev34720 has a patch that is a work-in-progress