Complete cost-based optimization for ORDER BY with LIMIT
(MDEV-8306)
|
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Affects Version/s: | 10.5 |
| Fix Version/s: | 11.5 |
| Type: | Technical task | Priority: | Major |
| Reporter: | Varun Gupta (Inactive) | Assignee: | Sergei Petrunia |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
In the function best_access_path, where we try to find the best way to read a tables (index[range, ref, scan] and table scan) Current code but does not consider that a particular access method considered can achieve ordering. To consider this we would need to check for all the indexes of the table that is ordering achieved by them or not. We do something similar in test_for_cheaper_ordering, where we go through all the indexes of the first non-const table and check if we can achieve the ordering. If the ordering can be satisfied then we consider range or index scan on the given index. I think we can also think of adding this to best_access_path. With such cases we would not need to add of costing and we can shortcut for limit from the point onwards. The query is
BAD PLAN
GOOD PLAN
The BAD PLAN takes a very long time to execute , it does the full join and then applies the limit |
| Comments |
| Comment by Varun Gupta (Inactive) [ 2019-08-22 ] | |||||||||||||||||||||||||||||||||||||
|
How to do it:
| |||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2019-08-23 ] | |||||||||||||||||||||||||||||||||||||
|
Here is an example of how the index that was selected shows in the best_access_plan , pasting from the trace how this looks Query is : select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2; So we have for table t1 access methods ref on index(a) , table scan and index_scan on index(b) that would achieve the ordering.
So best_index here is 1 that is index b of table t1, that is selected for index scan. | |||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2019-08-23 ] | |||||||||||||||||||||||||||||||||||||
|
Now the another task would be to make sure that we set for this case to use index scan as the access method for the JOIN_TAB
| |||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2019-08-23 ] | |||||||||||||||||||||||||||||||||||||
|
Finally got the desired plan in the description by the adding cost based approach for an index to achieve the ordering
The execution time for the select is 0.29s | |||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2019-11-14 ] | |||||||||||||||||||||||||||||||||||||
|
This has been implemented along with MDEV-8306, so it is in review | |||||||||||||||||||||||||||||||||||||
| Comment by Rick James [ 2022-08-31 ] | |||||||||||||||||||||||||||||||||||||
|
It's a tough decision. You have given an example where focusing on the ORDER BY is a winner. But, if there are a million rows in the table you are sorting on, but less than the 1000 pass muster when WHERE is applied, you have scanned 1000000 rows in order to avoid a sort of less than 1000. No, I don't see a 'safe' way to discover which optimization to use when. With "index hints" the programmer can choose. So, maybe this is a "won't fix". |