[MDEV-19854] Using indexes that can achieve ordering - Jira

Details

Type: Technical task
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.5
Fix Version/s: 12.1
Component/s: Optimizer
Labels:
None

Description

In the function best_access_path, where we try to find the best way to read a tables (index[range, ref, scan] and table scan)

Current code but does not consider that a particular access method considered can achieve ordering.

To consider this we would need to check for all the indexes of the table that is ordering achieved by them or not. We do something similar in test_for_cheaper_ordering, where we go through all the indexes of the first non-const table and check if we can achieve the ordering. If the ordering can be satisfied then we consider range or index scan on the given index.

I think we can also think of adding this to best_access_path. With such cases we would not need to add of costing and we can shortcut for limit from the point onwards.
An example of this is the test case of mdev-8306

The query is

select * from

  t_fact

    join dim1 on t_fact.dim1_id= dim1.dim1_id

    join dim2 on t_fact.dim2_id= dim2.dim2_id

order by

   t_fact.col1

limit 1000;

BAD PLAN

+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+

| id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |

+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+

|    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |

|    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |

|    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |

+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+

GOOD PLAN

+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+

| id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |

+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+

|    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |

|    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |

|    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |

+------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+

The BAD PLAN takes a very long time to execute , it does the full join and then applies the limit
While in the GOOD PLAN(we apply STRAIGHT JOIN to get the best order), we pick the index that achieves the ordering and then shortcut it for limit

Attachments

Activity

Ascending order - Click to sort in descending order

Varun Gupta (Inactive) added a comment - 2019-08-22 18:44 - edited

How to do it:

Get cardinality of the join (this is done be running the join optimizer again)
Get the fanout, which should be cardinality/records_of_first_table
Adjust the limit to take the fanout into consideration
Adjust the limit to take into account the condition selectivity for the first table
Calculate cost to read the adjusted number of records

Varun Gupta (Inactive) added a comment - 2019-08-22 18:44 - edited How to do it: Get cardinality of the join (this is done be running the join optimizer again) Get the fanout, which should be cardinality/records_of_first_table Adjust the limit to take the fanout into consideration Adjust the limit to take into account the condition selectivity for the first table Calculate cost to read the adjusted number of records

Varun Gupta (Inactive) added a comment - 2019-08-23 11:47 - edited

Here is an example of how the index that was selected shows in the best_access_plan , pasting from the trace how this looks

Query is : select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2;
The tables are like
CREATE TABLE t1 (a int, b int, c int, KEY(a), key(b);
create table t2 (a int, b int, key(b));

So we have for table t1 access methods ref on index(a) , table scan and index_scan on index(b) that would achieve the ordering.

"best_access_path":

                "considered_access_paths":

                        "access_type": "ref",

                        "index": "a",

                        "used_range_estimates": true,

                        "rows": 20,

                        "cost": 5.7788,

                        "chosen": true

},

                        "access_type": "scan",

                        "resulting_rows": 20,

                        "cost": 2.0635,

                        "chosen": true

},

                        "considered_indexes":

                                "index": "b",

                                "updated_limit": 1,

                                "index_scan_time": 1

],

                        "best_index": 1,

                        "best_cost": 1

},

So best_index here is 1 that is index b of table t1, that is selected for index scan.

Varun Gupta (Inactive) added a comment - 2019-08-23 11:47 - edited Here is an example of how the index that was selected shows in the best_access_plan , pasting from the trace how this looks Query is : select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2; The tables are like CREATE TABLE t1 (a int, b int, c int, KEY(a), key(b); create table t2 (a int, b int, key(b)); So we have for table t1 access methods ref on index(a) , table scan and index_scan on index(b) that would achieve the ordering. "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "a", "used_range_estimates": true, "rows": 20, "cost": 5.7788, "chosen": true }, { "access_type": "scan", "resulting_rows": 20, "cost": 2.0635, "chosen": true }, { "considered_indexes": [ { "index": "b", "updated_limit": 1, "index_scan_time": 1 } ], "best_index": 1, "best_cost": 1 } ] }, So best_index here is 1 that is index b of table t1, that is selected for index scan.

Varun Gupta (Inactive) added a comment - 2019-08-23 11:52

Now the another task would be to make sure that we set for this case to use index scan as the access method for the JOIN_TAB
Currently even if it pick index scan in best access path it does not use the index, instead does a full table scan

MariaDB [test]> explain select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2;

+------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+

| id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | Extra                       |

+------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+

|    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL      | 20   | Using where; Using filesort |

|    1 | SIMPLE      | t2    | ref  | b             | b    | 5       | test.t1.c | 2    |                             |

+------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+

2 rows in set (0.001 sec)

Varun Gupta (Inactive) added a comment - 2019-08-23 11:52 Now the another task would be to make sure that we set for this case to use index scan as the access method for the JOIN_TAB Currently even if it pick index scan in best access path it does not use the index, instead does a full table scan MariaDB [test]> explain select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2; +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+ | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 20 | Using where; Using filesort | | 1 | SIMPLE | t2 | ref | b | b | 5 | test.t1.c | 2 | | +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+ 2 rows in set (0.001 sec)

Varun Gupta (Inactive) added a comment - 2019-08-23 17:03 - edited

Finally got the desired plan in the description by the adding cost based approach for an index to achieve the ordering

MariaDB [test]> explain select * from    t_fact      join dim1 on t_fact.dim1_id= dim1.dim1_id     join dim2 on t_fact.dim2_id= dim2.dim2_id order by    t_fact.col1 limit 1000;

+------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+

| id   | select_type | table  | type   | possible_keys   | key     | key_len | ref                 | rows | Extra |

+------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+

|    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL                | 2004 |       |

|    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | test.t_fact.dim2_id | 1    |       |

|    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | test.t_fact.dim1_id | 1    |       |

+------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+

3 rows in set (0.001 sec)

The execution time for the select is 0.29s

Varun Gupta (Inactive) added a comment - 2019-11-14 16:43

This has been implemented along with MDEV-8306, so it is in review

Varun Gupta (Inactive) added a comment - 2019-11-14 16:43 This has been implemented along with MDEV-8306 , so it is in review

Rick James added a comment - 2022-08-31 04:41

It's a tough decision. You have given an example where focusing on the ORDER BY is a winner.

But, if there are a million rows in the table you are sorting on, but less than the 1000 pass muster when WHERE is applied, you have scanned 1000000 rows in order to avoid a sort of less than 1000.

No, I don't see a 'safe' way to discover which optimization to use when.

With "index hints" the programmer can choose. So, maybe this is a "won't fix".

Rick James added a comment - 2022-08-31 04:41 It's a tough decision. You have given an example where focusing on the ORDER BY is a winner. But, if there are a million rows in the table you are sorting on, but less than the 1000 pass muster when WHERE is applied, you have scanned 1000000 rows in order to avoid a sort of less than 1000. No, I don't see a 'safe' way to discover which optimization to use when. With "index hints" the programmer can choose. So, maybe this is a "won't fix".

People

Assignee:: Sergei Petrunia

Reporter:: Varun Gupta (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2019-06-25 06:14

Updated:: 2025-04-07 13:26

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server