Details

    • Technical task
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.5
    • 12.1
    • Optimizer
    • None

    Description

      In the function best_access_path, where we try to find the best way to read a tables (index[range, ref, scan] and table scan)

      Current code but does not consider that a particular access method considered can achieve ordering.

      To consider this we would need to check for all the indexes of the table that is ordering achieved by them or not. We do something similar in test_for_cheaper_ordering, where we go through all the indexes of the first non-const table and check if we can achieve the ordering. If the ordering can be satisfied then we consider range or index scan on the given index.

      I think we can also think of adding this to best_access_path. With such cases we would not need to add of costing and we can shortcut for limit from the point onwards.
      An example of this is the test case of mdev-8306

      The query is

      select * from
        t_fact
          join dim1 on t_fact.dim1_id= dim1.dim1_id
          join dim2 on t_fact.dim2_id= dim2.dim2_id
      order by
         t_fact.col1
      limit 1000;

      BAD PLAN

       
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra                           |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+
      |    1 | SIMPLE      | dim1   | ALL    | PRIMARY         | NULL    | NULL    | NULL              |  500 | Using temporary; Using filesort |
      |    1 | SIMPLE      | t_fact | ref    | dim1_id,dim2_id | dim1_id | 4       | j3.dim1.dim1_id   |    1 |                                 |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |                                 |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+---------------------------------+

      GOOD PLAN

      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref               | rows | Extra |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+
      |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL              | 1000 |       |
      |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim1_id |    1 |       |
      |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | j3.t_fact.dim2_id |    1 |       |
      +------+-------------+--------+--------+-----------------+---------+---------+-------------------+------+-------+

      The BAD PLAN takes a very long time to execute , it does the full join and then applies the limit
      While in the GOOD PLAN(we apply STRAIGHT JOIN to get the best order), we pick the index that achieves the ordering and then shortcut it for limit

      Attachments

        Activity

          varun Varun Gupta (Inactive) added a comment - - edited

          How to do it:

          • Get cardinality of the join (this is done be running the join optimizer again)
          • Get the fanout, which should be cardinality/records_of_first_table
          • Adjust the limit to take the fanout into consideration
          • Adjust the limit to take into account the condition selectivity for the first table
          • Calculate cost to read the adjusted number of records
          varun Varun Gupta (Inactive) added a comment - - edited How to do it: Get cardinality of the join (this is done be running the join optimizer again) Get the fanout, which should be cardinality/records_of_first_table Adjust the limit to take the fanout into consideration Adjust the limit to take into account the condition selectivity for the first table Calculate cost to read the adjusted number of records
          varun Varun Gupta (Inactive) added a comment - - edited

          Here is an example of how the index that was selected shows in the best_access_plan , pasting from the trace how this looks

          Query is : select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2;
          The tables are like
          CREATE TABLE t1 (a int, b int, c int, KEY(a), key(b);
          create table t2 (a int, b int, key(b));

          So we have for table t1 access methods ref on index(a) , table scan and index_scan on index(b) that would achieve the ordering.

          "best_access_path": 
                      {
                          "considered_access_paths": 
                          [
                              
                              {
                                  "access_type": "ref",
                                  "index": "a",
                                  "used_range_estimates": true,
                                  "rows": 20,
                                  "cost": 5.7788,
                                  "chosen": true
                              },
                              
                              {
                                  "access_type": "scan",
                                  "resulting_rows": 20,
                                  "cost": 2.0635,
                                  "chosen": true
                              },
                              
                              {
                                  "considered_indexes": 
                                  [
                                      
                                      {
                                          "index": "b",
                                          "updated_limit": 1,
                                          "index_scan_time": 1
                                      }
                                  ],
                                  "best_index": 1,
                                  "best_cost": 1
                              }
                          ]
                      },
          
          

          So best_index here is 1 that is index b of table t1, that is selected for index scan.

          varun Varun Gupta (Inactive) added a comment - - edited Here is an example of how the index that was selected shows in the best_access_plan , pasting from the trace how this looks Query is : select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2; The tables are like CREATE TABLE t1 (a int, b int, c int, KEY(a), key(b); create table t2 (a int, b int, key(b)); So we have for table t1 access methods ref on index(a) , table scan and index_scan on index(b) that would achieve the ordering. "best_access_path": { "considered_access_paths": [ { "access_type": "ref", "index": "a", "used_range_estimates": true, "rows": 20, "cost": 5.7788, "chosen": true }, { "access_type": "scan", "resulting_rows": 20, "cost": 2.0635, "chosen": true }, { "considered_indexes": [ { "index": "b", "updated_limit": 1, "index_scan_time": 1 } ], "best_index": 1, "best_cost": 1 } ] }, So best_index here is 1 that is index b of table t1, that is selected for index scan.

          Now the another task would be to make sure that we set for this case to use index scan as the access method for the JOIN_TAB
          Currently even if it pick index scan in best access path it does not use the index, instead does a full table scan

          MariaDB [test]> explain select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2;
          +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | Extra                       |
          +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+
          |    1 | SIMPLE      | t1    | ALL  | a             | NULL | NULL    | NULL      | 20   | Using where; Using filesort |
          |    1 | SIMPLE      | t2    | ref  | b             | b    | 5       | test.t1.c | 2    |                             |
          +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+
          2 rows in set (0.001 sec)
          

          varun Varun Gupta (Inactive) added a comment - Now the another task would be to make sure that we set for this case to use index scan as the access method for the JOIN_TAB Currently even if it pick index scan in best access path it does not use the index, instead does a full table scan MariaDB [test]> explain select * from t1, t2 where t1.a=1 and t1.c=t2.b order by t1.b limit 2; +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+ | 1 | SIMPLE | t1 | ALL | a | NULL | NULL | NULL | 20 | Using where; Using filesort | | 1 | SIMPLE | t2 | ref | b | b | 5 | test.t1.c | 2 | | +------+-------------+-------+------+---------------+------+---------+-----------+------+-----------------------------+ 2 rows in set (0.001 sec)
          varun Varun Gupta (Inactive) added a comment - - edited

          Finally got the desired plan in the description by the adding cost based approach for an index to achieve the ordering

          MariaDB [test]> explain select * from    t_fact      join dim1 on t_fact.dim1_id= dim1.dim1_id     join dim2 on t_fact.dim2_id= dim2.dim2_id order by    t_fact.col1 limit 1000;
          +------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+
          | id   | select_type | table  | type   | possible_keys   | key     | key_len | ref                 | rows | Extra |
          +------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+
          |    1 | SIMPLE      | t_fact | index  | dim1_id,dim2_id | col1    | 4       | NULL                | 2004 |       |
          |    1 | SIMPLE      | dim2   | eq_ref | PRIMARY         | PRIMARY | 4       | test.t_fact.dim2_id | 1    |       |
          |    1 | SIMPLE      | dim1   | eq_ref | PRIMARY         | PRIMARY | 4       | test.t_fact.dim1_id | 1    |       |
          +------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+
          3 rows in set (0.001 sec)
          
          

          The execution time for the select is 0.29s

          varun Varun Gupta (Inactive) added a comment - - edited Finally got the desired plan in the description by the adding cost based approach for an index to achieve the ordering MariaDB [test]> explain select * from t_fact join dim1 on t_fact.dim1_id= dim1.dim1_id join dim2 on t_fact.dim2_id= dim2.dim2_id order by t_fact.col1 limit 1000; +------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+ | 1 | SIMPLE | t_fact | index | dim1_id,dim2_id | col1 | 4 | NULL | 2004 | | | 1 | SIMPLE | dim2 | eq_ref | PRIMARY | PRIMARY | 4 | test.t_fact.dim2_id | 1 | | | 1 | SIMPLE | dim1 | eq_ref | PRIMARY | PRIMARY | 4 | test.t_fact.dim1_id | 1 | | +------+-------------+--------+--------+-----------------+---------+---------+---------------------+------+-------+ 3 rows in set (0.001 sec) The execution time for the select is 0.29s

          This has been implemented along with MDEV-8306, so it is in review

          varun Varun Gupta (Inactive) added a comment - This has been implemented along with MDEV-8306 , so it is in review
          rjasdfiii Rick James added a comment -

          It's a tough decision. You have given an example where focusing on the ORDER BY is a winner.

          But, if there are a million rows in the table you are sorting on, but less than the 1000 pass muster when WHERE is applied, you have scanned 1000000 rows in order to avoid a sort of less than 1000.

          No, I don't see a 'safe' way to discover which optimization to use when.

          With "index hints" the programmer can choose. So, maybe this is a "won't fix".

          rjasdfiii Rick James added a comment - It's a tough decision. You have given an example where focusing on the ORDER BY is a winner. But, if there are a million rows in the table you are sorting on, but less than the 1000 pass muster when WHERE is applied, you have scanned 1000000 rows in order to avoid a sort of less than 1000. No, I don't see a 'safe' way to discover which optimization to use when. With "index hints" the programmer can choose. So, maybe this is a "won't fix".

          People

            psergei Sergei Petrunia
            varun Varun Gupta (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.