[MDEV-21626] Optimizer misses the details about the picked join order Created: 2020-02-01  Updated: 2020-06-05  Resolved: 2020-06-05

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.5
Fix Version/s: 10.5.4, 10.4.14

Type: Bug Priority: Minor
Reporter: Sergei Petrunia Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: optimizer_trace


 Description   

Take the DBT-3 dataset, make sure you have these indexes:

CREATE INDEX i_o_orderdate ON orders (o_orderdate);
CREATE INDEX i_o_custkey ON orders (o_custkey);

Run these two queries. The queries are the same, except for the ORDER BY list.
However this difference causes different join orders to be picked:

MariaDB [dbt3]> explain select * from orders,customer where o_custkey=c_custkey order by o_orderdate limit 10;
+------+-------------+----------+--------+---------------+---------------+---------+-----------------------+------+-------------+
| id   | select_type | table    | type   | possible_keys | key           | key_len | ref                   | rows | Extra       |
+------+-------------+----------+--------+---------------+---------------+---------+-----------------------+------+-------------+
|    1 | SIMPLE      | orders   | index  | i_o_custkey   | i_o_orderdate | 4       | NULL                  | 10   | Using where |
|    1 | SIMPLE      | customer | eq_ref | PRIMARY       | PRIMARY       | 4       | dbt3.orders.o_custkey | 1    |             |
+------+-------------+----------+--------+---------------+---------------+---------+-----------------------+------+-------------+

MariaDB [dbt3]> explain select * from orders,customer where o_custkey=c_custkey order by c_acctbal limit 10; 
+------+-------------+----------+-------+---------------+-------------+---------+-------------------------+------+-------+
| id   | select_type | table    | type  | possible_keys | key         | key_len | ref                     | rows | Extra |
+------+-------------+----------+-------+---------------+-------------+---------+-------------------------+------+-------+
|    1 | SIMPLE      | customer | index | PRIMARY       | c_acctbal   | 9       | NULL                    | 1    |       |
|    1 | SIMPLE      | orders   | ref   | i_o_custkey   | i_o_custkey | 5       | dbt3.customer.c_custkey | 15   |       |
+------+-------------+----------+-------+---------------+-------------+---------+-------------------------+------+-------+

However, looking at the optimizer trace, one can see that the join optimization parts are identical:

 MariaDB [dbt3]> select * from information_schema.optimizer_trace\G
 *************************** 1. row ***************************
-                            QUERY: explain select * from orders,customer where o_custkey=c_custkey order by o_orderdate limit 10
+                            QUERY: explain select * from orders,customer where o_custkey=c_custkey order by c_acctbal limit 10
                             TRACE: {
   "steps": [
     {
@@ -18,7 +17,7 @@
         "select_id": 1,
         "steps": [
           {
-            "expanded_query": ...
+            "expanded_query": ...
           } 
         ] 
       } 
@@ -177,15 +176,15 @@
           },
           {
             "attaching_conditions_to_tables": {
-              "original_condition": "customer.c_custkey = orders.o_custkey",
+              "original_condition": "orders.o_custkey = customer.c_custkey",
               "attached_conditions_computation": [],
               "attached_conditions_summary": [
                 {
-                  "table": "orders",
-                  "attached": "orders.o_custkey is not null"
+                  "table": "customer",
+                  "attached": null
                 },

The first difference is in "attaching_conditions_to_tables" ?



 Comments   
Comment by Sergei Petrunia [ 2020-02-01 ]

The test was done on the mdev8306-orig branch... were there any improvements to the trace since then?

Comment by Varun Gupta (Inactive) [ 2020-05-31 ]

The issue here is the cost of adding a temp table is done when the first table doesn't resolve the ORDER BY clause

      { 
        if (join->sort_by_table &&
            join->sort_by_table !=
            join->positions[join->const_tables].table->table)
          /*
             We may have to make a temp table, note that this is only a
             heuristic since we cannot know for sure at this point.
             Hence it may be wrong.
          */
          current_read_time= COST_ADD(current_read_time, current_record_count);

So I guess it would be ok to add this cost to the trace.

Comment by Varun Gupta (Inactive) [ 2020-06-03 ]

Discussed this with psergey, it would be fine to add cost of sorting to 10.4 (the one seen in the code snippet) and we can improve this in 10.6 (with MDEV-8306)

Comment by Varun Gupta (Inactive) [ 2020-06-04 ]

Patch
https://github.com/MariaDB/server/commit/6404645980db51fdc1e5dae2ac94eca57804284b

Comment by Sergei Petrunia [ 2020-06-04 ]

Ok to push

Generated at Thu Feb 08 09:08:34 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.