Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Done
-
None
Description
This query uses an apparently sub-optimal query plan with rowid-filter:
explain
|
select count(*) from lineitem where l_orderkey=130 and l_shipdate='1992-07-01';
|
id select_type table type possible_keys key key_len ref rows Extra
|
1 SIMPLE lineitem ref|filter PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity i_l_orderkey|i_l_shipdate 4|8 const 5 (0%) Using where; Using rowid filter
|
Reasoning for why it is suboptimal:
Table keys:
CREATE TABLE lineitem ( |
...
|
PRIMARY KEY (l_orderkey,l_linenumber), |
KEY i_l_shipdate (l_shipDATE), |
KEY i_l_orderkey (l_orderkey), |
...
|
);
|
select count(*) from lineitem where l_orderkey=130 and l_shipdate='1992-07-01'; |
Note that due to extended keys, the indexes actually are:
KEY i_l_orderkey (l_orderkey, l_linenumber )
KEY i_l_shipdate (l_shipDATE, l_orderkey, l_linenumber)
i_l_shipdate is covering for the query.
Now, think of two plans:
Plan-A.
use an index-only range scan on i_l_shipdate, pass the rows to output
Plan-B.
Use an index-only scan on i_l_shipdate, build the rowid filter $FILTER
Use scan on i_l_orderkey with $FILTER.
The optimizer picks plan B, while plan A is obviously cheaper.
Relevant parts of the optimizer trace:
"analyzing_range_alternatives": {
|
"range_scan_alternatives": [
|
...
|
{
|
"index": "i_l_shipdate",
|
"ranges": [
|
"(1992-07-01,130) <= (l_shipDATE,l_orderkey) <= (1992-07-01,130)"
|
],
|
"rowid_ordered": true,
|
"using_mrr": false,
|
"index_only": true,
|
"rows": 1,
|
"cost": 0.3251,
|
"chosen": true
|
},
|
{
|
"index": "i_l_orderkey",
|
"ranges": ["(130) <= (l_orderkey) <= (130)"],
|
"rowid_ordered": true,
|
"using_mrr": false,
|
"index_only": false,
|
"rows": 5,
|
"cost": 6.1257,
|
"chosen": false,
|
"cause": "cost"
|
},
|
No issues so far.
"best_access_path": {
|
{
|
"access_type": "ref",
|
"index": "i_l_shipdate",
|
"used_range_estimates": true,
|
"rows": 1,
|
"cost": 1.0001,
|
"chosen": true
|
},
|
Here, the #rows estimate was reused from the range access, but the cost is
different. Denote this as "I_L_SHIPDATE-MISMATCH".
{
|
"access_type": "ref",
|
"index": "i_l_orderkey",
|
"used_range_estimates": true,
|
"rows": 5,
|
"cost": 0.1308,
|
"chosen": true
|
},
|
(Here the cost is also different from the one we got for range access, but here
the cost takes into account using the rowid filter (which reduces the cost)
(TODO: use of rowid filter should be shown in the trace!)