Details
-
Task
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
None
-
None
Description
Currently:
- range optimizer produces quick_condition_rows
- they are used by best_access_path(), but only if it considers
full table scan acccess (see matching_candidates_in_table() call)
We need to make it so that quick_condition_rows is used for any access
method.
=== Details ===
Currently, quick_condition_rows is only used for full table scans. This makes the code simple.
When we start using quick_condition_rows with arbitrary acess method, we will be faced with this problem:
1. quick_condition_rows estimate was made from a certain part of WHERE clause (the part of WHERE that was used to construct a range)
2. access method that we're using also was made from a certain part of WHERE clause.
question: are these two correlated? When does the estimate of the fanout of an access method take into account quick_condition_rows?
We will adopt this answer:
- Don't take quick_condition_rows into account if we're using a quick select for table access,
or if we're using ref(const) (in this case we would construct a quick select and take its estimates into account). - Do adopt it otherwise.
THere are more complex cases like
t.keypart1=const AND t.keypart2 BETWEEN 'foo' AND 'bar' AND t.keypart2=othertbl.col
Here:
quick_condition_rows is made from "t.keypart1=const AND t.keypart2 BETWEEN ..."
ref access will be on "t.keypart1=const AND t.keypart2=othertbl.col"
when part of WHERE will be appled,
- t.keypart1=const – will not filter anything out (because it's guaranteed to be true
by ref access) - t.keypart2 BETWEEN ... – will filter something out.
Available estimates:
E1. rec_per_key(t.keypart1=...)
E2. rec_per_key("t.keypart1=... AND t.keypart2=...")
E3. records_in_range("t.keypart1=const AND t.keypart2 BETWEEN ...")
Note, that the estimates will not necessary be in sync. For instance, we saw real-world cases where E3 > E1, even if it is clear that E3 < E1 for any dataset.
How to compute the fanout of the WHERE predicates not covered by an access method:
- if an access method covers all WHERE predicates, then the WHERE clause doesn't have any selectivity on its own.
- if WHERE's selectivity hasn't been "absorbed" by the employed access method, then it has selecivity on its own.
We assume that:
- ref(non-const) access's selecitivity is orthogonal to quick_condition_rows selecitivity.
- full table scan has no selectivity => WHERE condition will apply quick_condition_rows selecitivity.
- That's it, for now. More complex analysis is possible but will be done outside of this task.
For any type of access other than the above two, assume quick_condition_rows doesn't have any additional selectivity - selectivity(WHERE)=1.
== Implementation ==
- All code should be in best_access_path().
- QUESTIONABLE:: The result should be recorded in POSITION::records_read.
(Yes, there will be cases where POSITION::records_read < 1)
APPARENT: POSITION::prefix_record_count is not what is needed.
- The result should be visible in EXPLAIN EXTENDED, in "filtered" column.
== Implementation step #1 ==
- Take the problematic query from mdev-402
- Remove the scalar subquery predicate.
- Convert both IN-subqueries into a JOIN ( I don't care if it's not equivalent)
- Use STRAIGHT_JOIN hints, if necessary to produce the join order of
nation,supplier,partsupp,part. - Make it so that for table part, in EXPLAIN:
= access is eq_ref
= rows==1 (it's eq_ref)
= "filtered shows" 0.0118 * 100 = 1.18 (percent).
This may require introducing extra members in POSITION (investigate).
Attachments
Issue Links
- is part of
-
MDEV-402 Take into account the cost of subqueries during optimization
-
- Closed
-
Traced in GDB the produced selectivity estimate for the following query (as suggested by Sergey):
explain extended
select sql_calc_found_rows
s_name, s_address
from nation straight_join supplier straight_join part straight_join partsupp
where s_suppkey = ps_suppkey
and ps_partkey = p_partkey
and p_name like 'forest%'
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by s_name
limit 10;
There is an index on table nation(n_name).
The plan for DBT3 scale 10 is:
-----
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------Notice the "0" % filtered rows for part. This is due to JOIN_TAB::records_read being an integer instead of double.
The actual estimate that can be seen in GDB is: 0.0242. If one stops in the end of best_access_path, the estimates are:
(gdb) p records
$20 = 1
(gdb) p s->table->quick_condition_rows
$21 = 48334
(gdb) p s->table->file->stats.records
$22 = 1996869
The value of 's->table->quick_condition_rows' is what records_in_range() gives us for the predicate "p_name like 'forest%'" on index i_p_name.
The actual value is 24037.
The numbers for scale 1 are:
(gdb) p s->table->quick_condition_rows
$73 = 2387
(gdb) p s->table->alias->Ptr
$74 = 0x7f73120 "part"
(gdb) p s->table->file->stats.records
$75 = 200252
(gdb) p pos->where_part_selectivity
$76 = 0.011919980824161557
This is 1.19% selectivity, which was the expected result.
Various problems in EXPLAIN prevent it from showing the correct result (not only rounding).