[MDEV-446] Reuse quick_condition_rows in fanout computations Created: 2012-08-10  Updated: 2013-03-26  Resolved: 2013-03-26

Status: Closed
Project: MariaDB Server
Component/s: None
Fix Version/s: 10.0.2

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Timour Katchaounov (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-402 Take into account the cost of subquer... Closed
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MDEV-3858 Change JOIN_TAB::records_read from ha... Technical task Closed Timour Katchaounov  

 Description   

Currently:

  • range optimizer produces quick_condition_rows
  • they are used by best_access_path(), but only if it considers
    full table scan acccess (see matching_candidates_in_table() call)

We need to make it so that quick_condition_rows is used for any access
method.

=== Details ===
Currently, quick_condition_rows is only used for full table scans. This makes the code simple.
When we start using quick_condition_rows with arbitrary acess method, we will be faced with this problem:

1. quick_condition_rows estimate was made from a certain part of WHERE clause (the part of WHERE that was used to construct a range)
2. access method that we're using also was made from a certain part of WHERE clause.

question: are these two correlated? When does the estimate of the fanout of an access method take into account quick_condition_rows?

We will adopt this answer:

  • Don't take quick_condition_rows into account if we're using a quick select for table access,
    or if we're using ref(const) (in this case we would construct a quick select and take its estimates into account).
  • Do adopt it otherwise.

THere are more complex cases like
t.keypart1=const AND t.keypart2 BETWEEN 'foo' AND 'bar' AND t.keypart2=othertbl.col

Here:
quick_condition_rows is made from "t.keypart1=const AND t.keypart2 BETWEEN ..."
ref access will be on "t.keypart1=const AND t.keypart2=othertbl.col"

when part of WHERE will be appled,

  • t.keypart1=const – will not filter anything out (because it's guaranteed to be true
    by ref access)
  • t.keypart2 BETWEEN ... – will filter something out.

Available estimates:
E1. rec_per_key(t.keypart1=...)
E2. rec_per_key("t.keypart1=... AND t.keypart2=...")
E3. records_in_range("t.keypart1=const AND t.keypart2 BETWEEN ...")

Note, that the estimates will not necessary be in sync. For instance, we saw real-world cases where E3 > E1, even if it is clear that E3 < E1 for any dataset.

How to compute the fanout of the WHERE predicates not covered by an access method:

  • if an access method covers all WHERE predicates, then the WHERE clause doesn't have any selectivity on its own.
  • if WHERE's selectivity hasn't been "absorbed" by the employed access method, then it has selecivity on its own.

We assume that:

  • ref(non-const) access's selecitivity is orthogonal to quick_condition_rows selecitivity.
  • full table scan has no selectivity => WHERE condition will apply quick_condition_rows selecitivity.
  • That's it, for now. More complex analysis is possible but will be done outside of this task.
    For any type of access other than the above two, assume quick_condition_rows doesn't have any additional selectivity - selectivity(WHERE)=1.

== Implementation ==

  • All code should be in best_access_path().
  • QUESTIONABLE:: The result should be recorded in POSITION::records_read.
    (Yes, there will be cases where POSITION::records_read < 1)

APPARENT: POSITION::prefix_record_count is not what is needed.

  • The result should be visible in EXPLAIN EXTENDED, in "filtered" column.

== Implementation step #1 ==

  • Take the problematic query from mdev-402
  • Remove the scalar subquery predicate.
  • Convert both IN-subqueries into a JOIN ( I don't care if it's not equivalent)
  • Use STRAIGHT_JOIN hints, if necessary to produce the join order of
    nation,supplier,partsupp,part.
  • Make it so that for table part, in EXPLAIN:
    = access is eq_ref
    = rows==1 (it's eq_ref)
    = "filtered shows" 0.0118 * 100 = 1.18 (percent).
    This may require introducing extra members in POSITION (investigate).


 Comments   
Comment by Timour Katchaounov (Inactive) [ 2012-08-16 ]

Traced in GDB the produced selectivity estimate for the following query (as suggested by Sergey):

explain extended
select sql_calc_found_rows
s_name, s_address
from nation straight_join supplier straight_join part straight_join partsupp
where s_suppkey = ps_suppkey
and ps_partkey = p_partkey
and p_name like 'forest%'
and s_nationkey = n_nationkey
and n_name = 'CANADA'
order by s_name
limit 10;

There is an index on table nation(n_name).

The plan for DBT3 scale 10 is:

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

id select_type table type possible_keys key key_len ref rows filtered Extra

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1 SIMPLE nation ref PRIMARY,i_n_name i_n_name 26 const 1 100.00 Using where; Using index; Using temporary; Using filesort
1 SIMPLE supplier ref PRIMARY,i_s_nationkey i_s_nationkey 5 dbt3.nation.n_nationkey 1070 100.00  
1 SIMPLE partsupp ref PRIMARY,i_ps_partkey,i_ps_suppkey i_ps_suppkey 4 dbt3.supplier.s_suppkey 39 100.00 Using index
1 SIMPLE part eq_ref PRIMARY,i_p_name PRIMARY 4 dbt3.partsupp.ps_partkey 0 0.00 Using where

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Notice the "0" % filtered rows for part. This is due to JOIN_TAB::records_read being an integer instead of double.
The actual estimate that can be seen in GDB is: 0.0242. If one stops in the end of best_access_path, the estimates are:

(gdb) p records
$20 = 1
(gdb) p s->table->quick_condition_rows
$21 = 48334
(gdb) p s->table->file->stats.records
$22 = 1996869

The value of 's->table->quick_condition_rows' is what records_in_range() gives us for the predicate "p_name like 'forest%'" on index i_p_name.
The actual value is 24037.

The numbers for scale 1 are:

(gdb) p s->table->quick_condition_rows
$73 = 2387
(gdb) p s->table->alias->Ptr
$74 = 0x7f73120 "part"
(gdb) p s->table->file->stats.records
$75 = 200252
(gdb) p pos->where_part_selectivity
$76 = 0.011919980824161557

This is 1.19% selectivity, which was the expected result.

Various problems in EXPLAIN prevent it from showing the correct result (not only rounding).

Comment by Timour Katchaounov (Inactive) [ 2013-03-26 ]

This task is no longer needed because MWL#253 provides a generic solution for predicate selectivity.

Generated at Thu Feb 08 06:28:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.