
Type: Task

Status: In Progress (View Workflow)

Priority: Major

Resolution: Unresolved

Fix Version/s: 10.7

Component/s: Optimizer

Labels:
Let Q be a join query over tables T1,...,Tn with the where condition C.
In the following cases the calculation of the cardinality of the result set of Q is considered as accurate:
a) C is a multiple equality such that for each column of it the number of distinct values is known
Example:
SELECT * from t1,t2 WHERE t1.a=t2.a; 
We just need to know NDV, which can be found in 2 ways
 there is index on t1.a and t2.a
 EITS statistics is available for t1.a and t2.a
b) C is a conjunction of formulas from a)
c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has
 only one nonconstant argument and this argument a reference to a column that is
used either as the first component of an index with known statistical data or a column
for which a histogram is provided
(for [NOT] IN / [NOT] LIKE predicate this column must be the first argument)  all other arguments must be inexpensive constant items that are converted to simple
constant (cached) (by the time we check the conditions)
(in the case of [NOT] LIKE predicate the constant argument must be of a special form)
Examples here would be:
SELECT * from t1,t2 WHERE t1.a op const 
Where op can be >/>=/</<=/=/<>
Also the other cases are with
[NOT] IN predicate, [NOT] NULL predicate, LIKE predicate
d) C is a conjunction formulas of a) and ANDOR formulas over predicates from b) such that
each of these formulas depends (directly or indirectly through equalities inferred from the
conjuncted multiple equalities) only on one column.
SELECT * from t1,t2 WHERE (t1.a =1 or t1.a=4); // directly depends on one column 
SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t2.b < 1); 
indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
So after substitution the query would look like
SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t1.a < 1); 
and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column
The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item.
It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately.
 is part of

MDEV8306 Complete costbased optimization for ORDER BY with LIMIT
 In Review