Details
-
New Feature
-
Status: Stalled (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
Description
Let Q be a join query over tables T1,...,Tn with the where condition C.
In the following cases the calculation of the cardinality of the result set of Q is considered as accurate:
a) C is a multiple equality such that for each column of it the number of distinct values is known
Example:
SELECT * from t1,t2 WHERE t1.a=t2.a; |
We just need to know NDV, which can be found in 2 ways
- there is index on t1.a and t2.a
- EITS statistics is available for t1.a and t2.a
b) C is a conjunction of formulas from a)
c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has
- only one non-constant argument and this argument a reference to a column that is
used either as the first component of an index with known statistical data or a column
for which a histogram is provided
(for [NOT] IN / [NOT] LIKE predicate this column must be the first argument) - all other arguments must be inexpensive constant items that are converted to simple
constant (cached) (by the time we check the conditions)
(in the case of [NOT] LIKE predicate the constant argument must be of a special form)
Examples here would be:
SELECT * from t1,t2 WHERE t1.a op const |
Where op can be >/>=/</<=/=/<>
Also the other cases are with
[NOT] IN predicate, [NOT] NULL predicate, LIKE predicate
d) C is a conjunction formulas of a) and AND-OR formulas over predicates from b) such that
each of these formulas depends (directly or indirectly through equalities inferred from the
conjuncted multiple equalities) only on one column.
SELECT * from t1,t2 WHERE (t1.a =1 or t1.a=4); // directly depends on one column |
SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t2.b < 1); |
indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
So after substitution the query would look like
SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t1.a < 1); |
and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column
The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item.
It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately.
Attachments
Issue Links
- is part of
-
MDEV-8306 Complete cost-based optimization for ORDER BY with LIMIT
- Stalled
- relates to
-
MDEV-31327 Histogram range selectivity estimates should be merged, not added.
- Stalled