Details
-
New Feature
-
Status: Stalled (View Workflow)
-
Critical
-
Resolution: Unresolved
-
None
Description
Let Q be a join query over tables T1,...,Tn with the where condition C.
In the following cases the calculation of the cardinality of the result set of Q is considered as accurate:
a) C is a multiple equality such that for each column of it the number of distinct values is known
Example:
SELECT * from t1,t2 WHERE t1.a=t2.a; |
We just need to know NDV, which can be found in 2 ways
- there is index on t1.a and t2.a
- EITS statistics is available for t1.a and t2.a
b) C is a conjunction of formulas from a)
c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has
- only one non-constant argument and this argument a reference to a column that is
used either as the first component of an index with known statistical data or a column
for which a histogram is provided
(for [NOT] IN / [NOT] LIKE predicate this column must be the first argument) - all other arguments must be inexpensive constant items that are converted to simple
constant (cached) (by the time we check the conditions)
(in the case of [NOT] LIKE predicate the constant argument must be of a special form)
Examples here would be:
SELECT * from t1,t2 WHERE t1.a op const |
Where op can be >/>=/</<=/=/<>
Also the other cases are with
[NOT] IN predicate, [NOT] NULL predicate, LIKE predicate
d) C is a conjunction formulas of a) and AND-OR formulas over predicates from b) such that
each of these formulas depends (directly or indirectly through equalities inferred from the
conjuncted multiple equalities) only on one column.
SELECT * from t1,t2 WHERE (t1.a =1 or t1.a=4); // directly depends on one column |
SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t2.b < 1); |
indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
So after substitution the query would look like
SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t1.a < 1); |
and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column
The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item.
It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately.
Attachments
Issue Links
- is part of
-
MDEV-8306 Complete cost-based optimization for ORDER BY with LIMIT
-
- Stalled
-
- relates to
-
MDEV-31327 Histogram range selectivity estimates should be merged, not added.
-
- Stalled
-
Has started with implementing the part c) mentioned in the description, looks simple to implement
The part that is going to be complex to implement is part (a), need more clarity as to how we are going to take selectivity for equality conditions(both parts are non-const) into consideration