 # Sufficient conditions for accurate calculation of join cardinality

XMLWordPrintable

#### Details

• Type: Task
• Status: In Progress
• Priority: Major
• Resolution: Unresolved
• Fix Version/s:
• Component/s:
• Labels:
None

#### Description

Let Q be a join query over tables T1,...,Tn with the where condition C.

In the following cases the calculation of the cardinality of the result set of Q is considered as accurate:

a) C is a multiple equality such that for each column of it the number of distinct values is known

Example:

 ```SELECT * from t1,t2 WHERE t1.a=t2.a; ```

We just need to know NDV, which can be found in 2 ways

• there is index on t1.a and t2.a
• EITS statistics is available for t1.a and t2.a

b) C is a conjunction of formulas from a)

c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has

• only one non-constant argument and this argument a reference to a column that is
used either as the first component of an index with known statistical data or a column
for which a histogram is provided
(for [NOT] IN / [NOT] LIKE predicate this column must be the first argument)
• all other arguments must be inexpensive constant items that are converted to simple
constant (cached) (by the time we check the conditions)
(in the case of [NOT] LIKE predicate the constant argument must be of a special form)

Examples here would be:

 ```SELECT * from t1,t2 WHERE t1.a op const ```

Where op can be >/>=/</<=/=/<>

Also the other cases are with
[NOT] IN predicate, [NOT] NULL predicate, LIKE predicate

d) C is a conjunction formulas of a) and AND-OR formulas over predicates from b) such that
each of these formulas depends (directly or indirectly through equalities inferred from the
conjuncted multiple equalities) only on one column.

 ```SELECT * from t1,t2 WHERE (t1.a =1 or t1.a=4); // directly depends on one column ```

 ```SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t2.b < 1); ```

indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
So after substitution the query would look like

 ```SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t1.a < 1); ```

and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column

The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item.

It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately.

#### People

Assignee: Sergei Petrunia
Reporter: Igor Babaev