[MDEV-22360] Sufficient conditions for accurate calculation of join cardinality Created: 2020-04-24 Updated: 2023-12-12 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | 11.5 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Igor Babaev | Assignee: | Rex Johnston |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||
| Description |
|
Let Q be a join query over tables T1,...,Tn with the where condition C. In the following cases the calculation of the cardinality of the result set of Q is considered as accurate: a) C is a multiple equality such that for each column of it the number of distinct values is known Example:
We just need to know NDV, which can be found in 2 ways
b) C is a conjunction of formulas from a) c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has
Examples here would be:
Where op can be >/>=/</<=/=/<> Also the other cases are with d) C is a conjunction formulas of a) and AND-OR formulas over predicates from b) such that
indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item. It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately. |
| Comments |
| Comment by Varun Gupta (Inactive) [ 2020-05-20 ] | ||||||||||||||||||||||||||||
|
Has started with implementing the part c) mentioned in the description, looks simple to implement The part that is going to be complex to implement is part (a), need more clarity as to how we are going to take selectivity for equality conditions(both parts are non-const) into consideration | ||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-05-20 ] | ||||||||||||||||||||||||||||
|
Found out that for NOT IN predicate the selectivity estimates are not correct Here is a simple dataset:
For the IN predicate we get accurate selectivity of 10%
But here with NOT IN predicate the selectivity is 100 % which is incorrect, it should be 90%. Also from the optimizer trace I see
Here selectivity_from_histogram > 1 which is again INCORRECT. | ||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-05-21 ] | ||||||||||||||||||||||||||||
|
Considering about part a) , the implementation is restricted to the case The distinct values for a column is known via Eg:
so for the equality condition t1.a=t2.a if EITS is not available then t1.a and t2.a should be the first component of an index from table t1 and t2 respectively so that we could get the accurate estimate. This approach would obviously disallow cases like
where lets say the plan is t1,t2 and t2 makes a ref access with index key1(a,b) | ||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-05-27 ] | ||||||||||||||||||||||||||||
|
The branch where the code is 10.6-mdev22360 | ||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-06-16 ] | ||||||||||||||||||||||||||||
|
The approach that we are using to check if the estimates are accurate for join cardinality: Lets take a generic example with OR-AND conjuncts WHERE clause is defined as: (COND1 AND COND2) OR (COND3 AND (COND4 OR COND5)) and each CONDx can be an OR-AND conjunct itself. So the algorithm to check if the selectivity estimates are accurate is Lets denote the WHERE CLAUSE as conds
So lets break this down into more details, the condition at the top level can be of 2 cases: Case 1: WHERE clause is an AND conjunct For an AND item at the top level, we need to walk over all the top level conjuncts and call walk individually on For this AND item we will have accurate selectivities. Eg: t1.a = t2.a AND ( (t1.a > 5 AND t2.a < 10) OR t1.a <= 0) Case 22a) OR item For an OR item at the top level, we need to make sure that all the columns inside the OR conjunct need to Eg: t1.a=t2.b and (t2.b > 5 or t1.a < 0); 2b) Single predicate at the top levelEg:
| ||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2020-06-16 ] | ||||||||||||||||||||||||||||
|
The latest patch is here |