|
3. Average cost of fetching one record by ref access when using a filter
Let's consider a ref access to table T by secondary index I when filter F is used. Let c_fa(I) be the expected cost of fetching one record by a random key using index (or index prefix if I is a multi-component index) . Let k be the expected number of index entries for one key. Let consider. Then the cost of fetching records by one key lookup will be k*c_fa(I). When fetching these records we do one look-up into index I, read k index entries and perform k random accesses to read record data. One index look-up requires accessing a certain number of index blocks depending only on k and the length of the key entries. We have k*c_fa(I)=c_ioa(I)+k*c_ra(T) where c is the cost of reading index entries for one key access and c_ra is the cost of reading one random record from data file for table T. From the above we have c_fa(I)=c_ioa(I)/k + c_ra(T). Let's designate c_ioa(I)/k as c_tioa. Then we have c_tioa/c_fa(I)=1-c_ra(T)/c_fa(I). For the given index I and a random key the ratio c_ra(T)/c_fa(I) is a constant. Accordingly the ratio c_tioa/c_fa(I) is a constant (its own for each index I). Let's designate this ratio as c_ios(I).
The cost of performing a ref access by a random key for Index I without usage of any filter is k*c_fa. When filter F is employed for this access the corresponding cost will be
equal to
|
k*c_ios(I)*c_fa(I)+k*c_ch(F)+(k*s(F))*c_fa(I)*(1-c_ios)
|
Here k*c_ios(I)*c_fa(I) is the cost of index only key access by a random key, k*c_ch(F) is the cost of k look-ups into the filter, and (k*s(F)) is the expected number of the records to be fetched for this key access.
From the above we have the following expression for a(I,F) if we take into account that for each record rejected by filter F we don't pay the cost c_cond for checking the condition pushed into table T
a(I,F)=c_fa(I)-(c_ios(I)*c_fa(I)+c_ch(F)+s(F)*c_fa(I)*(1-c_ios)) + c_cond*(1-s(F))
|
Re-writing the above formula we get
a(I,F)=c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F)-c_ch(F)+c_cond*(1-s(F)) =>
|
a(I,F)= (c_fa(I)*(1-c_ios(I))+c_cond)*(1-s(F))-c_ch(F)
|
|
|
4. Pruning filters
The benefits if usage filter F with the given ref access by index I to join table T depends on the expected cardinality of the partial join PJi to which T and fanout of the ref access k. Let's designate ||PJi||*k as N.
As we saw above the gain can be calculated by the expression of the form a(I,F) * N - b(F) where a(I,F). This expression has to be evaluated many times while we are looking for the best execution plan with different value of N. And we have to do it for each filter to choose the filter that promises the best gain with the given ref access and
the given value N.That's why it's important to exclude from checking those filters that never provide the best gain for the given ref access.
For the given ref access that uses by index I l let's consider two filters F1 and F2 and the functions of gain for them:
g1(x)=a(I,F1)*x-b(F1) and g2(x)=a(I,F2)*x-b(b2) such that a(I,F1) != a(I,F2) and b(F1) != b(F2).
|
Let's find x0 where g1(x0)=g2(x0). If g1(x0) is negative and a(I,F1) > a(I,F2) then for any value of N filter F1 promises a better positive gain than filter F2. It's easy to infer that
x0=(b(F1)-b(F2))/(a(I,F1)-a(I,F2))
|
Thus for the given ref access and two possible filters F1,F2 for T such that potentially it makes sense to consider both of them if x0 > 0 and x0* a(I,F1) - b(F1) < 0 and a(I,F1) > a(I,F2) then F2 can be excluded from consideration when a(I,F1) > a(I,F2) because usage of F1 promises a better gain whenever the gain is positive.
It turns out that a filter F can be excluded from consideration even in the cases when there is no other filter F' such that (b(F')-b(F))/(a(I,F')-a(I,F))*a(I,F)-b(F) < 0, in other words when for any other filter F' the graph of the gain function g' for F' crosses the graph of the gain function g for F in some point above the axis X. (As any gain function is a linear function there can't be more than one such point).
Let's consider 3 filters F1,F2,F3 such as a(I,F1) < a(I,F2) < a(I,F3) such that the crossing points for the graphs of the corresponding gain functions g1, g2, g3 are situated above the axis X. Let's designate the X coordinates of these points as x12, x13, and x23. They can be calculated by the following formulas:
x12=(b2(F2)-b1(F2))/(a(I,F2)-a(I,F1)) -- X coordinate of the crossing g1(x) and g2(x)
|
x13=(b2(F3)-b1(F1))/(a(I,F3)-a(I,F1)) -- X coordinate of the crossing g1(x) and g3(x)
|
x23=(b2(F3)-b1(F2))/(a(I,F3)-a(I,F2)) -- X coordinate of the crossing g2(x) and g3(x)
|
It's easy to see that if x13<x12 then x23<x12 because a(I,F1), a(I,F2), a(I,F3) are all positive, a(I,F1)<a(I,F2)< a(I,F3).
Thus in the case when x13<x12:
if 0<x <x12 then g1(x)>g2(x) and g1(x)>g(x3)
|
if x>=x12 then g3(x)>g1(x) and g3(x)>g2(x)
|
It means that filter F2 can be excluded from consideration for the given ref access.
The analysis above shows that if after all possible exclusions of filters for the remaining set of filters
{F1,...,Fm}
the following holds if we designate the X coordinate of the crossing point of the gain functions gi and gj as x[i,j]
x[i-1,i] < x[i,i+1] for any i such that 0<i<m if a(I,F1) < ...< a(i,Fm)
|
and
Fj is the filter that promises the best gain when N is between x[j-1,j] and x[j,j+1] for each j such that 0<j<=m. Here we assume that x[0,1] is b1[F1]/a(I,F1) and x[m,m+1] is +inf.
|
|
5. Caveats of pruning filter.
When discussed pruning we assumed that when checking whether for a pair of filter F1,F2 one can be excluded from consideration when evaluating the cost of the given ref access together with the best filter the values of a(I,F1) and a(I,F1) are constants that do not depend on the join prefix against which the ref access is evaluated.
Unfortunately this is not the case in the current code of the optimizer. The fact is that in a general case the constant c_fa(I) used in the calculation of a(I,F) may depend on N. In a general case we have to use the expression
f_adj(N)*c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F)-c_ch(F)+c_cond*(1-s(F))
|
Let's consider the expression for x0 when we use the above expression for a(I,F1) and a(I,F)
x0=(b(F1)-b(F2))/((f_adj(N)*c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F1)-c_ch(F1)+c_cond*(1-s(F1)))-(f_adj(N)*c_fa(I)*(1-c_ios(I)) - (f_adj(N)*c_fa(I)(1-c_ios)*s(F2)-c_ch(F2)+c_cond*(1-s(F2))))
|
Let's assume that c_ch(F1)=c_ch(F2). (If blob filters are employed for F1 and F2 this equality is quite valid.) in this case we have
x0=(b(F1)-b(F2))/(f_adj(N)*c_fa(I)(1-c_ios(I))*(s(F1)-s(F2))+c_cond*(s(F2)-s(F1)))
|
|