[MDEV-26955] Improve the cost model for usage of in-memory range rowid / primary key filters - Jira

Details

Type: Task
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer
Labels:
None

Description

The current implementation of range rowid filters introduced in 10.4 (see ~~MDEV-16188~~) used a quite simplistic cost model. In particular the cost of index only access was not taken into account properly. This task intends to remove all defects of the current implementation due to improper estimation of the gain promising by usage of rowid range filter. The full new cost model for usage of range rowid filters will be presented in the comments below.

Attachments

Issue Links

relates to

MDEV-33875 ORDER BY DESC causes ROWID Filter optimization performance degradation

Closed

Activity

Ascending order - Click to sort in descending order

Igor Babaev added a comment - 2021-11-02 06:34 - edited

1. Introduction
A rowid / primary key filter is a container with rowids / primary keys collected for a subset of rows of a given join table T. Instead of rowid / primary keys themselves the container might contain the result of hashing of them with some hash function. The only requirement for such filter is to provide an answer whether a given rowid / primary key or the result of hashing for it was surely not been included into the container.
If the subset of the collected rows contains exactly all rows satisfying a range condition imposed by the executed join query Q on table T then the filter is called range rowid / primary key filter (for simplicity further we will call it
just rowid filter).
It may make sense to use a rowid filter in a couple with a ref access or with a range access that employs a non-clustered index. In this case a row is fetched from the table in two steps: look-up into the index chosen for the access and fetching the rowid and then fetching the row data by the fetched rowid. Is a rowid filter can be employed then before fetching the data row the rowid fetched from the index could be checked against the filter and if the check is negative the second step could be skipped as unnecessary. Moreover in this case the check of the conditions pushed into the accessed table also becomes unnecessary. So for fetching a row by the used index I on average we may pay less when using the filter F. Let designate our gain in cost here as a(I,F). If without usage of the filter we were going to fetch N rows then the total gain in accessing rows will be a(I,F) * N. Yet at the same time we have to pay for building the filter. Let's designate the cost of it as b(F). If a(I,F) * N - b(F) is greater than 0 than it makes sense to employ the filter F to join table T using the evaluated ref/range access by index I to the given partial join of tables. Otherwise usage of this filter F won't be beneficial. Usage of other filters still might be beneficial in the evaluated ref/range access though.

Igor Babaev added a comment - 2021-11-02 06:34 - edited 1. Introduction A rowid / primary key filter is a container with rowids / primary keys collected for a subset of rows of a given join table T. Instead of rowid / primary keys themselves the container might contain the result of hashing of them with some hash function. The only requirement for such filter is to provide an answer whether a given rowid / primary key or the result of hashing for it was surely not been included into the container. If the subset of the collected rows contains exactly all rows satisfying a range condition imposed by the executed join query Q on table T then the filter is called range rowid / primary key filter (for simplicity further we will call it just rowid filter). It may make sense to use a rowid filter in a couple with a ref access or with a range access that employs a non-clustered index. In this case a row is fetched from the table in two steps: look-up into the index chosen for the access and fetching the rowid and then fetching the row data by the fetched rowid. Is a rowid filter can be employed then before fetching the data row the rowid fetched from the index could be checked against the filter and if the check is negative the second step could be skipped as unnecessary. Moreover in this case the check of the conditions pushed into the accessed table also becomes unnecessary. So for fetching a row by the used index I on average we may pay less when using the filter F. Let designate our gain in cost here as a(I,F). If without usage of the filter we were going to fetch N rows then the total gain in accessing rows will be a(I,F) * N. Yet at the same time we have to pay for building the filter. Let's designate the cost of it as b(F). If a(I,F) * N - b(F) is greater than 0 than it makes sense to employ the filter F to join table T using the evaluated ref/range access by index I to the given partial join of tables. Otherwise usage of this filter F won't be beneficial. Usage of other filters still might be beneficial in the evaluated ref/range access though.

Igor Babaev added a comment - 2021-11-03 05:11 - edited

2. Cost of building filter.
If there is a range access by index I to table T than it's easy to build a rowid filter using the corresponding index only scan for the range R. The estimate of the number of rows in the range will provide us an upper bound for the number of elements in the filter ||F|| and the memory required for the filter M(F). Of course the size of this memory of depends on the type of the container used for the filter:
if the container is a sorted array of rowid (PK) then M(F)=||F||*sizeof(rowid/PK),
if the container is a hash table then M(F)=1.4*||F||*(sizeof(rowid/PK)+sizeof(hash)),
if we use a bloom filter then M(F)=10*||F||.
If M(F) exceeds the the maximum allowed size for a filter the corresponding range access is excluded from consideration. If we have a limit for the total size of the filters used in the query then obviously it makes sense to exclude the filter that promises the least gain and then repeat this process until the total size of all remaining filters becomes within the limit.
The ratio ||F||/||T|| is called selectivity of the filter and is designated as s(F). If a sorted array or a hash table is used for the filter then the probability of the negative answer for the test checking whether the rowid of some random row from T has been included into the filter is (1-s(F)). The probability of the positive answer for the same test is equal corresponding to s(F). If a bloom filter is employed than the probability for the negative answer is (1-s(F)-eps) where eps is the probability of the false-positive answer. If a bloom filter contains 10*||F|| bits and it uses at least 7 funetions for hashing then eps < 0.01*||F||. So it can be ignored in our calculations because our estimate of s(F) is calculated with much worse precision.
The cost of building a given filter is the sum of the cost of the index only scan for the range condition and the cost of of filling the search data structure used by the filter:

 b(F)=C(R)+s(F)*||T||*c_fill(F)

Here C(R) is the cost of index only scan that fetches rowids from the scanned index entries and c_fill is the cost of adding one element to the search data structure. It's easy to see that s(R)=s(F) => s(R) * ||T||=s(F) * ||T|| => b(F)=C(R)+||F||*c_fill.
If a sorted array is used by filter F as the search data structure then c_fill(F)=c_cmp*log(||F||)+c_wsa, where c_cmp is the cost of comparison of two rowids and c_wsa is the cost of adding one rowid to the used sorted array.
If a hash table is used by filter as the search data structure then c_fill(F)=c_h+c_wht, where c_h is the cost of calculation of hash function for one rowid and c_wht iss the cost of adding one rowid to the used hash table.
If filter F employs a bloom filter as the search data structure the c_fill(F)=k*c_h where k is the number of hash functions used to fill the bit array of the bloom filter and c_h is the cost of calculation of one such function.
The cost of index only scan over the range R impacts the total building cost b(F) of filter F. This cost mainly depends on the the number of index blocks rather than on ||F||. The former in its turn depends on the type of the range scan. The cost of an index only scan over one range with n1 index entries may be much bigger than the cost of several small ranges with n2 index entries in total even when n2<< n1.

Igor Babaev added a comment - 2021-11-03 05:11 - edited 2. Cost of building filter. If there is a range access by index I to table T than it's easy to build a rowid filter using the corresponding index only scan for the range R. The estimate of the number of rows in the range will provide us an upper bound for the number of elements in the filter ||F|| and the memory required for the filter M(F). Of course the size of this memory of depends on the type of the container used for the filter: if the container is a sorted array of rowid (PK) then M(F)=||F||*sizeof(rowid/PK), if the container is a hash table then M(F)=1.4*||F||*(sizeof(rowid/PK)+sizeof(hash)), if we use a bloom filter then M(F)=10*||F||. If M(F) exceeds the the maximum allowed size for a filter the corresponding range access is excluded from consideration. If we have a limit for the total size of the filters used in the query then obviously it makes sense to exclude the filter that promises the least gain and then repeat this process until the total size of all remaining filters becomes within the limit. The ratio ||F||/||T|| is called selectivity of the filter and is designated as s(F). If a sorted array or a hash table is used for the filter then the probability of the negative answer for the test checking whether the rowid of some random row from T has been included into the filter is (1-s(F)). The probability of the positive answer for the same test is equal corresponding to s(F). If a bloom filter is employed than the probability for the negative answer is (1-s(F)-eps) where eps is the probability of the false-positive answer. If a bloom filter contains 10*||F|| bits and it uses at least 7 funetions for hashing then eps < 0.01*||F||. So it can be ignored in our calculations because our estimate of s(F) is calculated with much worse precision. The cost of building a given filter is the sum of the cost of the index only scan for the range condition and the cost of of filling the search data structure used by the filter: b(F)=C(R)+s(F)*||T||*c_fill(F) Here C(R) is the cost of index only scan that fetches rowids from the scanned index entries and c_fill is the cost of adding one element to the search data structure. It's easy to see that s(R)=s(F) => s(R) * ||T||=s(F) * ||T|| => b(F)=C(R)+||F||*c_fill. If a sorted array is used by filter F as the search data structure then c_fill(F)=c_cmp*log(||F||)+c_wsa, where c_cmp is the cost of comparison of two rowids and c_wsa is the cost of adding one rowid to the used sorted array. If a hash table is used by filter as the search data structure then c_fill(F)=c_h+c_wht, where c_h is the cost of calculation of hash function for one rowid and c_wht iss the cost of adding one rowid to the used hash table. If filter F employs a bloom filter as the search data structure the c_fill(F)=k*c_h where k is the number of hash functions used to fill the bit array of the bloom filter and c_h is the cost of calculation of one such function. The cost of index only scan over the range R impacts the total building cost b(F) of filter F. This cost mainly depends on the the number of index blocks rather than on ||F||. The former in its turn depends on the type of the range scan. The cost of an index only scan over one range with n1 index entries may be much bigger than the cost of several small ranges with n2 index entries in total even when n2<< n1.

Igor Babaev added a comment - 2021-11-09 05:42 - edited

3. Average cost of fetching one record by ref access when using a filter
Let's consider a ref access to table T by secondary index I when filter F is used. Let c_fa(I) be the expected cost of fetching one record by a random key using index (or index prefix if I is a multi-component index) . Let k be the expected number of index entries for one key. Let consider. Then the cost of fetching records by one key lookup will be k*c_fa(I). When fetching these records we do one look-up into index I, read k index entries and perform k random accesses to read record data. One index look-up requires accessing a certain number of index blocks depending only on k and the length of the key entries. We have k*c_fa(I)=c_ioa(I)+k*c_ra(T) where c is the cost of reading index entries for one key access and c_ra is the cost of reading one random record from data file for table T. From the above we have c_fa(I)=c_ioa(I)/k + c_ra(T). Let's designate c_ioa(I)/k as c_tioa. Then we have c_tioa/c_fa(I)=1-c_ra(T)/c_fa(I). For the given index I and a random key the ratio c_ra(T)/c_fa(I) is a constant. Accordingly the ratio c_tioa/c_fa(I) is a constant (its own for each index I). Let's designate this ratio as c_ios(I).
The cost of performing a ref access by a random key for Index I without usage of any filter is k*c_fa. When filter F is employed for this access the corresponding cost will be
equal to

k*c_ios(I)*c_fa(I)+k*c_ch(F)+(k*s(F))*c_fa(I)*(1-c_ios)

Here k*c_ios(I)*c_fa(I) is the cost of index only key access by a random key, k*c_ch(F) is the cost of k look-ups into the filter, and (k*s(F)) is the expected number of the records to be fetched for this key access.
From the above we have the following expression for a(I,F) if we take into account that for each record rejected by filter F we don't pay the cost c_cond for checking the condition pushed into table T

a(I,F)=c_fa(I)-(c_ios(I)*c_fa(I)+c_ch(F)+s(F)*c_fa(I)*(1-c_ios)) + c_cond*(1-s(F))

Re-writing the above formula we get

a(I,F)=c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F)-c_ch(F)+c_cond*(1-s(F))  =>

a(I,F)= (c_fa(I)*(1-c_ios(I))+c_cond)*(1-s(F))-c_ch(F)

Igor Babaev added a comment - 2021-11-09 05:42 - edited 3. Average cost of fetching one record by ref access when using a filter Let's consider a ref access to table T by secondary index I when filter F is used. Let c_fa(I) be the expected cost of fetching one record by a random key using index (or index prefix if I is a multi-component index) . Let k be the expected number of index entries for one key. Let consider. Then the cost of fetching records by one key lookup will be k*c_fa(I). When fetching these records we do one look-up into index I, read k index entries and perform k random accesses to read record data. One index look-up requires accessing a certain number of index blocks depending only on k and the length of the key entries. We have k*c_fa(I)=c_ioa(I)+k*c_ra(T) where c is the cost of reading index entries for one key access and c_ra is the cost of reading one random record from data file for table T. From the above we have c_fa(I)=c_ioa(I)/k + c_ra(T). Let's designate c_ioa(I)/k as c_tioa. Then we have c_tioa/c_fa(I)=1-c_ra(T)/c_fa(I). For the given index I and a random key the ratio c_ra(T)/c_fa(I) is a constant. Accordingly the ratio c_tioa/c_fa(I) is a constant (its own for each index I). Let's designate this ratio as c_ios(I). The cost of performing a ref access by a random key for Index I without usage of any filter is k*c_fa. When filter F is employed for this access the corresponding cost will be equal to k*c_ios(I)*c_fa(I)+k*c_ch(F)+(k*s(F))*c_fa(I)*(1-c_ios) Here k*c_ios(I)*c_fa(I) is the cost of index only key access by a random key, k*c_ch(F) is the cost of k look-ups into the filter, and (k*s(F)) is the expected number of the records to be fetched for this key access. From the above we have the following expression for a(I,F) if we take into account that for each record rejected by filter F we don't pay the cost c_cond for checking the condition pushed into table T a(I,F)=c_fa(I)-(c_ios(I)*c_fa(I)+c_ch(F)+s(F)*c_fa(I)*(1-c_ios)) + c_cond*(1-s(F)) Re-writing the above formula we get a(I,F)=c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F)-c_ch(F)+c_cond*(1-s(F)) => a(I,F)= (c_fa(I)*(1-c_ios(I))+c_cond)*(1-s(F))-c_ch(F)

Igor Babaev added a comment - 2021-11-11 14:49 - edited

4. Pruning filters
The benefits if usage filter F with the given ref access by index I to join table T depends on the expected cardinality of the partial join PJi to which T and fanout of the ref access k. Let's designate ||PJi||*k as N.
As we saw above the gain can be calculated by the expression of the form a(I,F) * N - b(F) where a(I,F). This expression has to be evaluated many times while we are looking for the best execution plan with different value of N. And we have to do it for each filter to choose the filter that promises the best gain with the given ref access and
the given value N.That's why it's important to exclude from checking those filters that never provide the best gain for the given ref access.
For the given ref access that uses by index I l let's consider two filters F1 and F2 and the functions of gain for them:

g1(x)=a(I,F1)*x-b(F1) and g2(x)=a(I,F2)*x-b(b2) such that a(I,F1) != a(I,F2) and b(F1) != b(F2).

Let's find x0 where g1(x0)=g2(x0). If g1(x0) is negative and a(I,F1) > a(I,F2) then for any value of N filter F1 promises a better positive gain than filter F2. It's easy to infer that

  x0=(b(F1)-b(F2))/(a(I,F1)-a(I,F2))

Thus for the given ref access and two possible filters F1,F2 for T such that potentially it makes sense to consider both of them if x0 > 0 and x0* a(I,F1) - b(F1) < 0 and a(I,F1) > a(I,F2) then F2 can be excluded from consideration when a(I,F1) > a(I,F2) because usage of F1 promises a better gain whenever the gain is positive.
It turns out that a filter F can be excluded from consideration even in the cases when there is no other filter F' such that (b(F')-b(F))/(a(I,F')-a(I,F))*a(I,F)-b(F) < 0, in other words when for any other filter F' the graph of the gain function g' for F' crosses the graph of the gain function g for F in some point above the axis X. (As any gain function is a linear function there can't be more than one such point).
Let's consider 3 filters F1,F2,F3 such as a(I,F1) < a(I,F2) < a(I,F3) such that the crossing points for the graphs of the corresponding gain functions g1, g2, g3 are situated above the axis X. Let's designate the X coordinates of these points as x12, x13, and x23. They can be calculated by the following formulas:

x12=(b2(F2)-b1(F2))/(a(I,F2)-a(I,F1))   -- X coordinate of the crossing g1(x) and g2(x)

x13=(b2(F3)-b1(F1))/(a(I,F3)-a(I,F1))   -- X coordinate of the crossing g1(x) and g3(x)

x23=(b2(F3)-b1(F2))/(a(I,F3)-a(I,F2))   -- X coordinate of the crossing g2(x) and g3(x)

It's easy to see that if x13<x12 then x23<x12 because a(I,F1), a(I,F2), a(I,F3) are all positive, a(I,F1)<a(I,F2)< a(I,F3).
Thus in the case when x13<x12:

if 0<x <x12 then g1(x)>g2(x) and g1(x)>g(x3)

if x>=x12 then g3(x)>g1(x) and g3(x)>g2(x)

It means that filter F2 can be excluded from consideration for the given ref access.
The analysis above shows that if after all possible exclusions of filters for the remaining set of filters

{F1,...,Fm}

the following holds if we designate the X coordinate of the crossing point of the gain functions gi and gj as x[i,j]

x[i-1,i] < x[i,i+1] for any i such that 0<i<m if a(I,F1) < ...< a(i,Fm)

and
Fj is the filter that promises the best gain when N is between x[j-1,j] and x[j,j+1] for each j such that 0<j<=m. Here we assume that x[0,1] is b1[F1]/a(I,F1) and x[m,m+1] is +inf.

Igor Babaev added a comment - 2021-11-11 14:49 - edited 4. Pruning filters The benefits if usage filter F with the given ref access by index I to join table T depends on the expected cardinality of the partial join PJi to which T and fanout of the ref access k. Let's designate ||PJi||*k as N. As we saw above the gain can be calculated by the expression of the form a(I,F) * N - b(F) where a(I,F). This expression has to be evaluated many times while we are looking for the best execution plan with different value of N. And we have to do it for each filter to choose the filter that promises the best gain with the given ref access and the given value N.That's why it's important to exclude from checking those filters that never provide the best gain for the given ref access. For the given ref access that uses by index I l let's consider two filters F1 and F2 and the functions of gain for them: g1(x)=a(I,F1)*x-b(F1) and g2(x)=a(I,F2)*x-b(b2) such that a(I,F1) != a(I,F2) and b(F1) != b(F2). Let's find x0 where g1(x0)=g2(x0). If g1(x0) is negative and a(I,F1) > a(I,F2) then for any value of N filter F1 promises a better positive gain than filter F2. It's easy to infer that x0=(b(F1)-b(F2))/(a(I,F1)-a(I,F2)) Thus for the given ref access and two possible filters F1,F2 for T such that potentially it makes sense to consider both of them if x0 > 0 and x0* a(I,F1) - b(F1) < 0 and a(I,F1) > a(I,F2) then F2 can be excluded from consideration when a(I,F1) > a(I,F2) because usage of F1 promises a better gain whenever the gain is positive. It turns out that a filter F can be excluded from consideration even in the cases when there is no other filter F' such that (b(F')-b(F))/(a(I,F')-a(I,F))*a(I,F)-b(F) < 0, in other words when for any other filter F' the graph of the gain function g' for F' crosses the graph of the gain function g for F in some point above the axis X. (As any gain function is a linear function there can't be more than one such point). Let's consider 3 filters F1,F2,F3 such as a(I,F1) < a(I,F2) < a(I,F3) such that the crossing points for the graphs of the corresponding gain functions g1, g2, g3 are situated above the axis X. Let's designate the X coordinates of these points as x12, x13, and x23. They can be calculated by the following formulas: x12=(b2(F2)-b1(F2))/(a(I,F2)-a(I,F1)) -- X coordinate of the crossing g1(x) and g2(x) x13=(b2(F3)-b1(F1))/(a(I,F3)-a(I,F1)) -- X coordinate of the crossing g1(x) and g3(x) x23=(b2(F3)-b1(F2))/(a(I,F3)-a(I,F2)) -- X coordinate of the crossing g2(x) and g3(x) It's easy to see that if x13<x12 then x23<x12 because a(I,F1), a(I,F2), a(I,F3) are all positive, a(I,F1)<a(I,F2)< a(I,F3). Thus in the case when x13<x12: if 0<x <x12 then g1(x)>g2(x) and g1(x)>g(x3) if x>=x12 then g3(x)>g1(x) and g3(x)>g2(x) It means that filter F2 can be excluded from consideration for the given ref access. The analysis above shows that if after all possible exclusions of filters for the remaining set of filters {F1,...,Fm} the following holds if we designate the X coordinate of the crossing point of the gain functions gi and gj as x [i,j] x[i-1,i] < x[i,i+1] for any i such that 0<i<m if a(I,F1) < ...< a(i,Fm) and Fj is the filter that promises the best gain when N is between x [j-1,j] and x [j,j+1] for each j such that 0<j<=m. Here we assume that x [0,1] is b1 [F1] /a(I,F1) and x [m,m+1] is +inf.

Igor Babaev added a comment - 2021-11-13 03:48

5. Caveats of pruning filter.
When discussed pruning we assumed that when checking whether for a pair of filter F1,F2 one can be excluded from consideration when evaluating the cost of the given ref access together with the best filter the values of a(I,F1) and a(I,F1) are constants that do not depend on the join prefix against which the ref access is evaluated.
Unfortunately this is not the case in the current code of the optimizer. The fact is that in a general case the constant c_fa(I) used in the calculation of a(I,F) may depend on N. In a general case we have to use the expression

f_adj(N)*c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F)-c_ch(F)+c_cond*(1-s(F))

Let's consider the expression for x0 when we use the above expression for a(I,F1) and a(I,F)

x0=(b(F1)-b(F2))/((f_adj(N)*c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F1)-c_ch(F1)+c_cond*(1-s(F1)))-(f_adj(N)*c_fa(I)*(1-c_ios(I)) - (f_adj(N)*c_fa(I)(1-c_ios)*s(F2)-c_ch(F2)+c_cond*(1-s(F2))))

Let's assume that c_ch(F1)=c_ch(F2). (If blob filters are employed for F1 and F2 this equality is quite valid.) in this case we have

x0=(b(F1)-b(F2))/(f_adj(N)*c_fa(I)(1-c_ios(I))*(s(F1)-s(F2))+c_cond*(s(F2)-s(F1)))

Igor Babaev added a comment - 2021-11-13 03:48 5. Caveats of pruning filter. When discussed pruning we assumed that when checking whether for a pair of filter F1,F2 one can be excluded from consideration when evaluating the cost of the given ref access together with the best filter the values of a(I,F1) and a(I,F1) are constants that do not depend on the join prefix against which the ref access is evaluated. Unfortunately this is not the case in the current code of the optimizer. The fact is that in a general case the constant c_fa(I) used in the calculation of a(I,F) may depend on N. In a general case we have to use the expression f_adj(N)*c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F)-c_ch(F)+c_cond*(1-s(F)) Let's consider the expression for x0 when we use the above expression for a(I,F1) and a(I,F) x0=(b(F1)-b(F2))/((f_adj(N)*c_fa(I)*(1-c_ios(I)) - c_fa(I)(1-c_ios)*s(F1)-c_ch(F1)+c_cond*(1-s(F1)))-(f_adj(N)*c_fa(I)*(1-c_ios(I)) - (f_adj(N)*c_fa(I)(1-c_ios)*s(F2)-c_ch(F2)+c_cond*(1-s(F2)))) Let's assume that c_ch(F1)=c_ch(F2). (If blob filters are employed for F1 and F2 this equality is quite valid.) in this case we have x0=(b(F1)-b(F2))/(f_adj(N)*c_fa(I)(1-c_ios(I))*(s(F1)-s(F2))+c_cond*(s(F2)-s(F1)))

MariaDB Server

Improve the cost model for usage of in-memory range rowid / primary key filters

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration