Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22360

Sufficient conditions for accurate calculation of join cardinality

    XMLWordPrintable

    Details

      Description

      Let Q be a join query over tables T1,...,Tn with the where condition C.

      In the following cases the calculation of the cardinality of the result set of Q is considered as accurate:

      a) C is a multiple equality such that for each column of it the number of distinct values is known

      Example:

      SELECT * from t1,t2 WHERE t1.a=t2.a;
      

      We just need to know NDV, which can be found in 2 ways

      • there is index on t1.a and t2.a
      • EITS statistics is available for t1.a and t2.a

      b) C is a conjunction of formulas from a)

      c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has

      • only one non-constant argument and this argument a reference to a column that is
        used either as the first component of an index with known statistical data or a column
        for which a histogram is provided
        (for [NOT] IN / [NOT] LIKE predicate this column must be the first argument)
      • all other arguments must be inexpensive constant items that are converted to simple
        constant (cached) (by the time we check the conditions)
        (in the case of [NOT] LIKE predicate the constant argument must be of a special form)

      Examples here would be:

      SELECT * from t1,t2 WHERE t1.a op const
      

      Where op can be >/>=/</<=/=/<>

      Also the other cases are with
      [NOT] IN predicate, [NOT] NULL predicate, LIKE predicate

      d) C is a conjunction formulas of a) and AND-OR formulas over predicates from b) such that
      each of these formulas depends (directly or indirectly through equalities inferred from the
      conjuncted multiple equalities) only on one column.

      SELECT * from t1,t2 WHERE (t1.a =1 or t1.a=4);    // directly depends on one column
      

      SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t2.b < 1);
      

      indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
      So after substitution the query would look like

      SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t1.a < 1);
      

      and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column

      The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item.

      It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              psergei Sergei Petrunia
              Reporter:
              igor Igor Babaev
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated: