Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22360

Sufficient conditions for accurate calculation of join cardinality



    • Type: Task
    • Status: In Progress (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: 10.12
    • Component/s: Optimizer
    • Labels:


      Let Q be a join query over tables T1,...,Tn with the where condition C.

      In the following cases the calculation of the cardinality of the result set of Q is considered as accurate:

      a) C is a multiple equality such that for each column of it the number of distinct values is known


      SELECT * from t1,t2 WHERE t1.a=t2.a;

      We just need to know NDV, which can be found in 2 ways

      • there is index on t1.a and t2.a
      • EITS statistics is available for t1.a and t2.a

      b) C is a conjunction of formulas from a)

      c) C is either inequality predicate or [NOT] BETWEEN predicate or [NOT] IN predicate or [NOT] LIKE predicate or IS [NOT] NULL predicate who has

      • only one non-constant argument and this argument a reference to a column that is
        used either as the first component of an index with known statistical data or a column
        for which a histogram is provided
        (for [NOT] IN / [NOT] LIKE predicate this column must be the first argument)
      • all other arguments must be inexpensive constant items that are converted to simple
        constant (cached) (by the time we check the conditions)
        (in the case of [NOT] LIKE predicate the constant argument must be of a special form)

      Examples here would be:

      SELECT * from t1,t2 WHERE t1.a op const

      Where op can be >/>=/</<=/=/<>

      Also the other cases are with
      [NOT] IN predicate, [NOT] NULL predicate, LIKE predicate

      d) C is a conjunction formulas of a) and AND-OR formulas over predicates from b) such that
      each of these formulas depends (directly or indirectly through equalities inferred from the
      conjuncted multiple equalities) only on one column.

      SELECT * from t1,t2 WHERE (t1.a =1 or t1.a=4);    // directly depends on one column

      SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t2.b < 1);

      indirectly depeneds on one column, we use muliple equality ME(t1.a,t2.b).
      So after substitution the query would look like

      SELECT * from t1,t2 WHERE t1.a=t2.b and (t1.a > 5 or t1.a < 1);

      and the entire OR conjunct (t1.a > 5 or t1.a < 1) can be resolved by one column

      The above sufficient conditions can easily be checked for any WHERE clause just by traversing the corresponding item.

      It's clear that if C guarantees accurateness of calculation of the cardinality of the result set for Q then the cardinality of any partial join over tables Ti1,...,Tik with conditions pushed from C to these tables can be calculated accurately.


          Issue Links



              psergei Sergei Petrunia
              igor Igor Babaev
              0 Vote for this issue
              7 Start watching this issue



                  Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.