Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13664

CONNECT BY dataset and where clause evaluation

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None

    Description

      Computing the final result for hierarchical queries is a multi-step process. First, let's quote oracle documentation:

      Oracle processes hierarchical queries as follows:
       
      A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.
       
      The CONNECT BY condition is evaluated.
       
      Any remaining WHERE clause predicates are evaluated.
      

      I will define the following terms:

      Data Set: The full set of rows, with all their associated columns that are used to generate a final result set for a hierarchical query. Some of these rows may appear multiple times in the final result set, when they are part of different paths (coming from a diamond inheritance), or not at at all, either due to the rows not passing the start with conditions, the connect by conditions or the where clause conditions.

      Base Set: The set of rows which pass the START WITH clause from the Data Set. These rows may or may not pass the where clause conditions. Each one of these rows represents the root for a new tree which will be generated during CONNECT BY computation.

      Hierarchical Set: The set of rows which are generated starting with the Base Set, by recursively evaluating the CONNECT BY condition. These rows must at least contain the rows in the Base Set.

      Result Set: The set of rows which are returned to the client. These rows are a subset of the Hierarchical Set, containing all rows which pass the WHERE Filtering Conditions.

      Join Conditions: Conditions which filter out the Cartesian product resulting from a select with multiple tables. These conditions can be expressed in the FROM clause such as:

      FROM t1 [LEFT|RIGHT|INNER|...] JOIN t2 ON t1.id = t2.fk
      

      Or via WHERE clause predicates, such as:

      FROM t1,t2
      WHERE t1.id = t2.fk
      

      Filtering Conditions: All conditions from the WHERE clause which are not part of Join Conditions.

      Given these definitions I shall define the multiple step processing on a conceptual level to produce the Result Set.

      Given a generic hierarchical query, which has N tables in the FROM clause.
      a) Compute all n-tuples of rows (row t1(i 1), row t2(i 2), ..., row tN(i N)) where row t x (i) represent the i-th row for table tx and i 1..N take values from 1 to number_of_rows(t 1..N)
      b) Filter out all n-tuples based on the Join Conditions. This produces the Dataset.
      c) Select from the Data Set all rows which pass the START WITH clause. These form the Base Set.
      d) Given the Base Set and the Data Set apply the CONNECT BY clauses recursively (algorithm defined in MDEV-13482) to form the Hierarchical Set.
      e) Filter out the Hierarchical Set according to the Filtering Conditions to generate the Result Set

      The only ambiguous task in this multi-step process is separating the Join Conditions from the Filtering Conditions, as Oracle's documentation provides a very shallow explanation of what kind of predicates in where clause represent join conditions.

      Let's look at a few examples:

      select p.person_id, p.name, p.gender, b.birth_date, c.name, p.mother_id, p.father_id
      from births b, people p inner join cities c on p.city_id = c.city_id
      where (p.gender = 'M') and p.birth_id = b.birth_id
      start with p.person_id = 5
      connect by (p.mother_id = prior p.person_id) or (p.father_id = prior p.person_id);
      

      Here we have the following conditions:
      p.city_id = c.city_id in the FROM clause and p.birth_id = b.birth_id and p.gender = 'M' in the WHERE clause.

      The first two conditions are treated as Join Conditions while the last one is treated as a Filtering condition.

      Adding the following extra condition in the from clause:

      select p.person_id, p.name, p.gender, b.birth_date, c.name, p.mother_id, p.father_id
      from births b, people p inner join cities c on p.city_id = c.city_id and c.city_id > 1
      where (p.gender = 'M') and p.birth_id = b.birth_id
      start with p.person_id = 5
      connect by (p.mother_id = prior p.person_id) or (p.father_id = prior p.person_id);
      

      The Join Conditions are: p.city_id = c.city_id and c.city_id > 1 and p.birth_id = b.birth_id.
      The Filtering Condition is: p.gender = 'M'.

      IMPORTANT

      We can confuse this algorithm by changing the equivalent p.birth_id = b.birth_id to be: case when p.birth_id = b.birth_id then 0 else 1 end = 0.
      In Oracle, the CASE expression is treated as a filtering condition, and thus the Data Set becomes larger. All birth rows get matched to all other previously matched pairs from people and cities tables. This leads to a larger potential hierarchy and a different result set.

      Another strange use case is if we change the AND condition in the WHERE clause to an OR:
      where (p.gender = 'M') AND p.birth_id = b.birth_id becomes where (p.gender = 'M') OR p.birth_id = b.birth_id. Semantically this doesn't really make sense, however Oracle performs the same split, where p.birth_id = b.birth_id is a Join Condition while p.gender = 'M' is a Filtering Condition.

      Given this brittleness observed in Oracle, I suggest we define a set of clear and simple rules and aim to respect those. It will not match Oracle in all use cases, but it will provide a consistent behaviour.

      Note

      The actual database used for these examples is pasted in the comment below.

      Attachments

        Issue Links

          Activity

            People

              cvicentiu Vicențiu Ciorbaru
              cvicentiu Vicențiu Ciorbaru
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.