Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13482

CONNECT BY computation algorithm

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None

    Description

      Background - how Oracle does it

      //Vicentiu, based on Oracle's documentation:

      Oracle selects the child rows of each root row. Each child row must satisfy the 
      condition of the CONNECT BY condition with respect to one of the root rows.
       
      Oracle selects successive generations of child rows. Oracle first selects the 
      children of the  rows returned in step 2, and then the children of those children, 
      and so on. Oracle always selects children by evaluating the CONNECT BY 
      condition with respect to a current parent row.
      

      This describes a breadth-first traversal.

      • Also, the docs specify that output rows are returned in the order that one would get if they were doing a depth-first traversal (search for (Oracle returns the rows in the order shown in Figure 9-1).
      • An experiment has shown that depth-first traversal is used: MDEV-13473.

      Looking at CONNECT BY from Recursive CTE point of view

      • CONNECT BY recursion is always linear (that is, only last-generation items contribute to the next generation).
      • CONNECT BY recursion detects cycles (and either aborts or stops producing further offspring there)
        • cycle detection is based on remembering all ancestors for each elements and not producing elements that have themselves as ancestors.

      Computing CONNECT BY as a Recursive CTE

      • Recursive CTE code does a breadth-first computation.
      • As far as our understanding goes, one can use breadth-first computation to compute CONNECT BY queries (but is it a good idea?)

      Detecting loops

      The documentation says:

      A loop occurs if one row is both the parent (or grandparent or direct ancestor)
      and a child (or a grandchild or a direct descendent) of another row.

      When CONNECT BY does not have NOCYCLE specifier, we only need to detect looping and abort the query with an error.

      When NOCYCLE specifier is used, we need to stop before row R (that is, not include R in the query output or in further computations) if R together with its ancestors forms a loop. R's parent row will have CYCLE=1, which is a bit counter-intuitive, as the cycle has not formed yet (See https://lists.launchpad.net/maria-developers/msg10840.html for details).

      TODO: How does one produce values for the CYCLE column? We will need to perform either "lookahead" (to see if current row has children) or "back-patching" (the value of CYCLE of some row R1 will be computed when we try to compute children(R1)). Considering that connect_by_expr may use CYCLE itself, lookahead might be the only option?

      The algorithm

      TODO

      Other details

      • A select with CONNECT BY construct is not a derived table, so if it is a subquery, it may have outer references and so will be re-executed multiple
        times.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              psergei Sergei Petrunia
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.