
Type: Task

Status: Stalled (View Workflow)

Priority: Critical

Resolution: Unresolved

Fix Version/s: 10.5

Component/s: Optimizer  CTE

Labels:None
One can use UNION DISTINCT as an easy way of avoiding cycles when traversing a graph with a CTE:
WITH RECURSIVE cte (from_, to_) ( 
SELECT 1,1 
UNION DISTINCT 
SELECT graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_ 
) ...

But often one needs to know more than edges, for example
WITH RECURSIVE cte (depth, from_, to_) ( 
SELECT 0,1,1 
UNION DISTINCT 
SELECT depth+1, graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_ 
) ...

and here DISTINCT no longer works.
SQL Standard specifies that a CTE can have a CYCLE clause as
WITH RECURSIVE ... (

...

)

CYCLE <cycle column list>

SET <cycle mark column> TO <cycle mark value> DEFAULT <noncycle mark value>

USING <path column>

where
 <cycle column list> is a subset of columns that the CTE returns
 <cycle mark column> is a new column, generated on the fly, its value for any particular row being <cycle mark value> if there's a cycle and <noncycle mark value> if there's no cycle
 <path column> is an ARRAY where the path is being accumulated
While in the standard all clauses in the CYCLE are mandatory, we'll relax this grammar to allow only CYCLE <cycle column list>.
This task is about implementing optional CYCLE <cycle column list> clause after the recursive CTE definition.
There is a simple way to implement it by changing CTE's UNION ALL or UNION DISTINCT operator to enforce distinctness only over <cycle column list> columns, not over all columns that CTE returns.
The example from above would look like
WITH RECURSIVE cte (depth, from_, to_) ( 
SELECT 0,1,1 
UNION 
SELECT depth+1, graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_ 
) CYCLE from_, to_

...

Note that it doesn't matter whether the CTE uses UNION ALL or UNION DISTINCT anymore. UNION ALL means "all rows, but without cycles", which is exactly what we'll do. And UNION DISTINCT means all rows should be different, which, again, is what will happen — as we'll enforce uniqueness over a subset of columns, complete rows will automatically be all different.