[MDEV-20632] Recursive CTE cycle detection using CYCLE clause (nonstandard) - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.5.2
Component/s: Optimizer - CTE
Labels:
None

Description

One can use UNION DISTINCT as an easy way of avoiding cycles when traversing a graph with a CTE:

WITH RECURSIVE cte (from_, to_) (

   SELECT 1,1

   UNION DISTINCT

   SELECT graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_

) ...

But often one needs to know more than edges, for example

WITH RECURSIVE cte (depth, from_, to_) (

   SELECT 0,1,1

   UNION DISTINCT

   SELECT depth+1, graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_

) ...

and here DISTINCT no longer works.

SQL Standard specifies that a CTE can have a CYCLE clause as

WITH RECURSIVE ... (

...

CYCLE <cycle column list>

SET <cycle mark column> TO <cycle mark value> DEFAULT <non-cycle mark value>

USING <path column>

where

<cycle column list> is a subset of columns that the CTE returns
<cycle mark column> is a new column, generated on the fly, its value for any particular row being <cycle mark value> if there's a cycle and <non-cycle mark value> if there's no cycle
<path column> is an ARRAY where the path is being accumulated

While in the standard all clauses in the CYCLE are mandatory, we'll relax this grammar to allow only CYCLE <cycle column list>.

This task is about implementing optional CYCLE <cycle column list> clause after the recursive CTE definition.

There is a simple way to implement it by changing CTE's UNION ALL or UNION DISTINCT operator to enforce distinct-ness only over <cycle column list> columns, not over all columns that CTE returns.

The example from above would look like

WITH RECURSIVE cte (depth, from_, to_) (

   SELECT 0,1,1

   UNION

   SELECT depth+1, graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_

) CYCLE from_, to_ RESTRICT

...

Note that it doesn't matter whether the CTE uses UNION ALL or UNION DISTINCT anymore. UNION ALL means "all rows, but without cycles", which is exactly what we'll do. And UNION DISTINCT means all rows should be different, which, again, is what will happen — as we'll enforce uniqueness over a subset of columns, complete rows will automatically be all different.

Attachments

Issue Links

causes

MDEV-22018 WITH RECURSIVE supports CYCLE clause, but it is not documented

Closed

Activity

People

Assignee:: Oleksandr Byelkin

Reporter:: Sergei Golubchik

Votes:: 1 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2019-09-19 11:06

Updated:: 2020-08-24 18:29

Resolved:: 2020-03-10 06:33

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server