[MDEV-20632] Recursive CTE cycle detection using CYCLE clause (nonstandard) Created: 2019-09-19  Updated: 2020-08-24  Resolved: 2020-03-10

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - CTE
Fix Version/s: 10.5.2

Type: Task
Priority: Critical
Reporter: Sergei Golubchik
Assignee: Oleksandr Byelkin
Resolution: Fixed
Votes: 1
Labels: None

Issue Links:
Blocks
Problem/Incident
causes MDEV-22018 WITH RECURSIVE supports CYCLE clause,... Closed

 Description   

One can use UNION DISTINCT as an easy way of avoiding cycles when traversing a graph with a CTE:

WITH RECURSIVE cte (from_, to_) AS (
   SELECT 1,1
   UNION DISTINCT
   SELECT graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_
) ...
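
To make the termination argument concrete, here is a minimal sketch that runs the same query shape on a small cyclic graph. It assumes SQLite (through Python's sqlite3 module) as a stand-in engine rather than MariaDB, and the three-edge graph is hypothetical sample data. SQLite spells the operator as plain UNION, which has UNION DISTINCT semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (from_ INTEGER, to_ INTEGER)")
# Hypothetical sample data: a three-node cycle 1 -> 2 -> 3 -> 1.
conn.executemany("INSERT INTO graph VALUES (?, ?)", [(1, 2), (2, 3), (3, 1)])

# Plain UNION in SQLite removes duplicates, like UNION DISTINCT in MariaDB.
edge_rows = conn.execute("""
    WITH RECURSIVE cte (from_, to_) AS (
        SELECT 1, 1
        UNION
        SELECT graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_
    )
    SELECT from_, to_ FROM cte
""").fetchall()

# The recursion stops once the edge (1, 2) is produced a second time,
# because that row is already in the result and is discarded.
print(sorted(edge_rows))
```

Each edge is returned once, even though the graph loops back to node 1.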

But often one needs to know more than the edges, for example the depth at which each edge was reached:

WITH RECURSIVE cte (depth, from_, to_) AS (
   SELECT 0,1,1
   UNION DISTINCT
   SELECT depth+1, graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_
) ...

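and here DISTINCT no longer works: on a cyclic graph every revisit of an edge carries a larger depth, so no two rows are ever equal and the recursion keeps going.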
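
The effect of the extra column can be observed with a small sketch, again assuming SQLite via Python's sqlite3 module as a stand-in engine and a hypothetical three-node cycle as data. The LIMIT inside the recursive part is not in the original query; it is added here only so that the demonstration stops.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (from_ INTEGER, to_ INTEGER)")
# Hypothetical sample data: a three-node cycle 1 -> 2 -> 3 -> 1.
conn.executemany("INSERT INTO graph VALUES (?, ?)", [(1, 2), (2, 3), (3, 1)])

# LIMIT 10 caps the rows of the recursive table; without it this query
# would keep producing rows with ever larger depth values.
depth_rows = conn.execute("""
    WITH RECURSIVE cte (depth, from_, to_) AS (
        SELECT 0, 1, 1
        UNION
        SELECT depth + 1, graph.from_, graph.to_
        FROM graph, cte WHERE graph.from_ = cte.to_
        LIMIT 10
    )
    SELECT depth, from_, to_ FROM cte
""").fetchall()

# The same edge shows up repeatedly, each time with a different depth,
# so duplicate elimination over whole rows never discards anything.
repeats_of_first_edge = [row for row in depth_rows if row[1:] == (1, 2)]
print(len(depth_rows), repeats_of_first_edge)
```

All ten allowed rows are used up, and the edge (1, 2) appears at several depths.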


The SQL standard specifies that a recursive CTE can have a CYCLE clause of the form

WITH RECURSIVE ... (
  ...
)
CYCLE <cycle column list>
SET <cycle mark column> TO <cycle mark value> DEFAULT <non-cycle mark value>
USING <path column>

where

  • <cycle column list> is a subset of columns that the CTE returns
  • <cycle mark column> is a new column, generated on the fly, its value for any particular row being <cycle mark value> if there's a cycle and <non-cycle mark value> if there's no cycle
  • <path column> is an ARRAY where the path is being accumulated

While the standard makes every part of the CYCLE clause mandatory, we'll relax the grammar so that CYCLE <cycle column list> alone is accepted.


This task is about implementing an optional CYCLE <cycle column list> clause after the recursive CTE definition.

There is a simple way to implement it: change the CTE's UNION ALL or UNION DISTINCT operator to enforce distinctness only over the <cycle column list> columns, rather than over all columns that the CTE returns.

The example from above would look like

WITH RECURSIVE cte (depth, from_, to_) AS (
   SELECT 0,1,1
   UNION
   SELECT depth+1, graph.from_, graph.to_ FROM graph, cte WHERE graph.from_ = cte.to_
) CYCLE from_, to_ RESTRICT
...

Note that it no longer matters whether the CTE uses UNION ALL or UNION DISTINCT. UNION ALL means "all rows, but without cycles", which is exactly what we'll do. And UNION DISTINCT means all rows should be different, which again is what will happen: since we enforce uniqueness over a subset of columns, complete rows are automatically all different.
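
The proposed semantics can be modelled outside the server with a short sketch. This is only an illustration of "distinctness over the cycle columns", not the server's implementation; the function evaluate_recursive_cte, the helper step, and the sample graph are all hypothetical names and data introduced for the example.

```python
def evaluate_recursive_cte(seed_rows, recursive_step, cycle_columns=None):
    """Model of a recursive CTE whose duplicate check looks only at
    cycle_columns (column positions); None means the whole row."""
    seen_keys = set()
    result = []

    def key_of(row):
        if cycle_columns is None:
            return row
        return tuple(row[i] for i in cycle_columns)

    def keep_new(rows):
        # Keep only rows whose key has not been produced before.
        kept = []
        for row in rows:
            key = key_of(row)
            if key not in seen_keys:
                seen_keys.add(key)
                result.append(row)
                kept.append(row)
        return kept

    frontier = keep_new(seed_rows)
    while frontier:
        frontier = keep_new(recursive_step(frontier))
    return result


# Hypothetical sample data: a three-node cycle 1 -> 2 -> 3 -> 1.
GRAPH = [(1, 2), (2, 3), (3, 1)]


def step(frontier):
    # Mirrors: SELECT depth+1, graph.from_, graph.to_
    #          FROM graph, cte WHERE graph.from_ = cte.to_
    return [(depth + 1, g_from, g_to)
            for (depth, _cte_from, cte_to) in frontier
            for (g_from, g_to) in GRAPH
            if g_from == cte_to]


# Columns 1 and 2 are from_ and to_, as in CYCLE from_, to_ RESTRICT.
# With cycle_columns=None this call would never return on a cyclic graph.
cycle_rows = evaluate_recursive_cte([(0, 1, 1)], step, cycle_columns=(1, 2))
print(cycle_rows)
```

In this model the row (4, 1, 2) is rejected because the pair (1, 2) was already seen at depth 1, so the traversal ends after one lap while the depth column is still reported.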



 Comments   
Comment by Oleksandr Byelkin [ 2020-01-13 ]

In the standard there is CYCLE (field list).

Comment by Oleksandr Byelkin [ 2020-02-28 ]

Added RESTRICT to mark non-standard usage and resolve parsing conflicts

Comment by Oleksandr Byelkin [ 2020-02-28 ]

commit be683b85ccada4bfdcd6b08fe4014442b33ce711 (HEAD -> bb-10.5-MDEV-20632-2, origin/bb-10.5-MDEV-20632-2)
Author: Oleksandr Byelkin <sanja@mariadb.com>
Date: Mon Jan 27 21:50:16 2020 +0100

MDEV-20632: Recursive CTE cycle detection using CYCLE clause

Added CYCLE clause to recursive CTE.
