|
There is a bit of confusion with the CONNECT_BY_ISCYCLE (I'll use CYCLE for short) column. The above mentioned explanation that we set the CYCLE value to be 1 for the parent of the row R which would cause a cycle if it were added to the list. This seems to hold for most examples, however here is a counter example:
create table t2 (row_id char, id int, parent int);
|
insert into t2 values ('a', 1, NULL);
|
insert into t2 values ('b', 2, 1);
|
insert into t2 values ('c', 3, 2);
|
insert into t2 values ('d', 4, 3);
|
insert into t2 values ('e', 5, 4);
|
insert into t2 values ('f', 6, 5);
|
insert into t2 values ('g', 2, 6);
|
This table has a simple loop within it. Running the following connect by query seems to corroborate the theory described in the entry.
SELECT row_id, id, parent, connect_by_iscycle, SYS_CONNECT_BY_PATH(row_id || id, '/') "Path"
|
from t2
|
start with id = 1
|
connect by nocycle prior id = parent;
|
|
| ROW_ID | ID | PARENT | CONNECT_BY_ISCYCLE | Path |
|
|--------|----|--------|--------------------|--------------------|
|
| a | 1 | (null) | 0 | /a1 |
|
| b | 2 | 1 | 0 | /a1/b2 |
|
| c | 3 | 2 | 0 | /a1/b2/c3 |
|
| d | 4 | 3 | 0 | /a1/b2/c3/d4 |
|
| e | 5 | 4 | 0 | /a1/b2/c3/d4/e5 |
|
| f | 6 | 5 | 1 | /a1/b2/c3/d4/e5/f6 |
|
As we can see, there is no 'g2' entry in the list, since if g2 were present, we would have a loop through b2, given this connect by condition.
Now lets try a different example:
create table t1 (id int, fst_child int, snd_child int);
|
|
insert into t1 values (1, 2, NULL);
|
insert into t1 values (2, 3, 4);
|
insert into t1 values (3, 5, NULL);
|
insert into t1 values (4, 5, NULL);
|
insert into t1 values (5, 6, NULL);
|
insert into t1 values (6, 2, 4);
|
Here, the relationship is expressed in a different direction. Instead of children having references to parents, nodes have references to their children.
Here the relationship diagram looks like this:
-- 1
|
-- |
|
-- 2 <--|
|
-- / \ |
|
-- 3 4 <-|
|
-- \ / |
|
-- 5 |
|
-- | |
|
-- 6 -----|
|
There are 2 features in this graph:
1) We have a diamond shape.
2) We have 2 possible cycles, from 6 to 2, and from 6 to 4.
Let's run a connect by with this graph, with no cycles, so row with id 6 has no children and is defined like this instead:
delete from t1 where id = 6;
|
insert into t1 values (6, NULL, NULL);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|------------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 0 | /1/2/3/5 |
|
| 6 | (null) | (null) | 0 | /1/2/3/5/6 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 0 | /1/2/4/5 |
|
| 6 | (null) | (null) | 0 | /1/2/4/5/6 |
|
In this case we have no loops and the diamond inheritance generates two paths, one going through node 4 and one going through node 3.
Now lets add a loop through node 6 to node 2.
delete from t1 where id = 6;
|
insert into t1 values (6, 2, NULL);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|----------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 1 | /1/2/3/5 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 1 | /1/2/4/5 |
|
As expected, we are missing row with id 6, as it would introduce a cycle if it were added to the result set. It's parent row number 5 has CYCLE set to 1.
So far so good. Now come the strange examples.
Lets change the loop from going to node with id 2 to go to node with id 4.
delete from t1 where id = 6;
|
insert into t1 values (6, 4, NULL);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|------------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 0 | /1/2/3/5 |
|
| 6 | 4 | (null) | 1 | /1/2/3/5/6 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 0 | /1/2/4/5 |
|
| 6 | 4 | (null) | 1 | /1/2/4/5/6 |
|
This case has row with id 6 show up TWICE and it in turn has cycle set to 1. This doesn't change wether we go through the diamond by node 3 or by node 4.
An even stranger result happens if 6 has children both 2 and 4.
delete from t1 where id = 6;
|
insert into t1 values (6, 2, 4);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|------------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 0 | /1/2/3/5 |
|
| 6 | 2 | 4 | 1 | /1/2/3/5/6 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 0 | /1/2/4/5 |
|
| 6 | 2 | 4 | 1 | /1/2/4/5/6 |
|
In this case we get the same result as if we had a loop through 4 only.
To further add to the confusion, lets set fst_child and snd_child both to equal 2 for node 6.
delete from t1 where id = 6;
|
insert into t1 values (6, 2, 2);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|------------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 0 | /1/2/3/5 |
|
| 6 | 2 | 2 | 1 | /1/2/3/5/6 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 0 | /1/2/4/5 |
|
| 6 | 2 | 2 | 1 | /1/2/4/5/6 |
|
This produces different results than if we only have fst_child set to 2 and snd_child set to NULL.
The same result is present if we have 4 for both fst_child and snd_child.
delete from t1 where id = 6;
|
insert into t1 values (6, 2, 2);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|------------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 0 | /1/2/3/5 |
|
| 6 | 4 | 4 | 1 | /1/2/3/5/6 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 0 | /1/2/4/5 |
|
| 6 | 4 | 4 | 1 | /1/2/4/5/6 |
|
|
|
1. the g2 in the first example is a reasonable result, it should not be contained in the result.
create table t2 (row_id char, id int, parent int);
|
insert into t2 values ('a', 1, NULL);
|
insert into t2 values ('b', 2, 1);
|
insert into t2 values ('c', 3, 2);
|
insert into t2 values ('d', 4, 3);
|
insert into t2 values ('e', 5, 4);
|
insert into t2 values ('f', 6, 5);
|
insert into t2 values ('g', 2, 6);
|
|
SELECT row_id, id, parent, connect_by_iscycle, SYS_CONNECT_BY_PATH(row_id || id, '/') "Path"
|
from t2
|
start with id = 1
|
connect by nocycle prior id = parent;
|
|
| ROW_ID | ID | PARENT | CONNECT_BY_ISCYCLE | Path |
|
|--------|----|--------|--------------------|--------------------|
|
| a | 1 | (null) | 0 | /a1 |
|
| b | 2 | 1 | 0 | /a1/b2 |
|
| c | 3 | 2 | 0 | /a1/b2/c3 |
|
| d | 4 | 3 | 0 | /a1/b2/c3/d4 |
|
| e | 5 | 4 | 0 | /a1/b2/c3/d4/e5 |
|
| f | 6 | 5 | 1 | /a1/b2/c3/d4/e5/f6 |
|
We can just assume g2 can be showed above, then the result should be "| g | 2 | 6 | 1 | /a1/b2/c3/d4/e5/f6/g2 |",we can find that "id=2" appeared twice in the hierarchical tree, it violate the cycle semantic, so the g2 should not be returned in the result.
2. I think "the diamond example with first child of 2 second child of null" is an Oracle bug.
create table t1 (id int, fst_child int, snd_child int);
|
|
--base data
|
insert into t1 values (1, 2, NULL);
|
insert into t1 values (2, 3, 4);
|
insert into t1 values (3, 5, NULL);
|
insert into t1 values (4, 5, NULL);
|
insert into t1 values (5, 6, NULL);
|
insert into t1 values (6, 2, 4);
|
|
--construct a cycle with id=6 to id=2
|
delete from t1 where id = 6;
|
insert into t1 values (6, 2, NULL);
|
|
SELECT id, fst_child, snd_child, connect_by_iscycle, SYS_CONNECT_BY_PATH(id, '/') "Path"
|
from t1
|
start with id = 1
|
connect by nocycle ((id = prior fst_child) or (id = prior snd_child));
|
|
| ID | FST_CHILD | SND_CHILD | CONNECT_BY_ISCYCLE | Path |
|
|----|-----------|-----------|--------------------|----------|
|
| 1 | 2 | (null) | 0 | /1 |
|
| 2 | 3 | 4 | 0 | /1/2 |
|
| 3 | 5 | (null) | 0 | /1/2/3 |
|
| 5 | 6 | (null) | 1 | /1/2/3/5 |
|
| 4 | 5 | (null) | 0 | /1/2/4 |
|
| 5 | 6 | (null) | 1 | /1/2/4/5 |
|
Since id=5 to id=6 will not get a cycle(duplicate value) in the hierarchical tree in the SYS_CONNECT_BY_PATH function, so I think the results should contain the below two rows for id=6
| 6 | 2 | 4 | 1 | /1/2/3/5/6 |
|
| 6 | 4 | 4 | 1 | /1/2/4/5/6 |
|
3. I think all the other results are correct, they all enforce the semantic "no cycle/duplicate value in the hierarchical tree in the SYS_CONNECT_BY_PATH function".
|
|
For implementing CONNECT BY as a CTE, the following code or similar can be used. Here with CONNECT_BY_ISLEAF:
select * from (
– transitive closure without CONNECT_BY_ISLEAF
WITH RECURSIVE trcl AS
(
select PARENT_ID p, CHILD_ID c, 1 depth, CONCAT('', IF(PARENT_ID ISNULL,'',PARENT_ID),'',CHILD_ID,'_') trace
from RELATION
where CHILD_ID in (select some_id from SOME_TABLE)
and PARENT_ID_TYPE = 'sometype'
UNION ALL
select r.PARENT_ID p, t.c c, t.depth+1, CONCAT('', IF(r.PARENT_ID ISNULL,'',r.PARENT_ID),'',t.trace)
from RELATION r, trcl t
where r.CHILD_ID = t.p and INSTR(t.trace,CONCAT('', IF(r.PARENT_ID ISNULL,'',r.PARENT_ID),'')) = 0
)
SELECT depth lev, c, p, trace
from (
– this is the equivalent to an “isleaf” alias
select t.p, t.c, t.depth, t.trace
from trcl t
where not exists (select 1 from RELATION s where s.CHILD_ID = t.p and IF(s.PARENT_ID IS NULL,'',s.PARENT_ID) != IF(s.CHILD_ID ISNULL,'',s.CHILD_ID) and INSTR(t.trace,CONCAT('',IF(s.PARENT_ID ISNULL,'',s.PARENT_ID),'')) = 0)
UNION ALL
select t.p, t.c, t.depth, t.trace
from trcl t
where exists (select 1 from RELATION s where s.CHILD_ID = t.p andIF(s.PARENT_ID IS NULL,'',s.PARENT_ID) != IF(s.CHILD_ID ISNULL,'',s.CHILD_ID) and INSTR(t.trace,CONCAT('',IF(s.PARENT_ID ISNULL,'',s.PARENT_ID),'')) = 0)
);
|