[MDEV-13664] CONNECT BY dataset and where clause evaluation Created: 2017-08-28 Updated: 2017-09-05 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Vicențiu Ciorbaru | Assignee: | Vicențiu Ciorbaru |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Description |
|
Computing the final result for hierarchical queries is a multi-step process. First, let's quote oracle documentation:
I will define the following terms: Data Set: The full set of rows, with all their associated columns that are used to generate a final result set for a hierarchical query. Some of these rows may appear multiple times in the final result set, when they are part of different paths (coming from a diamond inheritance), or not at at all, either due to the rows not passing the start with conditions, the connect by conditions or the where clause conditions. Base Set: The set of rows which pass the START WITH clause from the Data Set. These rows may or may not pass the where clause conditions. Each one of these rows represents the root for a new tree which will be generated during CONNECT BY computation. Hierarchical Set: The set of rows which are generated starting with the Base Set, by recursively evaluating the CONNECT BY condition. These rows must at least contain the rows in the Base Set. Result Set: The set of rows which are returned to the client. These rows are a subset of the Hierarchical Set, containing all rows which pass the WHERE Filtering Conditions. Join Conditions: Conditions which filter out the Cartesian product resulting from a select with multiple tables. These conditions can be expressed in the FROM clause such as:
Or via WHERE clause predicates, such as:
Filtering Conditions: All conditions from the WHERE clause which are not part of Join Conditions. Given these definitions I shall define the multiple step processing on a conceptual level to produce the Result Set. Given a generic hierarchical query, which has N tables in the FROM clause. The only ambiguous task in this multi-step process is separating the Join Conditions from the Filtering Conditions, as Oracle's documentation provides a very shallow explanation of what kind of predicates in where clause represent join conditions. Let's look at a few examples:
Here we have the following conditions: The first two conditions are treated as Join Conditions while the last one is treated as a Filtering condition. Adding the following extra condition in the from clause:
The Join Conditions are: p.city_id = c.city_id and c.city_id > 1 and p.birth_id = b.birth_id. IMPORTANTWe can confuse this algorithm by changing the equivalent p.birth_id = b.birth_id to be: case when p.birth_id = b.birth_id then 0 else 1 end = 0. Another strange use case is if we change the AND condition in the WHERE clause to an OR: Given this brittleness observed in Oracle, I suggest we define a set of clear and simple rules and aim to respect those. It will not match Oracle in all use cases, but it will provide a consistent behaviour. NoteThe actual database used for these examples is pasted in the comment below. |
| Comments |
| Comment by Vicențiu Ciorbaru [ 2017-08-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vicențiu Ciorbaru [ 2017-08-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||