Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11588

Support for ONLY_FULL_GROUP_BY functional dependency

Details

    Description

      When SQL mode ONLY_FULL_GROUP_BY is enabled (as is default), MariaDB rejects, per the '92 SQL standard, any SELECT queries with (non-aggregated) columns not also listed in the GROUP BY statement with error 1055: "'db.table.column_name' isn't in GROUP_BY".

      However, the 2003 spec loosens this restriction with a requirement that the columns need only be functionally dependent:

      "17) If T is a grouped table, then let G be the set of grouping columns of T. In each ((value expression)) contained in ((select list)) , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a ((set function specification)) whose aggregation query is QS."

      Since version 5.7.5, available since late 2014, MySQL supports the detection of functional dependencies in GROUP BY statements. It would be good to see this ported over to MariaDB as well to eliminate the need for redundant double column listing (and presumable performance hit when dependent columns are needlessly iterated for grouping), where functional dependency could be detected. For some useful background, see this entry at the MySQLServerTeam blog, along with Roland Bouman's related write-up .

      Since ONLY_FULL_GROUP_BY is now a default both in Maria and MySQL, not porting this to Maria would make migration from MySQL to Maria more involved, forcing either SQL mode or code changes where affected queries are used. (There's a related older ticket mentioning the ANY_VALUE() function to deal with the over-strict mode, however that alone makes for a clunky work-around to over-strict grouping.)

      Attachments

        Issue Links

          Activity

            What was done the previous week (07-23 - 07-30):

            1. Stage 1 was ended and sent on review to Igor
            2. LEFT/RIGHT/OUTER joins handling code is in process
            3. MySQL tests on functional dependencies were ran and highlighted some bugs

            shagalla Galina Shalygina (Inactive) added a comment - What was done the previous week (07-23 - 07-30): 1. Stage 1 was ended and sent on review to Igor 2. LEFT/RIGHT/OUTER joins handling code is in process 3. MySQL tests on functional dependencies were ran and highlighted some bugs

            What was done the previous week (07-30 - 08-06):

            After discussion with Igor it was decided to change LEFT/RIGHT/OUTER join handling.

            • Dependent maps used in implementations were changed on recursion function
            • Code was entirely rewritten.

            Current implementation is done in such way:

            Consider a query:

            SELECT t3.x
            FROM t1 LEFT JOIN t2 ON (t1.a = t2.e AND t1.c = 3)
                                LEFT JOIN t3 ON (t2.e = t3.x AND t1.a = t2.f)
            WHERE t1.a = 2
            GROUP BY t1.b;
            

            1. Consider the most outer join t1 LEFT JOIN t2 ON (t1.a = t2.e)
            1.1.1. Consider table t1 (the most outer table in this join) and find in WHERE clause equalities that use t1 only.
            1.1.2. There is such an equality: t1.a = 2. From this equality can be said that t1.a is ‘allowed’.
            1.2.1. Consider table t2
            1.2.2. Take this LEFT JOIN on expression part: (t1.a = t2.e AND t1.c = 3)
            1.2.3. Extract from this ON expression part equalities that depend on t2 and t1. Important! Take only those equalities that depend on t2. So (t1.c = 3) will not be taken in account.
            1.2.4. Take WHERE clause equalities depend on t1,t2.
            1.2.5. Try to extract new ‘allowed’ field. From (t1.a = t2.e) can be said that t2.e is allowed.
            2. Consider t3.
            2.1. Take on expression of the t3 LEFT join: (t2.e = t3.x AND t1.a = t2.f)
            2.2. Extract from this ON expression part equalities depend on t1,t2,t3. Don’t take equalities that depend on t1 and t2 (or both) only. So (t1.a = t2.f) will not be taken.
            2.3. Take WHERE clause equalities depend on t1,t2,t3.
            2.4. Try to extract new ‘allowed’ field. From (t2.e = t3.x) can be said that t3.x is allowed.

            What is also done:

            1. MySQL tests were ran. This highlighted some current implementation limitations:
            1.1. Equalities that use ROW items are not handled.
            1.2. Materialized derived tables/views unique/primary keys are not used
            1.3. Simple equalities like number * no_dep_field = dep_field are not handled.
            1.4. Can't handle: select a from v1 where a=b group by 2*t1.a;
            1.5. Can't handle "group by "";" with on expressions.

            2. ORDER BY handling was added, some bugs were fixed.

            shagalla Galina Shalygina (Inactive) added a comment - - edited What was done the previous week (07-30 - 08-06): After discussion with Igor it was decided to change LEFT/RIGHT/OUTER join handling. Dependent maps used in implementations were changed on recursion function Code was entirely rewritten. Current implementation is done in such way: Consider a query: SELECT t3.x FROM t1 LEFT JOIN t2 ON (t1.a = t2.e AND t1.c = 3) LEFT JOIN t3 ON (t2.e = t3.x AND t1.a = t2.f) WHERE t1.a = 2 GROUP BY t1.b; 1. Consider the most outer join t1 LEFT JOIN t2 ON (t1.a = t2.e) 1.1.1. Consider table t1 (the most outer table in this join) and find in WHERE clause equalities that use t1 only. 1.1.2. There is such an equality: t1.a = 2. From this equality can be said that t1.a is ‘allowed’. 1.2.1. Consider table t2 1.2.2. Take this LEFT JOIN on expression part: (t1.a = t2.e AND t1.c = 3) 1.2.3. Extract from this ON expression part equalities that depend on t2 and t1. Important! Take only those equalities that depend on t2. So (t1.c = 3) will not be taken in account. 1.2.4. Take WHERE clause equalities depend on t1,t2. 1.2.5. Try to extract new ‘allowed’ field. From (t1.a = t2.e) can be said that t2.e is allowed. 2. Consider t3. 2.1. Take on expression of the t3 LEFT join: (t2.e = t3.x AND t1.a = t2.f) 2.2. Extract from this ON expression part equalities depend on t1,t2,t3. Don’t take equalities that depend on t1 and t2 (or both) only. So (t1.a = t2.f) will not be taken. 2.3. Take WHERE clause equalities depend on t1,t2,t3. 2.4. Try to extract new ‘allowed’ field. From (t2.e = t3.x) can be said that t3.x is allowed. What is also done: 1. MySQL tests were ran. This highlighted some current implementation limitations: 1.1. Equalities that use ROW items are not handled. 1.2. Materialized derived tables/views unique/primary keys are not used 1.3. Simple equalities like number * no_dep_field = dep_field are not handled. 1.4. Can't handle: select a from v1 where a=b group by 2*t1.a; 1.5. Can't handle "group by "";" with on expressions. 2. ORDER BY handling was added, some bugs were fixed.

            What was done the previous week (08-26 - 09-03):

            1. Removed set subquery context methods. Now subquery context is taken the same way as it is done in fix_outer_field() method.
            2. Cleaned code
            3. Rewrote comments
            4. Added new tests and removed useless tests
            5. Removed methods that check if function arguments are of the allowed types (e.g. MINITE(arg), arg should be DATE or DATETIME).
            6. Added new methods that check if functions return deterministic result (string functions like LENGTH for varchar). (Deterministic here: Deterministic function here is a function that returns the same
            result for equal input sets.)
            7. Forbid to use control flow functions in SELECT list, HAVING and ORDER BY if GROUP BY is used.
            8. Clarified with Igor what conditions should be met when functionally dependent field is tried to be extracted from the ON expression equality predicates.

            What is still missing:

            Now MariaDB forbids to use non GROUP BY fields in HAVING. It throughs error before any functional dependencies are checked even if 'ONLY_FULL_GROUP_BY' mode is not set.
            This code should be changed so functionally dependent fields can be used in HAVING.

            shagalla Galina Shalygina (Inactive) added a comment - What was done the previous week (08-26 - 09-03): 1. Removed set subquery context methods. Now subquery context is taken the same way as it is done in fix_outer_field() method. 2. Cleaned code 3. Rewrote comments 4. Added new tests and removed useless tests 5. Removed methods that check if function arguments are of the allowed types (e.g. MINITE(arg), arg should be DATE or DATETIME). 6. Added new methods that check if functions return deterministic result (string functions like LENGTH for varchar). (Deterministic here: Deterministic function here is a function that returns the same result for equal input sets.) 7. Forbid to use control flow functions in SELECT list, HAVING and ORDER BY if GROUP BY is used. 8. Clarified with Igor what conditions should be met when functionally dependent field is tried to be extracted from the ON expression equality predicates. What is still missing: Now MariaDB forbids to use non GROUP BY fields in HAVING. It throughs error before any functional dependencies are checked even if 'ONLY_FULL_GROUP_BY' mode is not set. This code should be changed so functionally dependent fields can be used in HAVING.

            Can someone please finish this.
            There are hundreds of started features but so many get inactive.

            blackout Roman Stingler (Inactive) added a comment - Can someone please finish this. There are hundreds of started features but so many get inactive.

            Functional dependencies project description is divided into several steps.

            First, the definitions will be introduced.
            Further, the examples of functional dependencies cases in relational databases will be provided: starting from the basic case of the examples of functional dependencies in the table definition to the narrowing of functional dependency rules using HAVING clause.

            Along with using functional dependencies with GROUP BY in the query, they can be used for omitting additional calculations or creating temporary tables for derived tables.
            Consider the example:

            create table t1(a int, b int);
            insert into t1 values (1, 2), (2, 3);
             
            select d.a
            from t1 join (
            select t1.a, max(b) as mb
            from t1 group by t1.a
            ) as d on t1.b = d.a;
            

            In the query it can be seen that the derived table field d.mb is not used in the parent SELECT. So, its calculations can be omitted.

            I. Functional dependency definitions

            Relation is a set of tuples.
            A tuple is a set of attribute values in which no two distinct elements have the same name.

            Consider a relation R and sets of attributes A and B in R.
            Let “R: A ↦ B” (read “in R, A determines B” or “B is functionally dependent on A in R”) denote the functional dependency of B on A in R, which is true if, for any possible value of R, any two rows that are not distinct for every column in A also are not distinct for every column in B.

            Several properties of functional dependencies can be proved. They are called Armstrong's axioms.

            Given A, B and C sets of attributes in a relation R, these properties are said to be true:
            1. Reflexivity
            If B is a subset of A, then A → B.
            It can be weakened to: A → {}, where {} - is an empty set of attributes.
            2. Transitivity
            If A → B and B → C, then A → C
            3. Augmentation
            If A → B, then AC → BC

            Also let's introduce the definition of FD-function that will be used later:

            FD-function is a function f for which is true: if two elements are equal, then their FD-function values are also equal:
            x1=x2 => f1(x1)=f(x2)
            x1=y1, ..., xn=yn => f(x1,...,xn) = f(y1,...,yn)

            Example of FD-function:

            f = x + 1
            Simply, for any numeric x1 = x2 => x1 + 1 = x2 + 1
            

            Example of non-FD function:

            1. f = length(x)
             
              x1 = "a"
              x2 = "a "
             
              length(x1) = length("a") = 1
              length(x2) = length("a ") = 2
             
              x1 = x2, but f(x1) != f(x2)
             
            2. f = concat(x)
             
              x1 = "a", x2 = "b"
              y1 = "a ", y2 = "b"
             
              concat(x1, x2) = "ab"
              concat(y1, y2) = "a b"
             
              x1 = y1, x2 = y2, but concat(x1, x2) != concat(y1, y2)
            

            II. Functional dependencies in a relation definition.

            Relation further will be considered as a query expression body.

            1. Primary key

            According to the definition, if a relation contains a primary key, then all attributes of this relation are functionally dependent on the primary key.

            A → B, where A - is a primary key set of attributes in relation R, B - is a set of all attributes of relation R.

            It can be seen through the example:

            create table t1 (a int primary key, b int, c int);
            insert into t1 values (1,2,2), (2,2,2), (3,4,5);
             
            t1.a -> {t1.a, t1.b, t1.c}
            

            2. Unique key

            In a similar way, if a relation contains a unique key, then all attributes of this relation are
            functionally dependent on this unique key.

            A → B, where A - is a unique constraint set of attributes in relation R, B - is a set of all attributes of relation R.

            Consider the example:

            create table t1 (a int, b int, c int, unique(a,b));
            insert into t1 values (1,2,2), (2,2,2), (3,4,5);
             
            {t1.a, t1.b} -> {t1.a, t1.b, t1.c}
            

            3. Virtual column
            Virtual column is a relation attribute that is represented as a result of the function of some relation attributes and/or constants.

            It can be represented this way:

            V = F(A), where V - is a virtual column, F - is a function, A - is a set of attributes

            It can be said that virtual column is functionally dependent on the attributes set that defines this column only if the function used in the virtual column definition is FD-function:

            V = F(A), where F - is FD-function, =>
            A -> V

            Consider the example:

            create table t1 (a int, b int, c int as (a*b) virtual);
            insert into t1 values (1,3,default), (2,2,default);
             
            {t1.a, t1.b} -> {t1.a*t1.b} (as multiplication is FD-function)
            

            III. Functional dependencies added by using WHERE clause

            To extract new functional dependencies WHERE clause top-level AND-component equalities can be used. It's also considered that between parts of the equalities no implicit conversions should be used.

            1. Consider the SELECT:

            SELECT *
            FROM R
            WHERE R.A = R.B;
            

            where R - is a relation, A and B are R attributes
            It can be said that:

            R.A -> R.B and R.B -> R.A

            2. Consider the SELECT:

            SELECT *
            FROM R
            WHERE R.A = F(R.B, R.C);
            

            where R - is a relation, A,B,C are attributes and F is FD-function.

            R.A -> {R.B, R.C} and {R.B, R.C} -> R.A

            3. Consider the SELECT:

            SELECT *
            FROM R
            WHERE R.A = R.B AND R.B = R.C;
            

            where R - is a relation, A,B,C are attributes.

            R.A -> R.B and R.B -> R.A
            R.B -> R.C and R.C -> R.B

            Using transitivity axiom new functional dependencies can be extracted:

            R.A -> R.C
            R.C -> R.A

            IV. JOIN

            Let's consider more difficult cases defining the rules for functional dependency extraction when
            SELECT from multiple relations is used.

            1. CROSS JOIN

            Consider the SELECT:

            SELECT *
            FROM OP1 CROSS JOIN OP2;
            

            where OP1 and OP2 are relations.

            By definition, all functional dependencies from OP1 and from OP2 definitions are true in the result of OP1 and OP2 cross join.

            2. INNER JOIN

            Inner JOIN rules for functional dependency extraction are the same as for the CROSS JOIN (1).

            Additionally, ON clause top-level AND-component equalities can be used for extraction of new functional dependencies. Similarly with WHERE-clause, between parts of the equalities no implicit conversions should be used.

            Consider the SELECT:

            SELECT *
            FROM OP1 INNER JOIN OP2
            ON OP1.A = OP2.C
            WHERE OP1.A = OP2.B;
            

            where A is an attribute from OP1, B, C are attributes from OP2.

            From the WHERE clause these functional dependencies can be extracted:
            OP1.A -> OP2.B and OP2.B -> OP1.A

            From ON clause equality (similar with extraction from WHERE):
            OP1.A -> OP2.C and OP2.C -> OP1.A

            So, four functional dependencies have been extracted. Using transitivity axiom another two can be extracted from them:

            OP2.C -> OP2.B and OP2.C -> OP2.B

            3. LEFT and RIGHT JOIN

            Let's define LEFT and RIGHT JOIN definitions:

            1. Left JOIN

            OP1 LEFT JOIN OP2 ON ...
            where OP1 is called strong side, OP2 - weak side of LEFT JOIN.

            2. RIGHT JOIN

            OP1 RIGHT JOIN OP2 ON ...
            where OP1 is called weak side, OP2 - strong side of RIGHT JOIN.

            To extract functional dependencies when LEFT or RIGHT JOIN is used CROSS JOIN (1) rules can be used.

            Additionally, functional dependencies can be extracted from ON-clause top-level AND-component equalities but only if the equality doesn't contain strong side attributes that aren't participating in any already existing functional dependency rules.

            Consider the queries:

            create table t1 (a int primary key, b int);
            create table t2 (x int primary key, y int;
             
            insert into t1 values (1,1), (2,1);
            insert into t2 values (1,2), (2,3);
            

            a.

            select t1.a, t1.b
               from t1 left join t2
                 on t1.a = t1.b;
             
            +------+------+
            | t1.a | t1.b |
            +------+------+
            |   1  |   1  |
            +------+------+
            |   1  |   1  |
            +------+------+
            |   2  |   1  |
            +------+------+
            

            From the result of the query it can be seen that t1.a and t1.b are not functionally dependent (because for t1.b = 1 there are two t1.a different values). t1.a is participating in the functional dependency as a primary key, but t1.b is not participating in any functional dependency rule.

            b.

            select t1.a, t2.y
               from t1 left join t2
                 on t1.a = t2.y;
             
            +------+------+
            | t1.a | t2.y |
            +------+------+
            |   1  |   1  |
            +------+------+
            |   2  |   2  |
            +------+------+
            

            Here t1.a -> t2.y and t2.y -> t1.a functional dependencies can be extracted.

            t1.a is a primary key, and t2.y is an attribute from the weak side of the LEFT JOIN.

            4. Derived tables, views.

            In the case of the derived table or view it can be considered as a relation in the SELECT
            where it is used. So, all functional dependencies that can be extracted from the definition of the derived table/view can be used in the SELECT where the derived table/view is used for extracting new functional dependencies.

            V. SELECT list projections.

            1. General rules

            As projections formulate the final list of the result attributes there is no need to extract functional
            dependencies which parts are not participating in the projections.

            It's important to mention that only attributes that participate in functional dependencies can be used in SELECT list when GROUP BY is used.

            Consider the example:

            SELECT R.A
            FROM R
            WHERE R.B = R.C
            GROUP BY R.A, R.B
            

            Here the set of attributes is:

            {R.A, R.B, R.C}

            Functional dependencies are:
            R.B -> R.C

            As projections include only R.A attribute, set of attributes can be shortened to {R.A} and functional dependencies list will be empty.

            2. Expressions in the projections list

            It’s important to mention that expressions used in the projection list can be considered as virtual columns and the same rules as for the virtual column should be applied to this expression. For details go to II.3.

            Consider the example:

            SELECT R.A + R.B
            FROM R
            GROUP BY R.A, R.B
            


            As addition is an FD-function there is a functional dependency:

            {R.A, R.B} -> {R.A + R.B}

            VI. GROUP BY + HAVING

            1. GROUP BY

            Attributes that participate in the GROUP BY can be considered as an initial set for functional dependencies extraction. From this set functionally dependent attributes can be extracted recursively.

            a. Consider the query:

            SELECT R.A
            FROM R
            WHERE R.A = R.B
            GROUP BY R.A;
            


            where R - is a relation, A,B - are attributes of R.

            Here R.A is a GROUP BY field and can be considered as an initial set for functional dependency extraction.

            Initial set: {R.A}

            Using WHERE clause equality: R.A -> R.B

            b. Consider the query:

            SELECT R.A
            FROM R
            WHERE R.A = R.B AND R.PK = R.C
            GROUP BY R.A;
            


            where R - is a relation, A,B,C - are attributes of R, PK - is a primary key in R.

            Similarly with a.:
            Initial set: {R.A}

            WHERE clause equality helps to extract:
            R.A -> R.B

            So, now the set of attributes from where functional dependencies can be extracted is: {R.A, R.B}

            Despite the fact that R.PK is a primary key it can't participate in functional dependencies
            extraction and the set remains the same.

            c. Consider the query:

            SELECT R.A
            FROM R
            WHERE R.A = R.B
            GROUP BY R.PK;
            


            where R - is a relation, A,B - are attributes of R, PK - is a primary key in R.

            The initial set of attributes from GROUP BY is: {R.PK}

            R.PK is a primary key of R. So, using the properties of primary keys (II.1), all attributes of
            the relation R are functionally dependent from the R.PK. So, the set of attributes now is: {R.PK, R.A, R.B}

            2. HAVING

            HAVING can be used only with GROUP BY in the query. HAVING top-level AND-component equalities can be used for new functional dependencies extraction.
            However, attributes that participate in HAVING should be in the initial set or been extracted through functional dependencies.

            Consider the example:

            SELECT R.A, R.C
            FROM R
            WHERE R.B = R.C
            GROUP BY R.A, R.B
            HAVING R.A = R.C;
            


            where R - is a relation, A,B,C - are attributes of R.

            Initial set for functional dependency extraction is:
            {R.A, R.B} - attributes that participate in GROUP BY

            From WHERE-clause R.C can be added to the set through the functional dependency with R.B:

            R.B -> R.C
            Set: {R.A, R.B, R.C}

            Using HAVING-clause equality new functional dependency can be extracted (as both parts of the equality are in the set).

            Functional dependencies R.A -> R.B, R.B -> R.C and transitivity axioma lead to the new functional dependency:

            R.A -> R.C

            ===============
            The code written before for this MDEV is located here and currently can't be reproduced on 11.6.

            shagalla Galina Shalygina (Inactive) added a comment - Functional dependencies project description is divided into several steps. First, the definitions will be introduced. Further, the examples of functional dependencies cases in relational databases will be provided: starting from the basic case of the examples of functional dependencies in the table definition to the narrowing of functional dependency rules using HAVING clause. Along with using functional dependencies with GROUP BY in the query, they can be used for omitting additional calculations or creating temporary tables for derived tables. Consider the example: create table t1(a int , b int ); insert into t1 values (1, 2), (2, 3);   select d.a from t1 join ( select t1.a, max (b) as mb from t1 group by t1.a ) as d on t1.b = d.a; In the query it can be seen that the derived table field d.mb is not used in the parent SELECT . So, its calculations can be omitted. I. Functional dependency definitions Relation is a set of tuples. A tuple is a set of attribute values in which no two distinct elements have the same name. Consider a relation R and sets of attributes A and B in R. Let “R: A ↦ B” (read “in R, A determines B” or “B is functionally dependent on A in R”) denote the functional dependency of B on A in R, which is true if, for any possible value of R, any two rows that are not distinct for every column in A also are not distinct for every column in B. Several properties of functional dependencies can be proved. They are called Armstrong's axioms. Given A, B and C sets of attributes in a relation R, these properties are said to be true: 1. Reflexivity If B is a subset of A, then A → B. It can be weakened to: A → {}, where {} - is an empty set of attributes. 2. Transitivity If A → B and B → C, then A → C 3. Augmentation If A → B, then AC → BC Also let's introduce the definition of FD-function that will be used later: FD-function is a function f for which is true: if two elements are equal, then their FD-function values are also equal: x1=x2 => f1(x1)=f(x2) x1=y1, ..., xn=yn => f(x1,...,xn) = f(y1,...,yn) Example of FD-function: f = x + 1 Simply, for any numeric x1 = x2 => x1 + 1 = x2 + 1 Example of non-FD function: 1. f = length(x)   x1 = "a" x2 = "a "   length(x1) = length("a") = 1 length(x2) = length("a ") = 2   x1 = x2, but f(x1) != f(x2)   2. f = concat(x)   x1 = "a", x2 = "b" y1 = "a ", y2 = "b"   concat(x1, x2) = "ab" concat(y1, y2) = "a b"   x1 = y1, x2 = y2, but concat(x1, x2) != concat(y1, y2) II. Functional dependencies in a relation definition. Relation further will be considered as a query expression body. 1. Primary key According to the definition, if a relation contains a primary key, then all attributes of this relation are functionally dependent on the primary key. A → B, where A - is a primary key set of attributes in relation R, B - is a set of all attributes of relation R. It can be seen through the example: create table t1 (a int primary key , b int , c int ); insert into t1 values (1,2,2), (2,2,2), (3,4,5);   t1.a -> {t1.a, t1.b, t1.c} 2. Unique key In a similar way, if a relation contains a unique key, then all attributes of this relation are functionally dependent on this unique key. A → B, where A - is a unique constraint set of attributes in relation R, B - is a set of all attributes of relation R. Consider the example: create table t1 (a int , b int , c int , unique (a,b)); insert into t1 values (1,2,2), (2,2,2), (3,4,5);   {t1.a, t1.b} -> {t1.a, t1.b, t1.c} 3. Virtual column Virtual column is a relation attribute that is represented as a result of the function of some relation attributes and/or constants. It can be represented this way: V = F(A), where V - is a virtual column, F - is a function, A - is a set of attributes It can be said that virtual column is functionally dependent on the attributes set that defines this column only if the function used in the virtual column definition is FD-function: V = F(A), where F - is FD-function, => A -> V Consider the example: create table t1 (a int , b int , c int as (a*b) virtual); insert into t1 values (1,3, default ), (2,2, default );   {t1.a, t1.b} -> {t1.a*t1.b} ( as multiplication is FD- function ) III. Functional dependencies added by using WHERE clause To extract new functional dependencies WHERE clause top-level AND-component equalities can be used. It's also considered that between parts of the equalities no implicit conversions should be used. 1. Consider the SELECT: SELECT * FROM R WHERE R.A = R.B; where R - is a relation, A and B are R attributes It can be said that: R.A -> R.B and R.B -> R.A 2. Consider the SELECT: SELECT * FROM R WHERE R.A = F(R.B, R.C); where R - is a relation, A,B,C are attributes and F is FD-function. R.A -> {R.B, R.C} and {R.B, R.C} -> R.A 3. Consider the SELECT: SELECT * FROM R WHERE R.A = R.B AND R.B = R.C; where R - is a relation, A,B,C are attributes. R.A -> R.B and R.B -> R.A R.B -> R.C and R.C -> R.B Using transitivity axiom new functional dependencies can be extracted: R.A -> R.C R.C -> R.A IV. JOIN Let's consider more difficult cases defining the rules for functional dependency extraction when SELECT from multiple relations is used. 1. CROSS JOIN Consider the SELECT : SELECT * FROM OP1 CROSS JOIN OP2; where OP1 and OP2 are relations. By definition, all functional dependencies from OP1 and from OP2 definitions are true in the result of OP1 and OP2 cross join. 2. INNER JOIN Inner JOIN rules for functional dependency extraction are the same as for the CROSS JOIN (1). Additionally, ON clause top-level AND-component equalities can be used for extraction of new functional dependencies. Similarly with WHERE -clause, between parts of the equalities no implicit conversions should be used. Consider the SELECT : SELECT * FROM OP1 INNER JOIN OP2 ON OP1.A = OP2.C WHERE OP1.A = OP2.B; where A is an attribute from OP1, B, C are attributes from OP2. From the WHERE clause these functional dependencies can be extracted: OP1.A -> OP2.B and OP2.B -> OP1.A From ON clause equality (similar with extraction from WHERE ): OP1.A -> OP2.C and OP2.C -> OP1.A So, four functional dependencies have been extracted. Using transitivity axiom another two can be extracted from them: OP2.C -> OP2.B and OP2.C -> OP2.B 3. LEFT and RIGHT JOIN Let's define LEFT and RIGHT JOIN definitions: 1. Left JOIN OP1 LEFT JOIN OP2 ON ... where OP1 is called strong side, OP2 - weak side of LEFT JOIN. 2. RIGHT JOIN OP1 RIGHT JOIN OP2 ON ... where OP1 is called weak side, OP2 - strong side of RIGHT JOIN. To extract functional dependencies when LEFT or RIGHT JOIN is used CROSS JOIN (1) rules can be used. Additionally, functional dependencies can be extracted from ON-clause top-level AND-component equalities but only if the equality doesn't contain strong side attributes that aren't participating in any already existing functional dependency rules. Consider the queries: create table t1 (a int primary key , b int ); create table t2 (x int primary key , y int ;   insert into t1 values (1,1), (2,1); insert into t2 values (1,2), (2,3); a. select t1.a, t1.b from t1 left join t2 on t1.a = t1.b;   + ------+------+ | t1.a | t1.b | + ------+------+ | 1 | 1 | + ------+------+ | 1 | 1 | + ------+------+ | 2 | 1 | + ------+------+ From the result of the query it can be seen that t1.a and t1.b are not functionally dependent (because for t1.b = 1 there are two t1.a different values). t1.a is participating in the functional dependency as a primary key, but t1.b is not participating in any functional dependency rule. b. select t1.a, t2.y from t1 left join t2 on t1.a = t2.y;   + ------+------+ | t1.a | t2.y | + ------+------+ | 1 | 1 | + ------+------+ | 2 | 2 | + ------+------+ Here t1.a -> t2.y and t2.y -> t1.a functional dependencies can be extracted. t1.a is a primary key, and t2.y is an attribute from the weak side of the LEFT JOIN . 4. Derived tables, views. In the case of the derived table or view it can be considered as a relation in the SELECT where it is used. So, all functional dependencies that can be extracted from the definition of the derived table/view can be used in the SELECT where the derived table/view is used for extracting new functional dependencies. V. SELECT list projections. 1. General rules As projections formulate the final list of the result attributes there is no need to extract functional dependencies which parts are not participating in the projections. It's important to mention that only attributes that participate in functional dependencies can be used in SELECT list when GROUP BY is used. Consider the example: SELECT R.A FROM R WHERE R.B = R.C GROUP BY R.A, R.B Here the set of attributes is: {R.A, R.B, R.C} Functional dependencies are: R.B -> R.C As projections include only R.A attribute, set of attributes can be shortened to {R.A} and functional dependencies list will be empty. 2. Expressions in the projections list It’s important to mention that expressions used in the projection list can be considered as virtual columns and the same rules as for the virtual column should be applied to this expression. For details go to II.3. Consider the example: SELECT R.A + R.B FROM R GROUP BY R.A, R.B As addition is an FD-function there is a functional dependency: { R.A, R.B} -> {R.A + R.B } VI. GROUP BY + HAVING 1. GROUP BY Attributes that participate in the GROUP BY can be considered as an initial set for functional dependencies extraction. From this set functionally dependent attributes can be extracted recursively. a. Consider the query: SELECT R.A FROM R WHERE R.A = R.B GROUP BY R.A; where R - is a relation, A,B - are attributes of R. Here R.A is a GROUP BY field and can be considered as an initial set for functional dependency extraction. Initial set: {R.A} Using WHERE clause equality: R.A -> R.B b. Consider the query: SELECT R.A FROM R WHERE R.A = R.B AND R.PK = R.C GROUP BY R.A; where R - is a relation, A,B,C - are attributes of R, PK - is a primary key in R. Similarly with a.: Initial set: {R.A} WHERE clause equality helps to extract: R.A -> R.B So, now the set of attributes from where functional dependencies can be extracted is: {R.A, R.B} Despite the fact that R.PK is a primary key it can't participate in functional dependencies extraction and the set remains the same. c. Consider the query: SELECT R.A FROM R WHERE R.A = R.B GROUP BY R.PK; where R - is a relation, A,B - are attributes of R, PK - is a primary key in R. The initial set of attributes from GROUP BY is: {R.PK} R.PK is a primary key of R. So, using the properties of primary keys (II.1), all attributes of the relation R are functionally dependent from the R.PK. So, the set of attributes now is: {R.PK, R.A, R.B} 2. HAVING HAVING can be used only with GROUP BY in the query. HAVING top-level AND-component equalities can be used for new functional dependencies extraction. However, attributes that participate in HAVING should be in the initial set or been extracted through functional dependencies. Consider the example: SELECT R.A, R.C FROM R WHERE R.B = R.C GROUP BY R.A, R.B HAVING R.A = R.C; where R - is a relation, A,B,C - are attributes of R. Initial set for functional dependency extraction is: {R.A, R.B} - attributes that participate in GROUP BY From WHERE-clause R.C can be added to the set through the functional dependency with R.B: R.B -> R.C Set: {R.A, R.B, R.C} Using HAVING -clause equality new functional dependency can be extracted (as both parts of the equality are in the set). Functional dependencies R.A -> R.B, R.B -> R.C and transitivity axioma lead to the new functional dependency: R.A -> R.C =============== The code written before for this MDEV is located here and currently can't be reproduced on 11.6.

            People

              psergei Sergei Petrunia
              CodeSatori Markus A.O. Loponen
              Votes:
              17 Vote for this issue
              Watchers:
              25 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.