Functional dependencies project description is divided into several steps.
First, the definitions will be introduced.
Further, the examples of functional dependencies cases in relational databases will be provided: starting from the basic case of the examples of functional dependencies in the table definition to the narrowing of functional dependency rules using HAVING clause.
Along with using functional dependencies with GROUP BY in the query, they can be used for omitting additional calculations or creating temporary tables for derived tables.
Consider the example:
create table t1(a int, b int);
|
insert into t1 values (1, 2), (2, 3);
|
|
select d.a
|
from t1 join (
|
select t1.a, max(b) as mb
|
from t1 group by t1.a
|
) as d on t1.b = d.a;
|
In the query it can be seen that the derived table field d.mb is not used in the parent SELECT. So, its calculations can be omitted.
I. Functional dependency definitions
Relation is a set of tuples.
A tuple is a set of attribute values in which no two distinct elements have the same name.
Consider a relation R and sets of attributes A and B in R.
Let “R: A ↦ B” (read “in R, A determines B” or “B is functionally dependent on A in R”) denote the functional dependency of B on A in R, which is true if, for any possible value of R, any two rows that are not distinct for every column in A also are not distinct for every column in B.
Several properties of functional dependencies can be proved. They are called Armstrong's axioms.
Given A, B and C sets of attributes in a relation R, these properties are said to be true:
1. Reflexivity
If B is a subset of A, then A → B.
It can be weakened to: A → {}, where {} - is an empty set of attributes.
2. Transitivity
If A → B and B → C, then A → C
3. Augmentation
If A → B, then AC → BC
Also let's introduce the definition of FD-function that will be used later:
FD-function is a function f for which is true: if two elements are equal, then their FD-function values are also equal:
x1=x2 => f1(x1)=f(x2)
x1=y1, ..., xn=yn => f(x1,...,xn) = f(y1,...,yn)
Example of FD-function:
f = x + 1
|
Simply, for any numeric x1 = x2 => x1 + 1 = x2 + 1
|
Example of non-FD function:
1. f = length(x)
|
|
x1 = "a"
|
x2 = "a "
|
|
length(x1) = length("a") = 1
|
length(x2) = length("a ") = 2
|
|
x1 = x2, but f(x1) != f(x2)
|
|
2. f = concat(x)
|
|
x1 = "a", x2 = "b"
|
y1 = "a ", y2 = "b"
|
|
concat(x1, x2) = "ab"
|
concat(y1, y2) = "a b"
|
|
x1 = y1, x2 = y2, but concat(x1, x2) != concat(y1, y2)
|
II. Functional dependencies in a relation definition.
Relation further will be considered as a query expression body.
1. Primary key
According to the definition, if a relation contains a primary key, then all attributes of this relation are functionally dependent on the primary key.
A → B, where A - is a primary key set of attributes in relation R, B - is a set of all attributes of relation R.
It can be seen through the example:
create table t1 (a int primary key, b int, c int);
|
insert into t1 values (1,2,2), (2,2,2), (3,4,5);
|
|
t1.a -> {t1.a, t1.b, t1.c}
|
2. Unique key
In a similar way, if a relation contains a unique key, then all attributes of this relation are
functionally dependent on this unique key.
A → B, where A - is a unique constraint set of attributes in relation R, B - is a set of all attributes of relation R.
Consider the example:
create table t1 (a int, b int, c int, unique(a,b));
|
insert into t1 values (1,2,2), (2,2,2), (3,4,5);
|
|
{t1.a, t1.b} -> {t1.a, t1.b, t1.c}
|
3. Virtual column
Virtual column is a relation attribute that is represented as a result of the function of some relation attributes and/or constants.
It can be represented this way:
V = F(A), where V - is a virtual column, F - is a function, A - is a set of attributes
It can be said that virtual column is functionally dependent on the attributes set that defines this column only if the function used in the virtual column definition is FD-function:
V = F(A), where F - is FD-function, =>
A -> V
Consider the example:
create table t1 (a int, b int, c int as (a*b) virtual);
|
insert into t1 values (1,3,default), (2,2,default);
|
|
{t1.a, t1.b} -> {t1.a*t1.b} (as multiplication is FD-function)
|
III. Functional dependencies added by using WHERE clause
To extract new functional dependencies WHERE clause top-level AND-component equalities can be used. It's also considered that between parts of the equalities no implicit conversions should be used.
1. Consider the SELECT:
SELECT *
|
FROM R
|
WHERE R.A = R.B;
|
where R - is a relation, A and B are R attributes
It can be said that:
R.A -> R.B and R.B -> R.A
2. Consider the SELECT:
SELECT *
|
FROM R
|
WHERE R.A = F(R.B, R.C);
|
where R - is a relation, A,B,C are attributes and F is FD-function.
R.A -> {R.B, R.C} and {R.B, R.C} -> R.A
3. Consider the SELECT:
SELECT *
|
FROM R
|
WHERE R.A = R.B AND R.B = R.C;
|
where R - is a relation, A,B,C are attributes.
R.A -> R.B and R.B -> R.A
R.B -> R.C and R.C -> R.B
Using transitivity axiom new functional dependencies can be extracted:
R.A -> R.C
R.C -> R.A
IV. JOIN
Let's consider more difficult cases defining the rules for functional dependency extraction when
SELECT from multiple relations is used.
1. CROSS JOIN
Consider the SELECT:
SELECT *
|
FROM OP1 CROSS JOIN OP2;
|
where OP1 and OP2 are relations.
By definition, all functional dependencies from OP1 and from OP2 definitions are true in the result of OP1 and OP2 cross join.
2. INNER JOIN
Inner JOIN rules for functional dependency extraction are the same as for the CROSS JOIN (1).
Additionally, ON clause top-level AND-component equalities can be used for extraction of new functional dependencies. Similarly with WHERE-clause, between parts of the equalities no implicit conversions should be used.
Consider the SELECT:
SELECT *
|
FROM OP1 INNER JOIN OP2
|
ON OP1.A = OP2.C
|
WHERE OP1.A = OP2.B;
|
where A is an attribute from OP1, B, C are attributes from OP2.
From the WHERE clause these functional dependencies can be extracted:
OP1.A -> OP2.B and OP2.B -> OP1.A
From ON clause equality (similar with extraction from WHERE):
OP1.A -> OP2.C and OP2.C -> OP1.A
So, four functional dependencies have been extracted. Using transitivity axiom another two can be extracted from them:
OP2.C -> OP2.B and OP2.C -> OP2.B
3. LEFT and RIGHT JOIN
Let's define LEFT and RIGHT JOIN definitions:
1. Left JOIN
OP1 LEFT JOIN OP2 ON ...
where OP1 is called strong side, OP2 - weak side of LEFT JOIN.
2. RIGHT JOIN
OP1 RIGHT JOIN OP2 ON ...
where OP1 is called weak side, OP2 - strong side of RIGHT JOIN.
To extract functional dependencies when LEFT or RIGHT JOIN is used CROSS JOIN (1) rules can be used.
Additionally, functional dependencies can be extracted from ON-clause top-level AND-component equalities but only if the equality doesn't contain strong side attributes that aren't participating in any already existing functional dependency rules.
Consider the queries:
create table t1 (a int primary key, b int);
|
create table t2 (x int primary key, y int;
|
|
insert into t1 values (1,1), (2,1);
|
insert into t2 values (1,2), (2,3);
|
a.
select t1.a, t1.b
|
from t1 left join t2
|
on t1.a = t1.b;
|
|
+------+------+
|
| t1.a | t1.b |
|
+------+------+
|
| 1 | 1 |
|
+------+------+
|
| 1 | 1 |
|
+------+------+
|
| 2 | 1 |
|
+------+------+
|
From the result of the query it can be seen that t1.a and t1.b are not functionally dependent (because for t1.b = 1 there are two t1.a different values). t1.a is participating in the functional dependency as a primary key, but t1.b is not participating in any functional dependency rule.
b.
select t1.a, t2.y
|
from t1 left join t2
|
on t1.a = t2.y;
|
|
+------+------+
|
| t1.a | t2.y |
|
+------+------+
|
| 1 | 1 |
|
+------+------+
|
| 2 | 2 |
|
+------+------+
|
Here t1.a -> t2.y and t2.y -> t1.a functional dependencies can be extracted.
t1.a is a primary key, and t2.y is an attribute from the weak side of the LEFT JOIN.
4. Derived tables, views.
In the case of the derived table or view it can be considered as a relation in the SELECT
where it is used. So, all functional dependencies that can be extracted from the definition of the derived table/view can be used in the SELECT where the derived table/view is used for extracting new functional dependencies.
V. SELECT list projections.
1. General rules
As projections formulate the final list of the result attributes there is no need to extract functional
dependencies which parts are not participating in the projections.
It's important to mention that only attributes that participate in functional dependencies can be used in SELECT list when GROUP BY is used.
Consider the example:
SELECT R.A
|
FROM R
|
WHERE R.B = R.C
|
GROUP BY R.A, R.B
|
Here the set of attributes is:
{R.A, R.B, R.C}
Functional dependencies are:
R.B -> R.C
As projections include only R.A attribute, set of attributes can be shortened to {R.A} and functional dependencies list will be empty.
2. Expressions in the projections list
It’s important to mention that expressions used in the projection list can be considered as virtual columns and the same rules as for the virtual column should be applied to this expression. For details go to II.3.
Consider the example:
SELECT R.A + R.B
|
FROM R
|
GROUP BY R.A, R.B
|
As addition is an FD-function there is a functional dependency:
{R.A, R.B} -> {R.A + R.B}
VI. GROUP BY + HAVING
1. GROUP BY
Attributes that participate in the GROUP BY can be considered as an initial set for functional dependencies extraction. From this set functionally dependent attributes can be extracted recursively.
a. Consider the query:
SELECT R.A
|
FROM R
|
WHERE R.A = R.B
|
GROUP BY R.A;
|
where R - is a relation, A,B - are attributes of R.
Here R.A is a GROUP BY field and can be considered as an initial set for functional dependency extraction.
Initial set: {R.A}
Using WHERE clause equality: R.A -> R.B
b. Consider the query:
SELECT R.A
|
FROM R
|
WHERE R.A = R.B AND R.PK = R.C
|
GROUP BY R.A;
|
where R - is a relation, A,B,C - are attributes of R, PK - is a primary key in R.
Similarly with a.:
Initial set: {R.A}
WHERE clause equality helps to extract:
R.A -> R.B
So, now the set of attributes from where functional dependencies can be extracted is:
{R.A, R.B}
Despite the fact that R.PK is a primary key it can't participate in functional dependencies
extraction and the set remains the same.
c. Consider the query:
SELECT R.A
|
FROM R
|
WHERE R.A = R.B
|
GROUP BY R.PK;
|
where R - is a relation, A,B - are attributes of R, PK - is a primary key in R.
The initial set of attributes from GROUP BY is:
{R.PK}
R.PK is a primary key of R. So, using the properties of primary keys (II.1), all attributes of
the relation R are functionally dependent from the R.PK. So, the set of attributes now is:
{R.PK, R.A, R.B}
2. HAVING
HAVING can be used only with GROUP BY in the query. HAVING top-level AND-component equalities can be used for new functional dependencies extraction.
However, attributes that participate in HAVING should be in the initial set or been extracted through functional dependencies.
Consider the example:
SELECT R.A, R.C
|
FROM R
|
WHERE R.B = R.C
|
GROUP BY R.A, R.B
|
HAVING R.A = R.C;
|
where R - is a relation, A,B,C - are attributes of R.
Initial set for functional dependency extraction is:
{R.A, R.B} - attributes that participate in GROUP BY
From WHERE-clause R.C can be added to the set through the functional dependency with R.B:
R.B -> R.C
Set: {R.A, R.B, R.C}
Using HAVING-clause equality new functional dependency can be extracted (as both parts of the equality are in the set).
Functional dependencies R.A -> R.B, R.B -> R.C and transitivity axioma lead to the new functional dependency:
R.A -> R.C
===============
The code written before for this MDEV is located here and currently can't be reproduced on 11.6.
What was done the previous week (07-23 - 07-30):
1. Stage 1 was ended and sent on review to Igor
2. LEFT/RIGHT/OUTER joins handling code is in process
3. MySQL tests on functional dependencies were ran and highlighted some bugs