[MDEV-11588] Support for ONLY_FULL_GROUP_BY functional dependency Created: 2016-12-16  Updated: 2023-11-30

Status: Stalled
Project: MariaDB Server
Component/s: Parser
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Markus A.O. Loponen Assignee: Igor Babaev
Resolution: Unresolved Votes: 15
Labels: Compatibility, MySQL_575, gsoc17, maybe-10.7, only_full_group_by, sql_mode

Issue Links:
Blocks
blocks MDEV-28109 ANSI Mode & ONLY_FULL_GROUP_BY Open
Duplicate
is duplicated by MDEV-24239 ONLY_FULL_GROUP_BY not restricting OR... Closed
PartOf
includes MDEV-12867 Full scan despite appropriate index Stalled
includes MDEV-19307 Use functional dependencies when sear... Open
includes MDEV-20541 Implement a flag to show if an expres... Stalled
Relates
relates to MDEV-20409 With ONLY_FULL_GROUP_BY work don't co... Closed
relates to MDEV-25133 sql mode ONLY_FULL_GROUP_BY lets quer... Open

 Description   

When SQL mode ONLY_FULL_GROUP_BY is enabled (as is default), MariaDB rejects, per the '92 SQL standard, any SELECT queries with (non-aggregated) columns not also listed in the GROUP BY statement with error 1055: "'db.table.column_name' isn't in GROUP_BY".

However, the 2003 spec loosens this restriction with a requirement that the columns need only be functionally dependent:

"17) If T is a grouped table, then let G be the set of grouping columns of T. In each ((value expression)) contained in ((select list)) , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a ((set function specification)) whose aggregation query is QS."

Since version 5.7.5, available since late 2014, MySQL supports the detection of functional dependencies in GROUP BY statements. It would be good to see this ported over to MariaDB as well to eliminate the need for redundant double column listing (and presumable performance hit when dependent columns are needlessly iterated for grouping), where functional dependency could be detected. For some useful background, see this entry at the MySQLServerTeam blog, along with Roland Bouman's related write-up .

Since ONLY_FULL_GROUP_BY is now a default both in Maria and MySQL, not porting this to Maria would make migration from MySQL to Maria more involved, forcing either SQL mode or code changes where affected queries are used. (There's a related older ticket mentioning the ANY_VALUE() function to deal with the over-strict mode, however that alone makes for a clunky work-around to over-strict grouping.)



 Comments   
Comment by Galina Shalygina (Inactive) [ 2019-04-26 ]

Step I.

Simple SELECTs with GROUP BY.

Find functional dependencies in SELECTs with inner joins.

For each table its determined fields will be stored in special bitmap.
Later fields from the select list of the query will be checked if they depend on determined fields only (or are aggregational functions or constants).

Plan:

1*. Check if all SELECT list fields are already found to be determined (or are aggregational functions or constants).

1. For each table create determined fields bitmap. Mark in bitmaps fields used in GROUP BY.
Remark: ignore expressions in GROUP BY (a+b, a*b, ...). Process them as how they are processed now in ONLY_FULL_GROUP_BY mode.
Further plans: create for such an expressions invisible virtual columns.

2. Create a list of equalities eq_list from the top level of WHERE clause if it is AND level. Or a single equality list if WHERE clause is an equality.

3. Check if fields from bitmaps participate in any equalities. If so and fields that are equal to them aren't contained in bitmaps - add these new fields in bitmaps.
If both equality fields are already in bitmaps - drop this equality from eq_list.

4. Repeat step 3 until no new fields can't be extracted from equalities to be added to bitmaps.

5. Check if SELECT list fields depend on bitmap fields only (or are aggregational functions or constants).

Comment by Galina Shalygina (Inactive) [ 2019-05-06 ]

6. Support PK, UNIQUE keys in GROUP BY

If PK or UNIQUE key fields are used in the GROUP BY or are equal to some fields used in the GROUP BY, all fields of the key table can be used in the SELECT list.

7. Support virtual columns

Virtual column is defined as some expression. It can be used in SELECT LIST if it depends on GROUP BY fields only or is used in GROUP BY.

Comment by Daniel Black [ 2019-05-07 ]

I started down a partial solution to this with: https://github.com/MariaDB/server/pull/1036 which removes considerations of group-by items after a unique index however ran out of time to complete. Very common amongst Oracle/Postgresql people who list every group by elements that is in the result set.

Comment by Galina Shalygina (Inactive) [ 2019-06-25 ]

8. Support materialized derived tables and views

If some materialized derived table or view field is equal to any GROUP BY field (or to functionally dependent field) of select ‘sel’ where derived table is used, all derived table or view fields are said to be functionally dependent on GROUP BY fields of ‘sel’.

9. Support materialized constant subqueries in SELECT list, WHERE clause, HAVING clause.

9.1.1 Parent select (PS) is with GROUP BY.
9.1.1 Consider constant subquery with GROUP BY which is used in SELECT list or HAVING of PS. Let this subquery depend on PS fields p_fld. It should be checked that these p_fld are GROUP BY fields or fields that are functionally dependent on GROUP BY fields of PS.
9.1.2 Consider constant subquery with GROUP BY which is used in WHERE of PS. Then subquery can use all fields of PS.
9.2 Parent select (PS) with no GROUP BY
9.2.1 Consider constant subquery with GROUP BY which is used in HAVING of PS. Then none of the fields of PS can be used in subquery.
9.2.2 Consider constant subquery with GROUP BY and no HAVING clause which is used in SELECT list of PS. Then subquery can use all fields of PS.
9.2.3 Consider constant subquery with GROUP BY and HAVING clause which is used in SELECT list and there is no GROUP BY in PS. Then none of the fields of PS can be used in subquery.
9.2.3 Consider constant subquery with GROUP BY which is used in WHERE and there is no GROUP BY in PS. Then subquery can use all fields of PS.
9.3 No aggregation functions can be used in WHERE clause constant subqueries with GROUP BY.

Comment by Galina Shalygina (Inactive) [ 2019-06-25 ]

Here is the redesigned plan of how SELECT is checked to be ‘allowed’:

1. For each table create ‘allowed’ fields bitmap. Mark in bitmaps fields used in GROUP BY.
1.1 Check if all Primary or Unique key fields are used in GROUP BY. If so - mark all fields of the table where this key is defined as ‘allowed’.
1.2 Check if any materialized derived tables fields are used in GROUP BY. If so - mark all derived table or view fields as allowed.

2. Check WHERE clause fields if they can be used (look in 9 section).

3. Create a list of equalities eq_list from the top level of WHERE clause if it is AND level. Or a single equality list if WHERE clause is an equality. Let’s call it eq_list.

4. For all equalities extracted from eq_list: Check if fields from bitmaps participate in equality.
4.1 Consider equality:
f11 = g(f21, .., f2n, c) (1)
where f11 is not in ‘allowed’ bitmaps, f21,...,f2n are in allowed bitmaps, c is some constant and g is some function.
From this equality f11 can be said to be ‘allowed’ if its context and f21,..,f2n contexts don’t conflict. Context here is defined as context in propagate_equal_fields().
4.2 All equalities except (1) where f11 is not ‘allowed’ and some of ‘f2*’ is not ‘allowed’ are dropped from eq_list.

5. Repeat step 4 until no new fields can be extracted from equalities to be added to bitmaps.

6. Check again if Primary or Unique key fields become ‘allowed’.
Also check if some of materialized derived table or view fields become allowed. If any of this is true and there are still not used equalities, go to 4.

7. Check if SELECT list and HAVING fields depend on bitmap fields only (or are aggregational functions or constants).

Comment by Galina Shalygina (Inactive) [ 2019-07-12 ]

What was done the previous week (07-02 - 07-09):
1. I ran MySQL test on functional dependencies.
Some mistakes and missing cases were found.
2. Added:
a. When a field is equal to some constant in the WHERE clause (it is constant field), then it is allowed to use this field in SELECT list and HAVING clause.
b. When some subquery field is equal to parent SELECT field then it is allowed to use this subquery field in SELECT list and HAVING clause.
3. Changed the way how information about subqueries places in parent SELECT is stored (where subqueries are situated in the parent SELECT: in SELECT list, HAVING, WHERE, ...). This information was stored in bitmaps (MY_BITMAP) and Igor suggested to remove them and use st_select_lex::context_analysis_place.

Comment by Galina Shalygina (Inactive) [ 2019-07-13 ]

What was done this week (07-10 - 07-12):
1. Code was changed and simplified as fields Context information was removed. Additional methods were added for correct work of new 'allowed' fields extraction.
2. Code was cleaned and self-reviewed
2. Comments were rewritten.
3. Tests were cleaned.

Comment by Galina Shalygina (Inactive) [ 2019-07-13 ]

What is still missing:

1. DATETIME, DATE, TIME operations and functions proper handling
2. Some other functions maybe missing (need double check)
3. Run MySQL test again
4. OUTER, LEFT, RIGHT joins support
5. More tests

Comment by Galina Shalygina (Inactive) [ 2019-07-23 ]

What was done this week (07-17 - 07-23):

1. New methods were added on prepare stage to check if functions used in the query return deterministic results. This information can be used in extracting new functional dependencies from WHERE clause equalities. If there is some function in equality that returns non-deterministic result then this equality can't be used for extraction of new functional dependent fields. In progress.
2. OUTER, LEFT, RIGHT join support was added. In progress, maybe still some cases missing.

Comment by Galina Shalygina (Inactive) [ 2019-07-30 ]

Stage II.

How LEFT/RIGHT/OUTER joins should be handled:

Go through TABLE_LISTs (tables or nests) in FROM clause starting from the most outer to the most inner. The most outer TABLE_LIST is the leftmost table (functional dependencies are checked after simplify_joins() call).
Let's call the TABLE_LIST under consideration tbl.

Repeat until all TABLE_LISTs in the FROM clause are processed:

1. Mark GROUP BY fields from tbl as 'allowed'.
These fields can be used in extraction of functionally dependent fields.
2.1. Find WHERE clause equalities that use 'allowed' fields, fields of tbl and/or constants.
Let's call set of these equalities 'tbl_conds'.
2.2. If tbl is used as dependent table in some join (right table in LEFT join) then add to tbl_conds equalities used in ON expression of this join.
3. Try to extract fields that are functionally dependent on 'allowed' fields from tbl_conds.
Here the first stage code can be used.

Consider a query:

SELECT t1.a,t2.c,t2.d
FROM t1
LEFT JOIN t2
  ON t1.a = t2.d
WHERE t1.a = 2 AND t2.b = t2.c
GROUP BY t2.b;

How the algorithm should work:

 
a. Consider t1 (as the outermost table).
1.  'allowed'             : {}
    'tbl_conds'           : {}
2.1 'allowed'             : {}
    'tbl_conds'           : {(t1.a = 2)}
2.2 'allowed'             : {} 
    'tbl_conds'           : {(t1.a = 2)}
After 3. 'allowed'        : {t1.a}
         'tbl_conds'      : {}
 
b. Consider t2.
1.  'allowed'             : {t1.a,t2.b}
    'tbl_conds'           : {}
2.1 'allowed'             : {t1.a,t2.b}
    'tbl_conds'           : {(t2.b = t2.c)}
2.2 'allowed'             : {t1.a,t2.b}
    'tbl_conds'           : {(t2.b = t2.c), (t1.a = t2.d)}
After 3. 'allowed'        : {t1.a,t2.c,t2.d}
         'tbl_conds'      : {}

All fields in SELECT list are allowed to be used.

Comment by Galina Shalygina (Inactive) [ 2019-07-30 ]

What was done the previous week (07-23 - 07-30):

1. Stage 1 was ended and sent on review to Igor
2. LEFT/RIGHT/OUTER joins handling code is in process
3. MySQL tests on functional dependencies were ran and highlighted some bugs

Comment by Galina Shalygina (Inactive) [ 2019-08-06 ]

What was done the previous week (07-30 - 08-06):

After discussion with Igor it was decided to change LEFT/RIGHT/OUTER join handling.

  • Dependent maps used in implementations were changed on recursion function
  • Code was entirely rewritten.

Current implementation is done in such way:

Consider a query:

SELECT t3.x
FROM t1 LEFT JOIN t2 ON (t1.a = t2.e AND t1.c = 3)
                    LEFT JOIN t3 ON (t2.e = t3.x AND t1.a = t2.f)
WHERE t1.a = 2
GROUP BY t1.b;

1. Consider the most outer join t1 LEFT JOIN t2 ON (t1.a = t2.e)
1.1.1. Consider table t1 (the most outer table in this join) and find in WHERE clause equalities that use t1 only.
1.1.2. There is such an equality: t1.a = 2. From this equality can be said that t1.a is ‘allowed’.
1.2.1. Consider table t2
1.2.2. Take this LEFT JOIN on expression part: (t1.a = t2.e AND t1.c = 3)
1.2.3. Extract from this ON expression part equalities that depend on t2 and t1. Important! Take only those equalities that depend on t2. So (t1.c = 3) will not be taken in account.
1.2.4. Take WHERE clause equalities depend on t1,t2.
1.2.5. Try to extract new ‘allowed’ field. From (t1.a = t2.e) can be said that t2.e is allowed.
2. Consider t3.
2.1. Take on expression of the t3 LEFT join: (t2.e = t3.x AND t1.a = t2.f)
2.2. Extract from this ON expression part equalities depend on t1,t2,t3. Don’t take equalities that depend on t1 and t2 (or both) only. So (t1.a = t2.f) will not be taken.
2.3. Take WHERE clause equalities depend on t1,t2,t3.
2.4. Try to extract new ‘allowed’ field. From (t2.e = t3.x) can be said that t3.x is allowed.

What is also done:

1. MySQL tests were ran. This highlighted some current implementation limitations:
1.1. Equalities that use ROW items are not handled.
1.2. Materialized derived tables/views unique/primary keys are not used
1.3. Simple equalities like number * no_dep_field = dep_field are not handled.
1.4. Can't handle: select a from v1 where a=b group by 2*t1.a;
1.5. Can't handle "group by "";" with on expressions.

2. ORDER BY handling was added, some bugs were fixed.

Comment by Galina Shalygina (Inactive) [ 2019-09-03 ]

What was done the previous week (08-26 - 09-03):

1. Removed set subquery context methods. Now subquery context is taken the same way as it is done in fix_outer_field() method.
2. Cleaned code
3. Rewrote comments
4. Added new tests and removed useless tests
5. Removed methods that check if function arguments are of the allowed types (e.g. MINITE(arg), arg should be DATE or DATETIME).
6. Added new methods that check if functions return deterministic result (string functions like LENGTH for varchar). (Deterministic here: Deterministic function here is a function that returns the same
result for equal input sets.)
7. Forbid to use control flow functions in SELECT list, HAVING and ORDER BY if GROUP BY is used.
8. Clarified with Igor what conditions should be met when functionally dependent field is tried to be extracted from the ON expression equality predicates.

What is still missing:

Now MariaDB forbids to use non GROUP BY fields in HAVING. It throughs error before any functional dependencies are checked even if 'ONLY_FULL_GROUP_BY' mode is not set.
This code should be changed so functionally dependent fields can be used in HAVING.

Comment by Roman Stingler (Inactive) [ 2021-03-04 ]

Can someone please finish this.
There are hundreds of started features but so many get inactive.

Generated at Thu Feb 08 07:51:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.