[MDEV-11588] Support for ONLY_FULL_GROUP_BY functional dependency Created: 2016-12-16 Updated: 2023-11-30 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Parser |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | Markus A.O. Loponen | Assignee: | Igor Babaev |
| Resolution: | Unresolved | Votes: | 15 |
| Labels: | Compatibility, MySQL_575, gsoc17, maybe-10.7, only_full_group_by, sql_mode | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
When SQL mode ONLY_FULL_GROUP_BY is enabled (as is default), MariaDB rejects, per the '92 SQL standard, any SELECT queries with (non-aggregated) columns not also listed in the GROUP BY statement with error 1055: "'db.table.column_name' isn't in GROUP_BY". However, the 2003 spec loosens this restriction with a requirement that the columns need only be functionally dependent: "17) If T is a grouped table, then let G be the set of grouping columns of T. In each ((value expression)) contained in ((select list)) , each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a ((set function specification)) whose aggregation query is QS." Since version 5.7.5, available since late 2014, MySQL supports the detection of functional dependencies in GROUP BY statements. It would be good to see this ported over to MariaDB as well to eliminate the need for redundant double column listing (and presumable performance hit when dependent columns are needlessly iterated for grouping), where functional dependency could be detected. For some useful background, see this entry at the MySQLServerTeam blog, along with Roland Bouman's related write-up . Since ONLY_FULL_GROUP_BY is now a default both in Maria and MySQL, not porting this to Maria would make migration from MySQL to Maria more involved, forcing either SQL mode or code changes where affected queries are used. (There's a related older ticket mentioning the ANY_VALUE() function to deal with the over-strict mode, however that alone makes for a clunky work-around to over-strict grouping.) |
| Comments |
| Comment by Galina Shalygina (Inactive) [ 2019-04-26 ] | |||||||||||||||||||||||||||
|
Step I. Simple SELECTs with GROUP BY. Find functional dependencies in SELECTs with inner joins. For each table its determined fields will be stored in special bitmap. Plan: 1*. Check if all SELECT list fields are already found to be determined (or are aggregational functions or constants). 1. For each table create determined fields bitmap. Mark in bitmaps fields used in GROUP BY. 2. Create a list of equalities eq_list from the top level of WHERE clause if it is AND level. Or a single equality list if WHERE clause is an equality. 3. Check if fields from bitmaps participate in any equalities. If so and fields that are equal to them aren't contained in bitmaps - add these new fields in bitmaps. 4. Repeat step 3 until no new fields can't be extracted from equalities to be added to bitmaps. 5. Check if SELECT list fields depend on bitmap fields only (or are aggregational functions or constants). | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-05-06 ] | |||||||||||||||||||||||||||
|
6. Support PK, UNIQUE keys in GROUP BY If PK or UNIQUE key fields are used in the GROUP BY or are equal to some fields used in the GROUP BY, all fields of the key table can be used in the SELECT list. 7. Support virtual columns Virtual column is defined as some expression. It can be used in SELECT LIST if it depends on GROUP BY fields only or is used in GROUP BY. | |||||||||||||||||||||||||||
| Comment by Daniel Black [ 2019-05-07 ] | |||||||||||||||||||||||||||
|
I started down a partial solution to this with: https://github.com/MariaDB/server/pull/1036 which removes considerations of group-by items after a unique index however ran out of time to complete. Very common amongst Oracle/Postgresql people who list every group by elements that is in the result set. | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-06-25 ] | |||||||||||||||||||||||||||
|
8. Support materialized derived tables and views If some materialized derived table or view field is equal to any GROUP BY field (or to functionally dependent field) of select ‘sel’ where derived table is used, all derived table or view fields are said to be functionally dependent on GROUP BY fields of ‘sel’. 9. Support materialized constant subqueries in SELECT list, WHERE clause, HAVING clause. 9.1.1 Parent select (PS) is with GROUP BY. | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-06-25 ] | |||||||||||||||||||||||||||
|
Here is the redesigned plan of how SELECT is checked to be ‘allowed’: 1. For each table create ‘allowed’ fields bitmap. Mark in bitmaps fields used in GROUP BY. 2. Check WHERE clause fields if they can be used (look in 9 section). 3. Create a list of equalities eq_list from the top level of WHERE clause if it is AND level. Or a single equality list if WHERE clause is an equality. Let’s call it eq_list. 4. For all equalities extracted from eq_list: Check if fields from bitmaps participate in equality. 5. Repeat step 4 until no new fields can be extracted from equalities to be added to bitmaps. 6. Check again if Primary or Unique key fields become ‘allowed’. 7. Check if SELECT list and HAVING fields depend on bitmap fields only (or are aggregational functions or constants). | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-07-12 ] | |||||||||||||||||||||||||||
|
What was done the previous week (07-02 - 07-09): | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-07-13 ] | |||||||||||||||||||||||||||
|
What was done this week (07-10 - 07-12): | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-07-13 ] | |||||||||||||||||||||||||||
|
What is still missing: 1. DATETIME, DATE, TIME operations and functions proper handling | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-07-23 ] | |||||||||||||||||||||||||||
|
What was done this week (07-17 - 07-23): 1. New methods were added on prepare stage to check if functions used in the query return deterministic results. This information can be used in extracting new functional dependencies from WHERE clause equalities. If there is some function in equality that returns non-deterministic result then this equality can't be used for extraction of new functional dependent fields. In progress. | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-07-30 ] | |||||||||||||||||||||||||||
|
Stage II. How LEFT/RIGHT/OUTER joins should be handled: Go through TABLE_LISTs (tables or nests) in FROM clause starting from the most outer to the most inner. The most outer TABLE_LIST is the leftmost table (functional dependencies are checked after simplify_joins() call). Repeat until all TABLE_LISTs in the FROM clause are processed: 1. Mark GROUP BY fields from tbl as 'allowed'. Consider a query:
How the algorithm should work:
All fields in SELECT list are allowed to be used. | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-07-30 ] | |||||||||||||||||||||||||||
|
What was done the previous week (07-23 - 07-30): 1. Stage 1 was ended and sent on review to Igor | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-08-06 ] | |||||||||||||||||||||||||||
|
What was done the previous week (07-30 - 08-06): After discussion with Igor it was decided to change LEFT/RIGHT/OUTER join handling.
Current implementation is done in such way: Consider a query:
1. Consider the most outer join t1 LEFT JOIN t2 ON (t1.a = t2.e) What is also done: 1. MySQL tests were ran. This highlighted some current implementation limitations: 2. ORDER BY handling was added, some bugs were fixed. | |||||||||||||||||||||||||||
| Comment by Galina Shalygina (Inactive) [ 2019-09-03 ] | |||||||||||||||||||||||||||
|
What was done the previous week (08-26 - 09-03): 1. Removed set subquery context methods. Now subquery context is taken the same way as it is done in fix_outer_field() method. What is still missing: Now MariaDB forbids to use non GROUP BY fields in HAVING. It throughs error before any functional dependencies are checked even if 'ONLY_FULL_GROUP_BY' mode is not set. | |||||||||||||||||||||||||||
| Comment by Roman Stingler (Inactive) [ 2021-03-04 ] | |||||||||||||||||||||||||||
|
Can someone please finish this. |