[MDEV-13345] Elimination of materialized view with GROUP BY in equi-join on grouping fields Created: 2017-07-18  Updated: 2023-04-11

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Igor Babaev Assignee: Igor Babaev
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-13369 Optimization for equi-joins of derive... Closed

 Description   

If a query contains a left join where the inner table is materialized view or derived table with GROUP BY and join operation is an equi-jion on grouping fields then the following
transformation can be applied to the query:

  • the derived is eliminated from the query
  • the subquery from the view / derived table specification with WHERE condition enforced by join condition replaces the references to the fields of the view / derived table.

Here is an example of such transformation:

create table t1 (a int);
insert into t1 values (5), (1), (2), (9), (7), (2), (7);
create table t2 (a int, b int, index idx(a));
insert into t2 values (7,10), (1,20), (2,23), (7,18), (1,30), (4,71), (3,15), (7,82);
 
select t1.a,max from t1 left join (select a, max(t2.b) max from t2 group by t2.a) t on t1.a=t.a;
=> 
select t1.a, (select max(t2.b) max from t2 where t2.a=t1.a) max from t1;

This re-writing can be done on the SQL level: both queries are valid SQL queries and they are equivalent.
In more general case:

select t1.a,max,min from t1 left join (select a, max(t2.b) max, min(t2.b) min from t2 group by t2.a) t on t1.a=t.a;
=>
select t1.a, (select max(t2.b) max, min(t2.b) min from t2 where t2.a=t1.a)  from t1;

the transformation produces an illegal query. However, it's obvious how this query can be executed.

The transformation with small changes can be applied to join queries as well.

This transformation allows to avoid materialization of grouping views / derived tables and improve the performance of many equi-join queries that use these views / derived tables.



 Comments   
Comment by Igor Babaev [ 2017-07-20 ]

Here's the plan how this optimization can be implemented.

  • Find the optimal join order for the main query
  • Check whether the optimization can be applied to this join order:
    • Check that no tables are accessed using the fields of the derived table
    • Check that grouping fields belong to the first table of the plan chosen for derived table
    • Check that the GROUP BY list of the materialized derived table is compatible with some index
    • The elements of the GROUP BY list are contained in the SELECT list of the derived table
    • There are equi-join predicates for all fields the GROUP BY list of the form outer_field=field_from_list
    • There are no join conditions that refer non-grouping fields of the derived table
  • If the optimization is applicable do the re-writing for this optimization just before the call of
    get_best_combination():
    • Create Item_single_row_subselect out of the SELECT of the derived table
    • Add the join conditions for the derived table to the WHERE clause of this SELECT replacing the references the fields of the derived table produced by grouping fields for the references to the fields themselves.
    • Wrap the Item_single_row_subselect object into a special function that would call bring_value() for the object
    • Prepare this function and push it in front of the hidden fields of the SELECT list of the main query
    • Set references of the items referring to the fields of the derived tables to the items of the created Item_single_row_subselect.
    • Optimize the subquery of the Item_single_row_subselect object
  • Remove the derived table from the join order of the main query.
Generated at Thu Feb 08 08:04:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.