[MDEV-13225] Lateral-like optimization for derived tables with GROUP BY Created: 2017-06-30  Updated: 2023-11-03  Resolved: 2023-11-03

Status: Closed
Project: MariaDB Server
Component/s: N/A
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-13369 Optimization for equi-joins of derive... Closed
relates to MDEV-13369 Optimization for equi-joins of derive... Closed

 Description   

(this is just a draft based on the preliminary discussions with igor).

Consider a query

select ...
from 
  t1,
  (select group_col, ... 
   from t2 
   group by group_col
  ) as TBL
where
  TBL.group_col= t1.col and ...

At the moment, the only way to execute this is to

  • Materialize table TBL (which means enumerate all of the GROUP BY groups)
  • Compute a join between the derived table TBL and t1.

This works poorly when the set of t1.col is such that it only covers a few of groups in t2.

Possible solutions to this:

  • pull the GROUP BY operation out to the parent query
  • ...
  • "Lateral-like" execution for table TBL.

This MDEV is about the latter.

The idea is as follows:

Applicability

When we see a derived table that cannot be merged and has GROUP BY as its top-level operation (as opposed to, say, ORDER BY-LIMIT), we mark it as
Lateral-candidate, and note its group-by columns (this is because conditions on those can be pushed through the group-by operation).

update_ref_and_keys()

update_ref_and_keys also records equalities in form

lateral_candidate.group_col = expr

We will call these Lateral-Parameter-Equalities.

Note that "condition pushdown into derived table" can happen independently of this. Whatever can be pushed down, should be, we are not interested in those conditions here.

Optimization of the subquery

Similar to "subquery materialization vs IN->EXISTS rewrite" choice, we will need to do query optimization for two variants:

1. Optimize for just computing the derived table (to do it once, that is).
2. Optimize for computation of derived table with lateral_candidate.group_col=expr" pushed down into the derived table. Here, expr is constant for the duration
of one computation but its value is not known at the optimization time (TODO: check with Sanja what kind of Item_outer_ref object should be used for them)

Let's call this "Lateral computation".

Optimization of the upper query.

Whenever best_access_path tries to add a Lateral-candidate table into the query plan, it should consider two options:

  • Use it as regular derived table
    = there is a startup cost to populate the derived table
    = then we can read the derived table (possibly using keys when "derived_with_keys" is ON).
  • Use it as "lateral derived table"
    = This is only possible to do when the tables referred from Lateral-Parameter-Equalities are already in the join prefix (similar to ref access)
    = There is no startup cost.
    = For each incoming record combination, "Lateral computation" process happens, after which the table is cleared.

Execution

Igor has mentioned that Recursive CTEs have a framework that allows derived tables to be re-filled on each execution (or at least, not once-per-query).



 Comments   
Comment by Sergei Petrunia [ 2023-11-03 ]

Closing as Duplicate of MDEV-13369.

Generated at Thu Feb 08 08:03:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.