[MDEV-9062] join pushdown to storage engines Created: 2015-11-02  Updated: 2023-11-30

Status: Open
Project: MariaDB Server
Component/s: Optimizer, Plugins
Fix Version/s: None

Type: New Feature Priority: Major
Reporter: Sergei Golubchik Assignee: Igor Babaev
Resolution: Unresolved Votes: 3
Labels: smart_engine

Issue Links:
Blocks
is blocked by MDEV-17096 Pushdown of simple derived tables to ... Closed
PartOf
is part of MDEV-22366 Changes in optimizer to allow better ... Open
Relates
relates to MDEV-16502 Join pushdown to something like an IN... Open

 Description   

Make it possible for a storage engine to take any query fragment to be executed internally in the engine. In particular, it should be possible to push joins, GROUP BY, ORDER BY, WHERE, HAVING.

Assorted thoughts:

  • this is step 1, only do simple cases, see below.
  • only push down joins when all tables in a query are in the same storage engine
  • interface to the server: replace all that with a pseudo-table? with a handler?
  • PS/SP reexecution — restore or keep? probably, restore. easier for the engine. and we rerun optimizer anyway
  • optimizer: no changes, in step 1 there is no cost estimation for pushed down joins.
  • EXPLAIN:
    | 1 | t1 | ... | pushed down
    | 2 | t2 | ... | pushed down
    | 3 | <PUSHDOWN RESULT> | ... | for 1 and 2
  • UNION — don't, every UNION part is handled separately
  • derived tables, views: decide to materialize or merge, as usual. if merged — push together with the upper join, if materialized — push separately
  • subqueries: if materialized — push separately, if semijoin — push with the upper join, otherwise — don't.
  • partitioning: not affected
  • use FederatedX as a PoC


 Comments   
Comment by Igor Babaev [ 2018-06-22 ]

Here's a suggested scheme of the implementation of this feature.

Let Q be an n-way join query of tables such that some of tables belong to the engines that can execute join operations SELECT ... FROM t[0],...,t[n] WHERE P(t[0],...,t[n]).
Consider a possible join order t[i[0]],...,t[i[n]] and let T[0],...T[k] be partitioning of this join order such that any
partition contains only tables from the same engine and no neighbors has tables from the same engine.
Let's also assume that for any two neighbors only one belongs to the engine that can execute join by its own means while the other executes joins by MariaDB SQL processor. Let's assume that T[1] belongs to the engine that can execute join by its own means. Then any odd partition also belongs to such an engine while joins of any even partition are executed by SQL processor.
Let partition T[j] consists of tables t[j][0],...t[j][lj]
It's obvious that query Q is equivalent to the following query

SELECT ... FROM
   (SELECT (...
      (SELECT ...FROM
          (SELECT ... FROM t[0][0],...,t[0][l0]) AS dt[1],
          t[1][0],...,t[1][l1]) AS dt[2] ...) AS dtk, t[k][0],...,t[k][lk]
 WHERE P(t[0],...,t[n]).

In this query first we define a derived table dt[1] that joins tables from T[0]. Then we define the derived table dt[2] that joins dt[1] and tables from T[1]. The derived table dt[j+1] joins the derived table dt[j] with tables from T[j].

Comment by Igor Babaev [ 2018-08-01 ]

Another scheme just encapsulate tables from odd partitions into derived tables and pushes conditions
into these derived tables

SELECT ... FROM
...
(SELECT ... FROM t[k][0],...,t[k][lk] WHERE P(t[k][0],...,t[k][lk])) as dtk
...
WHERE P(t[0],...,t[n])

Here P(t[k][0],...,t[k][lk])) is the condition over t[k][0],...,t[k][lk] that can be extracted from P(t[0],...,t[n]).

With this scheme we rather push into the engine SELECTs that specify derived tables.
This well fits into the scheme how SELECT queries are handled by ColumnStore.

Comment by Igor Babaev [ 2018-08-01 ]

Let's call a derived table External Derived Table (EDT) if it is specified by a SELECT over tables belonging to the same external database engine capable to execute SQL operations.
If an EDT is used in a query we always consider it as a materialized derived table whose materialization is performed by an external engine.

Comment by Igor Babaev [ 2018-08-08 ]

Some observations
1. Rows of a EDT for ColumnStore engine could be produced in the same way as the result set for a select query.
2. To process EDT the server needs a special function to fill the EDT that would use the iterator function over the rows of EDT
3. The optimizer has to know the cardinality of EDT. This cardinality either calculated by the external engine or is calculated by the optimizer itself employing the statistical data on the tables and columns referenced in the specification of EDT.

Comment by Igor Babaev [ 2018-08-29 ]

I spinned off MDEV-17096 "Pushdown of derived tables to storage engines".
This is the first step to resolve of the problem of pushing joins into storage engines.
This is probably the only task that realistically can be done within 10.4.

Generated at Thu Feb 08 07:31:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.