Details

    Description

      Make it possible for a storage engine to take any query fragment to be executed internally in the engine. In particular, it should be possible to push joins, GROUP BY, ORDER BY, WHERE, HAVING.

      Assorted thoughts:

      • this is step 1, only do simple cases, see below.
      • only push down joins when all tables in a query are in the same storage engine
      • interface to the server: replace all that with a pseudo-table? with a handler?
      • PS/SP reexecution — restore or keep? probably, restore. easier for the engine. and we rerun optimizer anyway
      • optimizer: no changes, in step 1 there is no cost estimation for pushed down joins.
      • EXPLAIN:
        | 1 | t1 | ... | pushed down
        | 2 | t2 | ... | pushed down
        | 3 | <PUSHDOWN RESULT> | ... | for 1 and 2
      • UNION — don't, every UNION part is handled separately
      • derived tables, views: decide to materialize or merge, as usual. if merged — push together with the upper join, if materialized — push separately
      • subqueries: if materialized — push separately, if semijoin — push with the upper join, otherwise — don't.
      • partitioning: not affected
      • use FederatedX as a PoC

      Attachments

        Issue Links

          Activity

            igor Igor Babaev (Inactive) added a comment - - edited

            Here's a suggested scheme of the implementation of this feature.

            Let Q be an n-way join query of tables such that some of tables belong to the engines that can execute join operations SELECT ... FROM t[0],...,t[n] WHERE P(t[0],...,t[n]).
            Consider a possible join order t[i[0]],...,t[i[n]] and let T[0],...T[k] be partitioning of this join order such that any
            partition contains only tables from the same engine and no neighbors has tables from the same engine.
            Let's also assume that for any two neighbors only one belongs to the engine that can execute join by its own means while the other executes joins by MariaDB SQL processor. Let's assume that T[1] belongs to the engine that can execute join by its own means. Then any odd partition also belongs to such an engine while joins of any even partition are executed by SQL processor.
            Let partition T[j] consists of tables t[j][0],...t[j][lj]
            It's obvious that query Q is equivalent to the following query

            SELECT ... FROM
               (SELECT (...
                  (SELECT ...FROM
                      (SELECT ... FROM t[0][0],...,t[0][l0]) AS dt[1],
                      t[1][0],...,t[1][l1]) AS dt[2] ...) AS dtk, t[k][0],...,t[k][lk]
             WHERE P(t[0],...,t[n]).
            

            In this query first we define a derived table dt[1] that joins tables from T[0]. Then we define the derived table dt[2] that joins dt[1] and tables from T[1]. The derived table dt[j+1] joins the derived table dt[j] with tables from T[j].

            igor Igor Babaev (Inactive) added a comment - - edited Here's a suggested scheme of the implementation of this feature. Let Q be an n-way join query of tables such that some of tables belong to the engines that can execute join operations SELECT ... FROM t [0] ,...,t [n] WHERE P(t [0] ,...,t [n] ). Consider a possible join order t[i [0] ],...,t[i [n] ] and let T [0] ,...T [k] be partitioning of this join order such that any partition contains only tables from the same engine and no neighbors has tables from the same engine. Let's also assume that for any two neighbors only one belongs to the engine that can execute join by its own means while the other executes joins by MariaDB SQL processor. Let's assume that T [1] belongs to the engine that can execute join by its own means. Then any odd partition also belongs to such an engine while joins of any even partition are executed by SQL processor. Let partition T [j] consists of tables t [j] [0] ,...t [j] [lj] It's obvious that query Q is equivalent to the following query SELECT ... FROM (SELECT (... (SELECT ...FROM (SELECT ... FROM t[0][0],...,t[0][l0]) AS dt[1], t[1][0],...,t[1][l1]) AS dt[2] ...) AS dtk, t[k][0],...,t[k][lk] WHERE P(t[0],...,t[n]). In this query first we define a derived table dt [1] that joins tables from T [0] . Then we define the derived table dt [2] that joins dt [1] and tables from T [1] . The derived table dt [j+1] joins the derived table dt [j] with tables from T [j] .

            Another scheme just encapsulate tables from odd partitions into derived tables and pushes conditions
            into these derived tables

            SELECT ... FROM
            ...
            (SELECT ... FROM t[k][0],...,t[k][lk] WHERE P(t[k][0],...,t[k][lk])) as dtk
            ...
            WHERE P(t[0],...,t[n])
            

            Here P(t[k][0],...,t[k][lk])) is the condition over t[k][0],...,t[k][lk] that can be extracted from P(t[0],...,t[n]).

            With this scheme we rather push into the engine SELECTs that specify derived tables.
            This well fits into the scheme how SELECT queries are handled by ColumnStore.

            igor Igor Babaev (Inactive) added a comment - Another scheme just encapsulate tables from odd partitions into derived tables and pushes conditions into these derived tables SELECT ... FROM ... (SELECT ... FROM t[k][0],...,t[k][lk] WHERE P(t[k][0],...,t[k][lk])) as dtk ... WHERE P(t[0],...,t[n]) Here P(t [k] [0] ,...,t [k] [lk] )) is the condition over t [k] [0] ,...,t [k] [lk] that can be extracted from P(t [0] ,...,t [n] ). With this scheme we rather push into the engine SELECTs that specify derived tables. This well fits into the scheme how SELECT queries are handled by ColumnStore.
            igor Igor Babaev (Inactive) added a comment - - edited

            Let's call a derived table External Derived Table (EDT) if it is specified by a SELECT over tables belonging to the same external database engine capable to execute SQL operations.
            If an EDT is used in a query we always consider it as a materialized derived table whose materialization is performed by an external engine.

            igor Igor Babaev (Inactive) added a comment - - edited Let's call a derived table External Derived Table (EDT) if it is specified by a SELECT over tables belonging to the same external database engine capable to execute SQL operations. If an EDT is used in a query we always consider it as a materialized derived table whose materialization is performed by an external engine.

            Some observations
            1. Rows of a EDT for ColumnStore engine could be produced in the same way as the result set for a select query.
            2. To process EDT the server needs a special function to fill the EDT that would use the iterator function over the rows of EDT
            3. The optimizer has to know the cardinality of EDT. This cardinality either calculated by the external engine or is calculated by the optimizer itself employing the statistical data on the tables and columns referenced in the specification of EDT.

            igor Igor Babaev (Inactive) added a comment - Some observations 1. Rows of a EDT for ColumnStore engine could be produced in the same way as the result set for a select query. 2. To process EDT the server needs a special function to fill the EDT that would use the iterator function over the rows of EDT 3. The optimizer has to know the cardinality of EDT. This cardinality either calculated by the external engine or is calculated by the optimizer itself employing the statistical data on the tables and columns referenced in the specification of EDT.
            igor Igor Babaev (Inactive) added a comment - - edited

            I spinned off MDEV-17096 "Pushdown of derived tables to storage engines".
            This is the first step to resolve of the problem of pushing joins into storage engines.
            This is probably the only task that realistically can be done within 10.4.

            igor Igor Babaev (Inactive) added a comment - - edited I spinned off MDEV-17096 "Pushdown of derived tables to storage engines". This is the first step to resolve of the problem of pushing joins into storage engines. This is probably the only task that realistically can be done within 10.4.

            People

              Unassigned Unassigned
              serg Sergei Golubchik
              Votes:
              3 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.