Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25080

Allow pushdown of queries involving UNIONs in outer select to foreign engines

    XMLWordPrintable

    Details

      Description

      mysql_union() currently does not support select handler execution. This is causing performance degradation of queries in ColumnStore involving a UNION in an outer select. More details on the issue are in MCOL-4584.

      Current APIs

      Select Handler

      mysql_select()
        join->prepare(...)
        select_lex->pushdown_select= find_select_handler(thd, select_lex);
      

      ...

      JOIN::exec_inner(): 
        if (select_lex->pushdown_select)
        {
          /* Execute the query pushed into a foreign engine */
          error= select_lex->pushdown_select->execute();
        }
      

      This eventually calls:

      bool select_handler::send_data() {
         ...
         if (select->join->result->send_data(result_columns))
      }
      

      which writes the data to query output.

      Derived handler

      JOIN::prepare()
        select_lex->handle_derived(thd->lex, DT_PREPARE)
        mysql_derived_prepare()
        derived->dt_handler= derived->find_derived_handler(thd);
        ...
      

      Two pushdown interfaces

      select_handler:

      • Takes the root SELECT_LEX* as an argument (ha_federated actually assumes that the SELECT is the top-level select).
        • This is why UNION is not handled.
      • non-virtual function creates the result temp. table.
      • then uses init_scan()/next_row()/end_scan() to pump the rows.
      • Temporary table is not actually used. We use only temporary table's table->record[0]
        as a buffer to get the next row.

      derived_handler:

      • Takes a SELECT_LEX_UNIT* as an argument
        • That's why derived tables that are UNIONs can be handled.
      • Also has init_scan()/next_row()/end_scan() to pump the rows.
      • The table is pumped into a temporary table which is then used to do joins, etc.

      groupby_handler:

      • Has an API that's totally different from the above two. So, it is outside of scope of this MDEV.

      Storage engines

      FederatedX - the test engine

      ha_federatedx supports select_handler and derived_handler.
      The query is pushed by getting a text of the query:

      int ha_federatedx_select_handler::init_scan() {
        ...
        if ((*iop)->query(thd->query(), thd->query_length()))
          goto err;
      

      or of the derived table:

      ha_federatedx_derived_handler::init_scan() {
        ...
        if ((*iop)->query(derived->derived_spec.str, derived->derived_spec.length))
          goto err;
      

      (This obviously doesn't work in many cases e.g. with views or different table names on the backend. But it's good enough to do testing).

      ColumnStore - the target engine

      ColumnStore actually walks the parsed query tree (SELECT_LEX[_UNIT] structures) and constructs operations that it will push to the backend.

      The query format that's used to pass to the backend is *NOT* SQL.

      Solution

      The idea is:

      1. Make select_handler interface accept UNIONs.

      Instead of accepting a SELECT_LEX (and assuming it is the top-level SELECT), select_handler should accept a SELECT_LEX_UNIT (like derived_handler does).
      This way, queries that have UNION (or UNION ALL, INTERSECT, etc) at the top level can be handled with a select_handler.

      2. Join the storage-engine facing parts of select_handler and derived_handler.

      There seems to be no difference between these two classes as far as the Storage Engine is concerned. To make the API smaller, we can join them together.

      3. Allow parts of UNION to be handled with a Select Handler.

      Suppose the query is a UNION where some parts of it can be pushed and some not:

      select * from columnstore_table union select * from innodb_table
      

      Passing the entire query to the select_handler will fail. It should be very easy (see attached poc patch) then to try pushing individual SELECTs to the select_handler.

      Code-wise, it should be the same select_handler class which is initialized with a SELECT_LEX (like it is done now) instead of a SELECT_LEX_UNIT.
      .

      Implementation details

      Finding the storage engine to push into

      Currently, it is done in these functions:

      select_handler *find_select_handler(THD *thd, SELECT_LEX *select_lex)
      derived_handler *TABLE_LIST::find_derived_handler(THD *thd)
      

      They find the first table that has a select (or derived) handler and push.

      Note that they do not check if the select (or the derived table) has tables from other engines. Such checks are currently done inside each engine's create_select/create_derived function (if done at all).

      Adjusting Federatedx

      TODO

      EXPLAIN output changes

      What should be printed when the whole top-level UNION is pushed?

      Explain_select has a special case where it was pushed. Check out Explain_select::print_explain,

        if (select_type == pushed_derived_text || select_type == pushed_select_text)
        {
           print_explain_message_line(output, explain_flags, is_analyze ...
      

      Explain_union doesn't have such logic. We should either extend Explain_union to have it, or produce an Explain_select object instead of Explain_union.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              serg Sergei Golubchik
              Reporter:
              tntnatbry Gagan Goel
              Votes:
              4 Vote for this issue
              Watchers:
              19 Start watching this issue

                Dates

                Created:
                Updated:

                  Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.