Details
-
Task
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
Description
mysql_union() currently does not support select handler execution. This is causing performance degradation of queries in ColumnStore involving a UNION in an outer select. More details on the issue are in MCOL-4584.
Current APIs
Select Handler
mysql_select()
|
join->prepare(...)
|
select_lex->pushdown_select= find_select_handler(thd, select_lex);
|
...
JOIN::exec_inner():
|
if (select_lex->pushdown_select)
|
{
|
/* Execute the query pushed into a foreign engine */
|
error= select_lex->pushdown_select->execute();
|
}
|
This eventually calls:
bool select_handler::send_data() {
|
...
|
if (select->join->result->send_data(result_columns))
|
}
|
which writes the data to query output.
Derived handler
JOIN::prepare()
|
select_lex->handle_derived(thd->lex, DT_PREPARE)
|
mysql_derived_prepare()
|
derived->dt_handler= derived->find_derived_handler(thd);
|
...
|
Two pushdown interfaces
select_handler:
- Takes the root SELECT_LEX* as an argument (ha_federated actually assumes that the SELECT is the top-level select).
- This is why UNION is not handled.
- non-virtual function creates the result temp. table.
- then uses init_scan()/next_row()/end_scan() to pump the rows.
- Temporary table is not actually used. We use only temporary table's table->record[0]
as a buffer to get the next row.
derived_handler:
- Takes a SELECT_LEX_UNIT* as an argument
- That's why derived tables that are UNIONs can be handled.
- Also has init_scan()/next_row()/end_scan() to pump the rows.
- The table is pumped into a temporary table which is then used to do joins, etc.
groupby_handler:
- Has an API that's totally different from the above two. So, it is outside of scope of this MDEV.
Storage engines
FederatedX - the test engine
ha_federatedx supports select_handler and derived_handler.
The query is pushed by getting a text of the query:
int ha_federatedx_select_handler::init_scan() { |
...
|
if ((*iop)->query(thd->query(), thd->query_length())) |
goto err; |
or of the derived table:
ha_federatedx_derived_handler::init_scan() {
|
...
|
if ((*iop)->query(derived->derived_spec.str, derived->derived_spec.length)) |
goto err; |
(This obviously doesn't work in many cases e.g. with views or different table names on the backend. But it's good enough to do testing).
ColumnStore - the target engine
ColumnStore actually walks the parsed query tree (SELECT_LEX[_UNIT] structures) and constructs operations that it will push to the backend.
The query format that's used to pass to the backend is *NOT* SQL.
Solution
The idea is:
1. Make select_handler interface accept UNIONs.
Instead of accepting a SELECT_LEX (and assuming it is the top-level SELECT), select_handler should accept a SELECT_LEX_UNIT (like derived_handler does).
This way, queries that have UNION (or UNION ALL, INTERSECT, etc) at the top level can be handled with a select_handler.
2. Join the storage-engine facing parts of select_handler and derived_handler.
There seems to be no difference between these two classes as far as the Storage Engine is concerned. To make the API smaller, we can join them together.
3. Allow parts of UNION to be handled with a Select Handler.
Suppose the query is a UNION where some parts of it can be pushed and some not:
select * from columnstore_table union select * from innodb_table
|
Passing the entire query to the select_handler will fail. It should be very easy (see attached poc patch) then to try pushing individual SELECTs to the select_handler.
Code-wise, it should be the same select_handler class which is initialized with a SELECT_LEX (like it is done now) instead of a SELECT_LEX_UNIT.
.
Implementation details
Finding the storage engine to push into
Currently, it is done in these functions:
select_handler *find_select_handler(THD *thd, SELECT_LEX *select_lex)
|
derived_handler *TABLE_LIST::find_derived_handler(THD *thd)
|
They find the first table that has a select (or derived) handler and push.
Note that they do not check if the select (or the derived table) has tables from other engines. Such checks are currently done inside each engine's create_select/create_derived function (if done at all).
Adjusting Federatedx
TODO
EXPLAIN output changes
What should be printed when the whole top-level UNION is pushed?
Explain_select has a special case where it was pushed. Check out Explain_select::print_explain,
if (select_type == pushed_derived_text || select_type == pushed_select_text)
|
{
|
print_explain_message_line(output, explain_flags, is_analyze ...
|
Explain_union doesn't have such logic. We should either extend Explain_union to have it, or produce an Explain_select object instead of Explain_union.
Attachments
Issue Links
- causes
-
MCOL-4569 Queries with UNION ALL perform disproportionally badly
- Closed
-
MCOL-4584 Significant performance degradation when UNION ALL is used in an outer select, compared to in a subquery.
- Closed
- is blocked by
-
MCOL-4901 Allow pushdown of queries involving UNIONs in outer select to ColumnStore
- Closed
-
MDEV-30868 Add ColumnStore version 23.x to MariaDB Community Server
- Closed
- is part of
-
MDEV-29547 prepare 10.11.0 preview releases
- Closed
- relates to
-
MDEV-29624 Memory leak on pushdown of a merged derived table
- Closed
-
MDEV-30828 ORDER BY clause using an integer (positional argument) on a SELECT query involving a pushed down UNION produces incorrect results
- Closed
-
MDEV-29640 FederatedX does not properly handle pushdown in case of difference in local and remote table names
- Closed