[MDEV-5123] Remove duplicated conditions pushed both to join_tab->select_cond and join_tab->cache_select->cond for blocked joins. Created: 2013-10-10  Updated: 2013-10-21  Due: 2013-10-15  Resolved: 2013-10-21

Status: Closed
Project: MariaDB Server
Component/s: None
Fix Version/s: 10.0.5

Type: Task Priority: Major
Reporter: Timour Katchaounov (Inactive) Assignee: Timour Katchaounov (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File md5123.1.diff    
Issue Links:
Blocks
blocks MDEV-83 Cost-based choice for the pushdown of... Stalled

 Description   

If there is a condition that depends only on the second table in a blocked join,
the following blocked join algorithms (BNL, BNLH) use this condition to pre-filter records from this table prior to joining the records.

Currently the optimizer extracts such a single table condition independently from the generic condition pushdown. As a result the same condition may be evaluated up to two times more. This is a problem if the condition is expensive. It also makes it problematic to move the condition to an optimal partial join (see MDEV-83).



 Comments   
Comment by Timour Katchaounov (Inactive) [ 2013-10-10 ]

"Generic" pushdown conditions are attached to specific joins via the member: JOIN_TAB::select_cond.
The filter conditions used by blocked join algorithms are attached to JOIN_TAB::cache_select::cond.

Currently the extraction of these conditions happens in the following places in the optimizer:

JOIN::optimize_inner calls:

  • make_join_select
    Step #2: Extract WHERE/ON parts
    ...
    tmp= make_cond_for_table(thd, cond, used_tables, current_map,
    FALSE, FALSE);
    ...
    if (tmp || !cond || tab->type == JT_REF || tab->type == JT_REF_OR_NULL ||
    tab->type == JT_EQ_REF || first_inner_tab)
    {
    ...
    else if (tab->type == JT_ALL && ! use_quick_range)
    Unknown macro: { .... if (i != join->const_tables && tab->use_quick != 2 && !tab->first_inner) { /* Read with cache */ if (tab->make_scan_filter()) DBUG_RETURN(1); } .... }

    }

  • make_join_readinfo
  • check_join_cache_usage_for_tables()
    if (!tab->hash_join_is_possible() ||
    tab->make_scan_filter())
    goto no_join_cache;
    ...
    tab->cache= new JOIN_CACHE_BNLH(...)
Comment by Timour Katchaounov (Inactive) [ 2013-10-10 ]

Low-level design:

1. Split JOIN_TAB::select_cond into two conditions:

  • select_cond_for_table
  • select_cond

2. Instead of extracting the whole pushdown condition, make_join_select extracts the above two conditions.
The current call:
tmp= make_cond_for_table(thd, cond, used_tables, current_map,
FALSE, FALSE);
is substituted by two calls:
select_cond_for_table= make_cond_for_table(thd, cond, (const_table_map | table->map), table->map, FALSE, TRUE);
select_cond= make_cond_for_table(thd, cond, used_tables, current_map, FALSE, FALSE);

3. The calls to make_scan_filter() will not extract the single table condition they used to.
Instead they will use the already extracted condition JOIN_TAB::select_cond_for_table.
Once select_cond_for_table is used, set it to NULL.

4. In the end of make_join_read_info set the final pushdown condition to:
(select_cond_for_table AND select_cond).

Comment by Sergei Petrunia [ 2013-10-14 ]

So, after this task, instead of extracting one condition:

table_n_cond = make_cond_for_table(....)

we will always extract two:

table_n_cond_for_table = ...
table_n_select_cond =...

and will then use

table_n_cond_for_table AND table_n_select_cond

this approach may cause a problem:

"table_n_cond" is not always the same as "table_n_cond_for_table AND table_n_select_cond"

Comment by Timour Katchaounov (Inactive) [ 2013-10-14 ]

The attached diff implements the above incomplete idea.

Comment by Timour Katchaounov (Inactive) [ 2013-10-15 ]

The task will be implemented according the following idea porposed by Sergey Petrunia:

for each top-level AND-item in select_cond
{
if (item->used_tables() is covered by current_table | const_tables)
remove item from select_cond
}

These conjuncts can be removed from select_cond because they are false
if cache_cond->select is false. This is so, because cache_cond->select is the
maximal condition that depends only on (current_table | const_tables).

Comment by Sergei Petrunia [ 2013-10-17 ]

Had a review discussion on the latest variant of the patch. Approved, after requested changes are implemented.

Comment by Timour Katchaounov (Inactive) [ 2013-10-18 ]

The task is pushed to 5.5-timour (because this is my only tree that runs under buildbot).

  • All review comments implemented,
  • All tests pass,
  • Submitted for testing to Elena.
Comment by Timour Katchaounov (Inactive) [ 2013-10-21 ]
  • reviewed by Sergey Petrunia
  • all review comments implemented
  • tested by Elena
  • pushed to 10.0-base (10.0.5)
Generated at Thu Feb 08 07:01:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.