[MDEV-12387] Push conditions into materialized IN subqueries Created: 2017-03-28  Updated: 2018-10-07  Resolved: 2018-05-21

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: 10.4.0

Type: Task Priority: Major
Reporter: Igor Babaev Assignee: Galina Shalygina (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-7486 Condition pushdown from HAVING into W... Closed
Sprint: 10.0.34, 10.1.31, 10.2.13, 10.1.32, 10.3.6-1

 Description   

The subqueries with GROUP BY are always materialized in MariaDB/MySQL.
If such a subquery is an IN subquery

 (expr[1],...) IN (SELECT col[1], ...FROM ... GROUP BY...)

and is a conjunct of the WHERE condition of the main query then for every row in the result set the following is true:

expr[1]=col[1] AND ...

Let P be a comparison predicate over expr[i]. Then the condition P(expr[i]/col[i]) can be pushed into the subquery.

This task basically has to repeat what was done for pushing conditions into materialized views/derived tables (see MDEV-9197).

EDIT
optimizer_switch flag name is condition_pushdown_for_subquery



 Comments   
Comment by Galina Shalygina (Inactive) [ 2017-12-27 ]

Status for 27.12.2017

This project can be implemented in the similar way as MDEV-9197 and some of the procedures from MDEV-9197 can be reused.

The procedure for the push down conditions in the materialized IN-subqueries should work this way:

Let's look on the example:

select * from t1 
where (a>4) and (b<200) and (x>1)
  (a,b) in (
     select c,max(d) 
     from t2 
     where (c<10) 
     group by c)
;

1. Condition that can be pushed in the IN-subquery should be extracted. So it should be condition that is defined only with the fields from the left part of the IN subquery (fields (a,b))

extracted_cond:

(a>4) and (b<200)

2. Than this condition should be transformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery.
(as a=c and b=max(d))

subq_extracted_cond:

(c>4) and (max(d)<200)

3. Than condition which depends only on the fields from the group by of the IN subquery should be extracted from subq_extracted_cond (there procedures from MDEV-9197 can be reused)

subq_extracted_cond_for_group_fields:

 (c>4)

4. Now subq_extracted_cond_for_group_fields can be pushed into the WHERE part of the IN subquery. Also subq_extracted_cond_for_group_fields should be deleted from the subq_extracted_cond.

select * from t1 
where (a>4) and (b<200) and (x>1)
  (a,b) in (
     select c,max(d) 
     from t2 
     where (с>4) and (c<10) 
     group by c)
;

5. On the last stage subq_extracted_cond can be pushed into the HAVING part of the IN subquery.

select * from t1 
where (a>4) and (b<200) and (x>1)
  (a,b) in (
     select c,max(d) 
     from t2 
     where (с>4) and (c<10)
     having (max(d)<200)
     group by c)
;

Now everything except the 2 step is done. Procedures for the transformations in the 2 step are also written but they should be corrected.
Also some tests are written.

There was a problem with the 2 step because in the MDEV-9197 all transformations were made after defining what condition can be pushed in the WHERE part of the materialized derived table/view (the 4 step there). There it should be made before (on the 2 step) and also all procedures should be rewritten in the way that they can work with the subquries.

Comment by Galina Shalygina (Inactive) [ 2018-01-09 ]

Status for 09.01.2018

It was found that the implementation scheme in the comment above is wrong. The procedure should work this way:

select * from t1 
where (a>4) and (b<200) and (x>1)
  (a,b) in (
     select c,max(d) 
     from t2 
     where (c<10) 
     group by c)
;

1 step remains the same. So we get:

extracted_cond:

(a>4) and (b<200)

2. The condition which depends only on the fields from the group by of the IN subquery should be extracted from extracted_cond

extracted_cond_for_group_fields:

(a>4)

3. Than extracted_cond_for_group_fields should be trasformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery (there a=c)

subq_extracted_cond_for_group_fields:

 (c>4)

4. Now subq_extracted_cond_for_group_fields can be pushed into the WHERE part of the IN subquery. Also extracted_cond_for_group_fields should be deleted from the extracted_cond.

extracted_cond:

(b<200)

5. Than extracted_cond should be trasformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery (there b=max(d))

subq_extracted_cond:

(max(d)<200)

6. Now subq_extracted_cond can be pushed into the HAVING part of the IN subquery.

select * from t1 
where (a>4) and (b<200) and (x>1)
  (a,b) in (
     select c,max(d) 
     from t2 
     where (с>4) and (c<10)
     having (max(d)<200)
     group by c)
;

Now the working prototype is made. Btw it is needed to rewrite setup_jtbm_semi_joins procedure in the way that it will work in this case (to get equalities between left and right parts of IN subquery after pushdown is done). This remarks are done but needs to be tested.

Comment by Galina Shalygina (Inactive) [ 2018-05-21 ]

Pushed in 10.4

Comment by Sergei Petrunia [ 2018-06-26 ]

Documentation is being worked on here: https://mariadb.com/kb/en/library/condition-pushdown-into-in-subqueries/

Generated at Thu Feb 08 07:57:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.