Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12387

Push conditions into materialized IN subqueries

Details

    • 10.0.34, 10.1.31, 10.2.13, 10.1.32, 10.3.6-1

    Description

      The subqueries with GROUP BY are always materialized in MariaDB/MySQL.
      If such a subquery is an IN subquery

       (expr[1],...) IN (SELECT col[1], ...FROM ... GROUP BY...)
      

      and is a conjunct of the WHERE condition of the main query then for every row in the result set the following is true:

      expr[1]=col[1] AND ...
      

      Let P be a comparison predicate over expr[i]. Then the condition P(expr[i]/col[i]) can be pushed into the subquery.

      This task basically has to repeat what was done for pushing conditions into materialized views/derived tables (see MDEV-9197).

      EDIT
      optimizer_switch flag name is condition_pushdown_for_subquery

      Attachments

        Issue Links

          Activity

            Status for 27.12.2017

            This project can be implemented in the similar way as MDEV-9197 and some of the procedures from MDEV-9197 can be reused.

            The procedure for the push down conditions in the materialized IN-subqueries should work this way:

            Let's look on the example:

            select * from t1 
            where (a>4) and (b<200) and (x>1)
              (a,b) in (
                 select c,max(d) 
                 from t2 
                 where (c<10) 
                 group by c)
            ;
            

            1. Condition that can be pushed in the IN-subquery should be extracted. So it should be condition that is defined only with the fields from the left part of the IN subquery (fields (a,b))

            extracted_cond:

            (a>4) and (b<200)
            

            2. Than this condition should be transformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery.
            (as a=c and b=max(d))

            subq_extracted_cond:

            (c>4) and (max(d)<200)
            

            3. Than condition which depends only on the fields from the group by of the IN subquery should be extracted from subq_extracted_cond (there procedures from MDEV-9197 can be reused)

            subq_extracted_cond_for_group_fields:

             (c>4)
            

            4. Now subq_extracted_cond_for_group_fields can be pushed into the WHERE part of the IN subquery. Also subq_extracted_cond_for_group_fields should be deleted from the subq_extracted_cond.

            select * from t1 
            where (a>4) and (b<200) and (x>1)
              (a,b) in (
                 select c,max(d) 
                 from t2 
                 where (с>4) and (c<10) 
                 group by c)
            ;
            

            5. On the last stage subq_extracted_cond can be pushed into the HAVING part of the IN subquery.

            select * from t1 
            where (a>4) and (b<200) and (x>1)
              (a,b) in (
                 select c,max(d) 
                 from t2 
                 where (с>4) and (c<10)
                 having (max(d)<200)
                 group by c)
            ;
            

            Now everything except the 2 step is done. Procedures for the transformations in the 2 step are also written but they should be corrected.
            Also some tests are written.

            There was a problem with the 2 step because in the MDEV-9197 all transformations were made after defining what condition can be pushed in the WHERE part of the materialized derived table/view (the 4 step there). There it should be made before (on the 2 step) and also all procedures should be rewritten in the way that they can work with the subquries.

            shagalla Galina Shalygina (Inactive) added a comment - - edited Status for 27.12.2017 This project can be implemented in the similar way as MDEV-9197 and some of the procedures from MDEV-9197 can be reused. The procedure for the push down conditions in the materialized IN-subqueries should work this way: Let's look on the example: select * from t1 where (a>4) and (b<200) and (x>1) (a,b) in ( select c,max(d) from t2 where (c<10) group by c) ; 1. Condition that can be pushed in the IN-subquery should be extracted. So it should be condition that is defined only with the fields from the left part of the IN subquery (fields (a,b)) extracted_cond : (a>4) and (b<200) 2. Than this condition should be transformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery. (as a=c and b=max(d)) subq_extracted_cond : (c>4) and (max(d)<200) 3. Than condition which depends only on the fields from the group by of the IN subquery should be extracted from subq_extracted_cond (there procedures from MDEV-9197 can be reused) subq_extracted_cond_for_group_fields : (c>4) 4. Now subq_extracted_cond_for_group_fields can be pushed into the WHERE part of the IN subquery. Also subq_extracted_cond_for_group_fields should be deleted from the subq_extracted_cond . select * from t1 where (a>4) and (b<200) and (x>1) (a,b) in ( select c,max(d) from t2 where (с>4) and (c<10) group by c) ; 5. On the last stage subq_extracted_cond can be pushed into the HAVING part of the IN subquery. select * from t1 where (a>4) and (b<200) and (x>1) (a,b) in ( select c,max(d) from t2 where (с>4) and (c<10) having (max(d)<200) group by c) ; Now everything except the 2 step is done. Procedures for the transformations in the 2 step are also written but they should be corrected. Also some tests are written. There was a problem with the 2 step because in the MDEV-9197 all transformations were made after defining what condition can be pushed in the WHERE part of the materialized derived table/view (the 4 step there). There it should be made before (on the 2 step) and also all procedures should be rewritten in the way that they can work with the subquries.

            Status for 09.01.2018

            It was found that the implementation scheme in the comment above is wrong. The procedure should work this way:

            select * from t1 
            where (a>4) and (b<200) and (x>1)
              (a,b) in (
                 select c,max(d) 
                 from t2 
                 where (c<10) 
                 group by c)
            ;
            

            1 step remains the same. So we get:

            extracted_cond:

            (a>4) and (b<200)
            

            2. The condition which depends only on the fields from the group by of the IN subquery should be extracted from extracted_cond

            extracted_cond_for_group_fields:

            (a>4)
            

            3. Than extracted_cond_for_group_fields should be trasformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery (there a=c)

            subq_extracted_cond_for_group_fields:

             (c>4)
            

            4. Now subq_extracted_cond_for_group_fields can be pushed into the WHERE part of the IN subquery. Also extracted_cond_for_group_fields should be deleted from the extracted_cond.

            extracted_cond:

            (b<200)
            

            5. Than extracted_cond should be trasformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery (there b=max(d))

            subq_extracted_cond:

            (max(d)<200)
            

            6. Now subq_extracted_cond can be pushed into the HAVING part of the IN subquery.

            select * from t1 
            where (a>4) and (b<200) and (x>1)
              (a,b) in (
                 select c,max(d) 
                 from t2 
                 where (с>4) and (c<10)
                 having (max(d)<200)
                 group by c)
            ;
            

            Now the working prototype is made. Btw it is needed to rewrite setup_jtbm_semi_joins procedure in the way that it will work in this case (to get equalities between left and right parts of IN subquery after pushdown is done). This remarks are done but needs to be tested.

            shagalla Galina Shalygina (Inactive) added a comment - Status for 09.01.2018 It was found that the implementation scheme in the comment above is wrong. The procedure should work this way: select * from t1 where (a>4) and (b<200) and (x>1) (a,b) in ( select c,max(d) from t2 where (c<10) group by c) ; 1 step remains the same. So we get: extracted_cond: (a>4) and (b<200) 2. The condition which depends only on the fields from the group by of the IN subquery should be extracted from extracted_cond extracted_cond_for_group_fields: (a>4) 3. Than extracted_cond_for_group_fields should be trasformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery (there a=c) subq_extracted_cond_for_group_fields: (c>4) 4. Now subq_extracted_cond_for_group_fields can be pushed into the WHERE part of the IN subquery. Also extracted_cond_for_group_fields should be deleted from the extracted_cond . extracted_cond: (b<200) 5. Than extracted_cond should be trasformed into the condition that is defined only with the fields from the projection list from the right part of the IN-subquery (there b=max(d)) subq_extracted_cond: (max(d)<200) 6. Now subq_extracted_cond can be pushed into the HAVING part of the IN subquery. select * from t1 where (a>4) and (b<200) and (x>1) (a,b) in ( select c,max(d) from t2 where (с>4) and (c<10) having (max(d)<200) group by c) ; Now the working prototype is made. Btw it is needed to rewrite setup_jtbm_semi_joins procedure in the way that it will work in this case (to get equalities between left and right parts of IN subquery after pushdown is done). This remarks are done but needs to be tested.

            Pushed in 10.4

            shagalla Galina Shalygina (Inactive) added a comment - Pushed in 10.4
            psergei Sergei Petrunia added a comment - Documentation is being worked on here: https://mariadb.com/kb/en/library/condition-pushdown-into-in-subqueries/

            People

              shagalla Galina Shalygina (Inactive)
              igor Igor Babaev (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.