Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14779

using left join causes incorrect results with materialization and derived tables

Details

    • 5.5.59, 10.0.34

    Description

      SELECT t.SID, t.NAME,t.ID_NO,t.MOBILE,r.credit_status,t.amount,r.cust_active_amount,t.END_TIME,t.FILE_NAME
      FROM (select sid,name,id_no,recommend_id_no,mobile,amount,end_time,FILE_NAME from tc_white_list
      WHERE  recommend_id_no='ybr' ) t
        LEFT   JOIN (SELECT credit_status,id_no,cust_active_amount
                   from datacqsn.tc_credit_request WHERE sid in(SELECT max(sid)
                                                                from datacqsn.tc_credit_request where credit_status>'0' GROUP BY id_no)) r
          on t.ID_NO=r.id_no
      

      table t have two rows ,r have 0 rows ,result have no rows

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            It looks like you have syntax error (no WHERE ) here:

            SELECT max(sid) from test.tc_credit_request credit_status>'0' GROUP BY id_no

            and please add output of:

            show create table tc_white_list;
            show create table tc_credit_request;
            show variables like 'optimizer_switch';
            

            alice Alice Sherepa added a comment - It looks like you have syntax error (no WHERE ) here: SELECT max (sid) from test.tc_credit_request credit_status> '0' GROUP BY id_no and please add output of: show create table tc_white_list; show create table tc_credit_request; show variables like 'optimizer_switch' ;
            alice Alice Sherepa added a comment - - edited

            Thanks for the report! Problem is reproducible on 5.5-10.2 with optimizer_switch='materialization=on'
            testcase:

            create table t1(id  int);
            insert into t1 values (1),(2);
            create table t2(sid int, id int);
            insert into t2 values (1,1),(2,2);
             
            select * from t1 t
             left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r
             on t.id=r.id ;
            

            --------------
            select * from t1 t
             left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r
             on t.id=r.id
            --------------
            Empty set (0.00 sec)
             
            --------------
            explain extended select * from t1 t
             left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r
             on t.id=r.id
            --------------
            +------+--------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------+
            | id   | select_type  | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                               |
            +------+--------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------+
            |    1 | PRIMARY      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Impossible WHERE noticed after reading const tables |
            |    3 | MATERIALIZED | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Impossible WHERE                                    |
            +------+--------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------+
            2 rows in set, 1 warning (0.00 sec)
             
            Note (Code 1003): select `test`.`t`.`id` AS `id`,`test`.`t2`.`sid` AS `sid`,`test`.`t2`.`id` AS `id` from  <materialize> (select max(`test`.`t2`.`sid`) from `test`.`t2` where 0 group by `test`.`t2`.`id`) join `test`.`t1` `t` left join (`test`.`t2`) on(`test`.`t2`.`id` = `test`.`t`.`id` and 1) where 0
            --------------
            set optimizer_switch='materialization=off'
            --------------
            Query OK, 0 rows affected (0.00 sec)
             
            --------------
            select * from t1 t
             left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r
             on t.id=r.id
            --------------
            +------+------+------+
            | id   | sid  | id   |
            +------+------+------+
            |    1 | NULL | NULL |
            |    2 | NULL | NULL |
            +------+------+------+
            2 rows in set (0.00 sec)
             
            --------------
            explain extended select * from t1 t
             left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r
             on t.id=r.id
            --------------
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
            | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                           |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
            |    1 | PRIMARY     | t     | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 |                                                 |
            |    1 | PRIMARY     | t2    | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (flat, BNL join) |
            |    3 | SUBQUERY    | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | Impossible WHERE                                |
            +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+
            3 rows in set, 1 warning (0.00 sec)
             
            Note (Code 1003): select `test`.`t`.`id` AS `id`,`test`.`t2`.`sid` AS `sid`,`test`.`t2`.`id` AS `id` from `test`.`t1` `t` left join (`test`.`t2`) on(`test`.`t2`.`id` = `test`.`t`.`id` and <in_optimizer>(`test`.`t2`.`sid`,<exists>(select max(`test`.`t2`.`sid`) from `test`.`t2` where 0 group by `test`.`t2`.`id`))) where 1
            

            alice Alice Sherepa added a comment - - edited Thanks for the report! Problem is reproducible on 5.5-10.2 with optimizer_switch='materialization=on' testcase: create table t1(id int ); insert into t1 values (1),(2); create table t2(sid int , id int ); insert into t2 values (1,1),(2,2);   select * from t1 t left join ( select * from t2 where sid in ( select max (sid) from t2 where 0=1 group by id)) r on t.id=r.id ; -------------- select * from t1 t left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r on t.id=r.id -------------- Empty set (0.00 sec)   -------------- explain extended select * from t1 t left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r on t.id=r.id -------------- +------+--------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+--------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------+ | 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables | | 3 | MATERIALIZED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE | +------+--------------+-------+------+---------------+------+---------+------+------+----------+-----------------------------------------------------+ 2 rows in set, 1 warning (0.00 sec)   Note (Code 1003): select `test`.`t`.`id` AS `id`,`test`.`t2`.`sid` AS `sid`,`test`.`t2`.`id` AS `id` from <materialize> (select max(`test`.`t2`.`sid`) from `test`.`t2` where 0 group by `test`.`t2`.`id`) join `test`.`t1` `t` left join (`test`.`t2`) on(`test`.`t2`.`id` = `test`.`t`.`id` and 1) where 0 -------------- set optimizer_switch='materialization=off' -------------- Query OK, 0 rows affected (0.00 sec)   -------------- select * from t1 t left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r on t.id=r.id -------------- +------+------+------+ | id | sid | id | +------+------+------+ | 1 | NULL | NULL | | 2 | NULL | NULL | +------+------+------+ 2 rows in set (0.00 sec)   -------------- explain extended select * from t1 t left join (select * from t2 where sid in (select max(sid) from t2 where 0=1 group by id)) r on t.id=r.id -------------- +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ | 1 | PRIMARY | t | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | | | 1 | PRIMARY | t2 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where; Using join buffer (flat, BNL join) | | 3 | SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE | +------+-------------+-------+------+---------------+------+---------+------+------+----------+-------------------------------------------------+ 3 rows in set, 1 warning (0.00 sec)   Note (Code 1003): select `test`.`t`.`id` AS `id`,`test`.`t2`.`sid` AS `sid`,`test`.`t2`.`id` AS `id` from `test`.`t1` `t` left join (`test`.`t2`) on(`test`.`t2`.`id` = `test`.`t`.`id` and <in_optimizer>(`test`.`t2`.`sid`,<exists>(select max(`test`.`t2`.`sid`) from `test`.`t2` where 0 group by `test`.`t2`.`id`))) where 1

            Looking at Alice's SHOW WARNINGS when materialization=ON:

            select 
              test.t.id AS id,test.t2.sid AS sid,test.t2.id AS id 
            from  
              <materialize> (select max(test.t2.sid) 
                             from test.t2 
                             where 0 
                             group by test.t2.id) 
              join 
              test.t1 t 
              left join (test.t2) on (test.t2.id = test.t.id and 1)
            where 0
            

            The subquery with GROUP BY is outside of the inner side of the outer join. This is incorrect, it should have been inside (as it is in WHERE clause of the child subquery which IS on the inner side of the outer join).

            psergei Sergei Petrunia added a comment - Looking at Alice's SHOW WARNINGS when materialization=ON: select test.t.id AS id,test.t2.sid AS sid,test.t2.id AS id from <materialize> (select max(test.t2.sid) from test.t2 where 0 group by test.t2.id) join test.t1 t left join (test.t2) on (test.t2.id = test.t.id and 1) where 0 The subquery with GROUP BY is outside of the inner side of the outer join. This is incorrect, it should have been inside (as it is in WHERE clause of the child subquery which IS on the inner side of the outer join).

            ... and this explains why we don't get output rows. The query should produce

            {t1.row, NULL-complemented row}

            as output. but since the subquery got out of outer join, null-complemented rows are not produced.

            psergei Sergei Petrunia added a comment - ... and this explains why we don't get output rows. The query should produce {t1.row, NULL-complemented row} as output. but since the subquery got out of outer join, null-complemented rows are not produced.

            So after all the investigation we find the bug that we convert a subquery into a non-mergeable semi-join even when we should not.

                      Conversion of an IN subquery predicate into semi-join
                      is blocked now if the predicate occurs:
                      - in the ON expression of an outer join
                      - in the ON expression of an inner join embedded directly
                        or indirectly in the inner nest of an outer join
            

            Currently in the function find_and_block_conversion_to_sj(), we block the conversion of a subquery to a semi-join when we have an IN subquery predicated in the on_expr of an outer join

            if (to_find->type() != Item::SUBSELECT_ITEM ||
                  ((Item_subselect *) to_find)->substype() != Item_subselect::IN_SUBS)
            

            But this does not handle the case when the IN subquery predicate is wrapped inside an Item_in_optimizer object.

            So we need to disable the conversion even if we have an Item_in_optimizer object inside the on_expr of an outer join.

            varun Varun Gupta (Inactive) added a comment - So after all the investigation we find the bug that we convert a subquery into a non-mergeable semi-join even when we should not. Conversion of an IN subquery predicate into semi-join is blocked now if the predicate occurs: - in the ON expression of an outer join - in the ON expression of an inner join embedded directly or indirectly in the inner nest of an outer join Currently in the function find_and_block_conversion_to_sj(), we block the conversion of a subquery to a semi-join when we have an IN subquery predicated in the on_expr of an outer join if (to_find->type() != Item::SUBSELECT_ITEM || ((Item_subselect *) to_find)->substype() != Item_subselect::IN_SUBS) But this does not handle the case when the IN subquery predicate is wrapped inside an Item_in_optimizer object. So we need to disable the conversion even if we have an Item_in_optimizer object inside the on_expr of an outer join.
            shoentjen Sander Hoentjen added a comment - - edited

            I created MDEV-15523, with that case I have the same (wrong) result when I do `set optimizer_switch='materialization=off'`
            Was it expected that doing this fixed the issue? If that is the case then my bug is something different.

            shoentjen Sander Hoentjen added a comment - - edited I created MDEV-15523 , with that case I have the same (wrong) result when I do `set optimizer_switch='materialization=off'` Was it expected that doing this fixed the issue? If that is the case then my bug is something different.
            alice Alice Sherepa added a comment -

            The result depends on the value of optimizer_switch, but it should not, it is a bug and setting value to 'materalization=off' is only a temporary solution, only until the bug will be fixed. I checked testcase from MDEV-15523 with the default values of variables, it is the same bug as MDEV-14779, but maybe you have other switches, that also affected the result.

            MariaDB [test]> CREATE TABLE `t1` (
                ->   `id` int(11) NOT NULL AUTO_INCREMENT,
                ->   `works` VARCHAR(1) DEFAULT NULL,
                ->   PRIMARY KEY (`id`)
                -> );
            Query OK, 0 rows affected (0.08 sec)
             
            MariaDB [test]> INSERT INTO `t1` VALUES (1,'y'),(2,'n');
            Query OK, 2 rows affected (0.01 sec)
            Records: 2  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> DROP TABLE IF EXISTS `t2`;
            Query OK, 0 rows affected (0.06 sec)
             
            MariaDB [test]> CREATE TABLE `t2` (
                ->   `id` int(11) NOT NULL AUTO_INCREMENT,
                ->   `t1_id` int(11) DEFAULT NULL,
                ->   PRIMARY KEY (`id`)
                -> );
            Query OK, 0 rows affected (0.07 sec)
             
            MariaDB [test]> INSERT INTO `t2` VALUES (1,1),(2,1),(3,3),(4,3);
            Query OK, 4 rows affected (0.01 sec)
            Records: 4  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> SELECT * FROM t1 LEFT OUTER JOIN (SELECT t1_id FROM t2 WHERE id IN (SELECT MAX(id) FROM t2 GROUP BY t1_id)) tmpname on tmpname.t1_id = t1.id;
            +----+-------+-------+
            | id | works | t1_id |
            +----+-------+-------+
            |  1 | y     |     1 |
            +----+-------+-------+
            1 row in set (0.02 sec)
             
            MariaDB [test]> set optimizer_switch='materialization=off';
            Query OK, 0 rows affected (0.00 sec)
             
            MariaDB [test]> SELECT * FROM t1 LEFT OUTER JOIN (SELECT t1_id FROM t2 WHERE id IN (SELECT MAX(id) FROM t2 GROUP BY t1_id)) tmpname on tmpname.t1_id = t1.id;
            +----+-------+-------+
            | id | works | t1_id |
            +----+-------+-------+
            |  1 | y     |     1 |
            |  2 | n     |  NULL |
            +----+-------+-------+
            2 rows in set (0.00 sec)
             
            MariaDB [test]> show variables like '%switch%'\G
            *************************** 1. row ***************************
            Variable_name: optimizer_switch
                    Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=off,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
            1 row in set (0.00 sec)
            
            

            alice Alice Sherepa added a comment - The result depends on the value of optimizer_switch, but it should not, it is a bug and setting value to 'materalization=off' is only a temporary solution, only until the bug will be fixed. I checked testcase from MDEV-15523 with the default values of variables, it is the same bug as MDEV-14779 , but maybe you have other switches, that also affected the result. MariaDB [test]> CREATE TABLE `t1` ( -> `id` int(11) NOT NULL AUTO_INCREMENT, -> `works` VARCHAR(1) DEFAULT NULL, -> PRIMARY KEY (`id`) -> ); Query OK, 0 rows affected (0.08 sec)   MariaDB [test]> INSERT INTO `t1` VALUES (1,'y'),(2,'n'); Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0   MariaDB [test]> DROP TABLE IF EXISTS `t2`; Query OK, 0 rows affected (0.06 sec)   MariaDB [test]> CREATE TABLE `t2` ( -> `id` int(11) NOT NULL AUTO_INCREMENT, -> `t1_id` int(11) DEFAULT NULL, -> PRIMARY KEY (`id`) -> ); Query OK, 0 rows affected (0.07 sec)   MariaDB [test]> INSERT INTO `t2` VALUES (1,1),(2,1),(3,3),(4,3); Query OK, 4 rows affected (0.01 sec) Records: 4 Duplicates: 0 Warnings: 0   MariaDB [test]> SELECT * FROM t1 LEFT OUTER JOIN (SELECT t1_id FROM t2 WHERE id IN (SELECT MAX(id) FROM t2 GROUP BY t1_id)) tmpname on tmpname.t1_id = t1.id; +----+-------+-------+ | id | works | t1_id | +----+-------+-------+ | 1 | y | 1 | +----+-------+-------+ 1 row in set (0.02 sec)   MariaDB [test]> set optimizer_switch='materialization=off'; Query OK, 0 rows affected (0.00 sec)   MariaDB [test]> SELECT * FROM t1 LEFT OUTER JOIN (SELECT t1_id FROM t2 WHERE id IN (SELECT MAX(id) FROM t2 GROUP BY t1_id)) tmpname on tmpname.t1_id = t1.id; +----+-------+-------+ | id | works | t1_id | +----+-------+-------+ | 1 | y | 1 | | 2 | n | NULL | +----+-------+-------+ 2 rows in set (0.00 sec)   MariaDB [test]> show variables like '%switch%'\G *************************** 1. row *************************** Variable_name: optimizer_switch Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=off,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on 1 row in set (0.00 sec)

            Ok to push

            psergei Sergei Petrunia added a comment - Ok to push

            Pushed to 5.5

            varun Varun Gupta (Inactive) added a comment - Pushed to 5.5

            People

              varun Varun Gupta (Inactive)
              altercyx bela sand
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.