Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19580

Unrelated JOINs corrupt usage of 'WHERE function() IN (subquery)'

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1.38, 10.3.13, 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
    • 10.2.25, 5.5.65, 10.1.41, 10.3.16, 10.4.6
    • Optimizer, Server
    • None
    • Debian 9 with MariaDB 10.1.38
      Gentoo with MariaDB 10.3.13

    Description

      In certain circumstances a LEFT JOIN can prevent a WHERE clause comparing a function's result with a subquery from working:

      SELECT * FROM `employee`
          LEFT JOIN `division` ON `employee`.`division_id` = `division`.`id`
          WHERE pastPosition(`employee`.`id`, '1980-01-01') IN
              (SELECT `id` FROM `position` WHERE `is_management`=1);
      

      Expected results:

      id name division_id id name
      1 Mrs Robinson 2 2 Song Character
      3 Paul Simon 1 1 Songwriter
      4 Art Garfunkel 1 1 Songwriter

      Actual results:
      No rows returned

      If LEFT is omitted from the JOIN it seems to work.

      pastPosition function:

      CREATE FUNCTION `pastPosition`(`who` INT(11), `when` DATE) RETURNS int(11)
      BEGIN
          DECLARE `result` int(11);
          SELECT `position_id` INTO `result`
              FROM `position_history`
              WHERE `when`>=`from` AND `when`<=`to` AND `employee_id`=`who`;
          RETURN `result`;
      END$$
      DELIMITER ;
      

      Data:

      employee table:

      id name division_id
      U.INT(11) VARCHAR(50) U.INT(11)
      1 'Mrs Robinson' 2
      2 'Joe DiMaggio' 2
      3 'Paul Simon' 1
      4 'Art Garfunkel' 1

      position table:

      id position_name is_management
      U.INT(11) VARCHAR(50) TINYINT(1)
      1 'Grand Poobah' 1
      2 'Average Poobah' 1
      3 'Serf' 0

      position_history table:

      from to employee_id position_id
      DATE DATE U.INT(11) U.INT(11)
      '1972-01-01' '1988-12-31' 3 1
      '1972-01-01' '1988-12-31' 4 1
      '1972-01-01' '1988-12-31' 1 2
      '1972-01-01' '1988-12-31' 2 3

      division table:

      id name
      U.INT(11) VARCHAR(50)
      1 'Songwriter'
      2 'Song Character'

      Attachments

        1. test.sql
          0.2 kB
        2. prime.sql
          1 kB

        Activity

          alice Alice Sherepa added a comment -

          Thanks a lot!
          Reproducible on MariaDB 5.5-10.4 with optimizer_switch='materialization=on'.

          CREATE TABLE t1 (id int, a varchar(50), b int);
          INSERT INTO t1 VALUES (1, 'mrs', 2),(2, 'joe', 2), (3, 'paul', 1),(4, 'art', 1);
           
          CREATE TABLE t2 (id int, a varchar(50), x int);
          INSERT INTO t2 VALUES(1, 'grand', 1),(2, 'average', 1),(3, 'serf', 0);
           
          CREATE TABLE t3 (d1 date, d2 date, t1_id int, t2_id int );
          INSERT INTO t3 VALUES ('1972-01-01','1988-12-31', 3, 1),('1972-01-01','1988-12-31', 4, 1),('1972-01-01','1988-12-31', 1, 2),('1972-01-01','1988-12-31', 2, 3);
           
          CREATE TABLE t4 ( id int, a varchar(50) );
          INSERT INTO t4 VALUES (1, 'songwriter'),(2, 'song character');
           
          delimiter $$;
          create function f1(who int, dt date) returns int 
          begin
              declare result int;
              select t2_id into result from t3 where dt>=d1 and dt<=d2 and t1_id=who;
              return result;
          end$$
          delimiter ;$$
           
          select * from t1
              left join t4 on t1.b = t4.id
              where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
           
          drop table t1,t2,t3,t4;
          drop function f1;
          

           
          MariaDB [test]> select * from t1
              ->     left join t4 on t1.b = t4.id
              ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
          Empty set (0.01 sec)
           
          MariaDB [test]> set optimizer_switch='materialization=off';
          Query OK, 0 rows affected (0.00 sec)
           
          MariaDB [test]> select * from t1     left join t4 on t1.b = t4.id     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);+------+------+------+------+----------------+
          | id   | a    | b    | id   | a              |
          +------+------+------+------+----------------+
          |    3 | paul |    1 |    1 | songwriter     |
          |    4 | art  |    1 |    1 | songwriter     |
          |    1 | mrs  |    2 |    2 | song character |
          +------+------+------+------+----------------+
          3 rows in set (0.01 sec)
           
          MariaDB [test]> explain extended 
              -> select * from t1
              ->     left join t4 on t1.b = t4.id
              ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
          +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
          | id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows | filtered | Extra                                           |
          +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
          |    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL |    4 |   100.00 |                                                 |
          |    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |    1 |   100.00 | Using where                                     |
          |    1 | PRIMARY      | t4          | ALL    | NULL          | NULL         | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (flat, BNL join) |
          |    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL |    3 |   100.00 | Using where                                     |
          +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
          4 rows in set, 1 warning (0.00 sec)
           
          Note (Code 1003): select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t4`.`id` AS `id`,`test`.`t4`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) left join `test`.`t4` on(`test`.`t4`.`id` = `test`.`t1`.`b`) where `test`.`t2`.`x` = 1 and `f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`
          
          

          mysql> select * from t1
              ->     left join t4 on t1.b = t4.id
              ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
          +------+------+------+------+----------------+
          | id   | a    | b    | id   | a              |
          +------+------+------+------+----------------+
          |    3 | paul |    1 |    1 | songwriter     |
          |    4 | art  |    1 |    1 | songwriter     |
          |    1 | mrs  |    2 |    2 | song character |
          +------+------+------+------+----------------+
          3 rows in set (0.00 sec)
          

          alice Alice Sherepa added a comment - Thanks a lot! Reproducible on MariaDB 5.5-10.4 with optimizer_switch='materialization=on'. CREATE TABLE t1 (id int , a varchar (50), b int ); INSERT INTO t1 VALUES (1, 'mrs' , 2),(2, 'joe' , 2), (3, 'paul' , 1),(4, 'art' , 1);   CREATE TABLE t2 (id int , a varchar (50), x int ); INSERT INTO t2 VALUES (1, 'grand' , 1),(2, 'average' , 1),(3, 'serf' , 0);   CREATE TABLE t3 (d1 date , d2 date , t1_id int , t2_id int ); INSERT INTO t3 VALUES ( '1972-01-01' , '1988-12-31' , 3, 1),( '1972-01-01' , '1988-12-31' , 4, 1),( '1972-01-01' , '1988-12-31' , 1, 2),( '1972-01-01' , '1988-12-31' , 2, 3);   CREATE TABLE t4 ( id int , a varchar (50) ); INSERT INTO t4 VALUES (1, 'songwriter' ),(2, 'song character' );   delimiter $$; create function f1(who int , dt date ) returns int begin declare result int ; select t2_id into result from t3 where dt>=d1 and dt<=d2 and t1_id=who; return result; end $$ delimiter ;$$   select * from t1 left join t4 on t1.b = t4.id where f1(t1.id, '1980-01-01' ) in ( select id from t2 where x=1);   drop table t1,t2,t3,t4; drop function f1;   MariaDB [test]> select * from t1 -> left join t4 on t1.b = t4.id -> where f1(t1.id, '1980-01-01') in (select id from t2 where x=1); Empty set (0.01 sec)   MariaDB [test]> set optimizer_switch='materialization=off'; Query OK, 0 rows affected (0.00 sec)   MariaDB [test]> select * from t1 left join t4 on t1.b = t4.id where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);+------+------+------+------+----------------+ | id | a | b | id | a | +------+------+------+------+----------------+ | 3 | paul | 1 | 1 | songwriter | | 4 | art | 1 | 1 | songwriter | | 1 | mrs | 2 | 2 | song character | +------+------+------+------+----------------+ 3 rows in set (0.01 sec)   MariaDB [test]> explain extended -> select * from t1 -> left join t4 on t1.b = t4.id -> where f1(t1.id, '1980-01-01') in (select id from t2 where x=1); +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | | | 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | 100.00 | Using where | | 1 | PRIMARY | t4 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where; Using join buffer (flat, BNL join) | | 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where | +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+ 4 rows in set, 1 warning (0.00 sec)   Note (Code 1003): select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t4`.`id` AS `id`,`test`.`t4`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) left join `test`.`t4` on(`test`.`t4`.`id` = `test`.`t1`.`b`) where `test`.`t2`.`x` = 1 and `f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id` mysql> select * from t1 -> left join t4 on t1.b = t4.id -> where f1(t1.id, '1980-01-01') in (select id from t2 where x=1); +------+------+------+------+----------------+ | id | a | b | id | a | +------+------+------+------+----------------+ | 3 | paul | 1 | 1 | songwriter | | 4 | art | 1 | 1 | songwriter | | 1 | mrs | 2 | 2 | song character | +------+------+------+------+----------------+ 3 rows in set (0.00 sec)

          After adding to the function the specifier 'deterministic'

          create function f1(who int, dt date) returns int 
          begin
              declare result int;
              select t2_id into result from t3 where dt>=d1 and dt<=d2 and t1_id=who;
              return result;
          end$$
          

          the query returns the expected result.

          igor Igor Babaev (Inactive) added a comment - After adding to the function the specifier 'deterministic' create function f1(who int , dt date ) returns int begin declare result int ; select t2_id into result from t3 where dt>=d1 and dt<=d2 and t1_id=who; return result; end $$ the query returns the expected result.

          The results of my analysis:
          The output of EXPLAIN EXTENDED is

          MariaDB [test]> explain extended select * from t1
              ->     left join t4 on t1.b = t4.id
              ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
          +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
          | id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows | filtered | Extra                                           |
          +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
          |    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL |    4 |   100.00 |                                                 |
          |    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |    1 |   100.00 | Using where                                     |
          |    1 | PRIMARY      | t4          | ALL    | NULL          | NULL         | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (flat, BNL join) |
          |    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL |    3 |   100.00 | Using where                                     |
          +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
          MariaDB [test]> show warnings;
          +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
          | Level | Code | Message                                                                                                                                                                                                                                                                                                                          |
          +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
          | Note  | 1003 | select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t4`.`id` AS `id`,`test`.`t4`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) left join `test`.`t4` on((`test`.`t4`.`id` = `test`.`t1`.`b`)) where ((`test`.`t2`.`x` = 1) and (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`)) |
          +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
          

          From this output I see that the IN subquery in converted to a semi-join with a extra conjunct in
          WHERE (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`)) and that semi-join is materialized.
          I debugger I see that the added conjunct is attached to the result of materialization of the semi-join as (`f1`(`test`.`t1`.`id`,'1980-01-01') = `sj-materialize`.`id`) and to the last table t4 as
          (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`).
          The latter condition cannot be correctly evaluated as table t2 is used only during materialization and after this `t2`.`id` is set to NULL.

          igor Igor Babaev (Inactive) added a comment - The results of my analysis: The output of EXPLAIN EXTENDED is MariaDB [test]> explain extended select * from t1 -> left join t4 on t1.b = t4.id -> where f1(t1.id, '1980-01-01') in (select id from t2 where x=1); +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+ | 1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | | | 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 4 | func | 1 | 100.00 | Using where | | 1 | PRIMARY | t4 | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where; Using join buffer (flat, BNL join) | | 2 | MATERIALIZED | t2 | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where | +------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+ MariaDB [test]> show warnings; +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Note | 1003 | select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t4`.`id` AS `id`,`test`.`t4`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) left join `test`.`t4` on((`test`.`t4`.`id` = `test`.`t1`.`b`)) where ((`test`.`t2`.`x` = 1) and (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`)) | +-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ From this output I see that the IN subquery in converted to a semi-join with a extra conjunct in WHERE (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`)) and that semi-join is materialized. I debugger I see that the added conjunct is attached to the result of materialization of the semi-join as (`f1`(`test`.`t1`.`id`,'1980-01-01') = `sj-materialize`.`id`) and to the last table t4 as (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`). The latter condition cannot be correctly evaluated as table t2 is used only during materialization and after this `t2`.`id` is set to NULL.

          Review input provided

          psergei Sergei Petrunia added a comment - Review input provided

          (Will reply to the second patch today)

          psergei Sergei Petrunia added a comment - (Will reply to the second patch today)

          A fix for this bug was pushed into 5.5.
          It should be merged upstream as it is.

          igor Igor Babaev (Inactive) added a comment - A fix for this bug was pushed into 5.5. It should be merged upstream as it is.

          People

            igor Igor Babaev (Inactive)
            skotos Scott Tester
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.