[MDEV-19580] Unrelated JOINs corrupt usage of 'WHERE function() IN (subquery)' Created: 2019-05-24  Updated: 2019-06-11  Resolved: 2019-06-11

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Server
Affects Version/s: 5.5, 10.0, 10.1, 10.1.38, 10.3.13, 10.2, 10.3, 10.4
Fix Version/s: 10.2.25, 5.5.65, 10.1.41, 10.3.16, 10.4.6

Type: Bug Priority: Major
Reporter: Scott Tester Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: None
Environment:

Debian 9 with MariaDB 10.1.38
Gentoo with MariaDB 10.3.13


Attachments: File prime.sql     File test.sql    

 Description   

In certain circumstances a LEFT JOIN can prevent a WHERE clause comparing a function's result with a subquery from working:

SELECT * FROM `employee`
    LEFT JOIN `division` ON `employee`.`division_id` = `division`.`id`
    WHERE pastPosition(`employee`.`id`, '1980-01-01') IN
        (SELECT `id` FROM `position` WHERE `is_management`=1);

Expected results:

id name division_id id name
1 Mrs Robinson 2 2 Song Character
3 Paul Simon 1 1 Songwriter
4 Art Garfunkel 1 1 Songwriter

Actual results:
No rows returned

If LEFT is omitted from the JOIN it seems to work.

pastPosition function:

CREATE FUNCTION `pastPosition`(`who` INT(11), `when` DATE) RETURNS int(11)
BEGIN
    DECLARE `result` int(11);
    SELECT `position_id` INTO `result`
        FROM `position_history`
        WHERE `when`>=`from` AND `when`<=`to` AND `employee_id`=`who`;
    RETURN `result`;
END$$
DELIMITER ;

Data:

employee table:

id name division_id
U.INT(11) VARCHAR(50) U.INT(11)
1 'Mrs Robinson' 2
2 'Joe DiMaggio' 2
3 'Paul Simon' 1
4 'Art Garfunkel' 1

position table:

id position_name is_management
U.INT(11) VARCHAR(50) TINYINT(1)
1 'Grand Poobah' 1
2 'Average Poobah' 1
3 'Serf' 0

position_history table:

from to employee_id position_id
DATE DATE U.INT(11) U.INT(11)
'1972-01-01' '1988-12-31' 3 1
'1972-01-01' '1988-12-31' 4 1
'1972-01-01' '1988-12-31' 1 2
'1972-01-01' '1988-12-31' 2 3

division table:

id name
U.INT(11) VARCHAR(50)
1 'Songwriter'
2 'Song Character'


 Comments   
Comment by Alice Sherepa [ 2019-05-24 ]

Thanks a lot!
Reproducible on MariaDB 5.5-10.4 with optimizer_switch='materialization=on'.

CREATE TABLE t1 (id int, a varchar(50), b int);
INSERT INTO t1 VALUES (1, 'mrs', 2),(2, 'joe', 2), (3, 'paul', 1),(4, 'art', 1);
 
CREATE TABLE t2 (id int, a varchar(50), x int);
INSERT INTO t2 VALUES(1, 'grand', 1),(2, 'average', 1),(3, 'serf', 0);
 
CREATE TABLE t3 (d1 date, d2 date, t1_id int, t2_id int );
INSERT INTO t3 VALUES ('1972-01-01','1988-12-31', 3, 1),('1972-01-01','1988-12-31', 4, 1),('1972-01-01','1988-12-31', 1, 2),('1972-01-01','1988-12-31', 2, 3);
 
CREATE TABLE t4 ( id int, a varchar(50) );
INSERT INTO t4 VALUES (1, 'songwriter'),(2, 'song character');
 
delimiter $$;
create function f1(who int, dt date) returns int 
begin
    declare result int;
    select t2_id into result from t3 where dt>=d1 and dt<=d2 and t1_id=who;
    return result;
end$$
delimiter ;$$
 
select * from t1
    left join t4 on t1.b = t4.id
    where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
 
drop table t1,t2,t3,t4;
drop function f1;

 
MariaDB [test]> select * from t1
    ->     left join t4 on t1.b = t4.id
    ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
Empty set (0.01 sec)
 
MariaDB [test]> set optimizer_switch='materialization=off';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select * from t1     left join t4 on t1.b = t4.id     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);+------+------+------+------+----------------+
| id   | a    | b    | id   | a              |
+------+------+------+------+----------------+
|    3 | paul |    1 |    1 | songwriter     |
|    4 | art  |    1 |    1 | songwriter     |
|    1 | mrs  |    2 |    2 | song character |
+------+------+------+------+----------------+
3 rows in set (0.01 sec)
 
MariaDB [test]> explain extended 
    -> select * from t1
    ->     left join t4 on t1.b = t4.id
    ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows | filtered | Extra                                           |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
|    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL |    4 |   100.00 |                                                 |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |    1 |   100.00 | Using where                                     |
|    1 | PRIMARY      | t4          | ALL    | NULL          | NULL         | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL |    3 |   100.00 | Using where                                     |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
 
Note (Code 1003): select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t4`.`id` AS `id`,`test`.`t4`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) left join `test`.`t4` on(`test`.`t4`.`id` = `test`.`t1`.`b`) where `test`.`t2`.`x` = 1 and `f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`

mysql> select * from t1
    ->     left join t4 on t1.b = t4.id
    ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
+------+------+------+------+----------------+
| id   | a    | b    | id   | a              |
+------+------+------+------+----------------+
|    3 | paul |    1 |    1 | songwriter     |
|    4 | art  |    1 |    1 | songwriter     |
|    1 | mrs  |    2 |    2 | song character |
+------+------+------+------+----------------+
3 rows in set (0.00 sec)

Comment by Igor Babaev [ 2019-06-01 ]

After adding to the function the specifier 'deterministic'

create function f1(who int, dt date) returns int 
begin
    declare result int;
    select t2_id into result from t3 where dt>=d1 and dt<=d2 and t1_id=who;
    return result;
end$$

the query returns the expected result.

Comment by Igor Babaev [ 2019-06-01 ]

The results of my analysis:
The output of EXPLAIN EXTENDED is

MariaDB [test]> explain extended select * from t1
    ->     left join t4 on t1.b = t4.id
    ->     where f1(t1.id, '1980-01-01') in (select id from t2 where x=1);
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows | filtered | Extra                                           |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
|    1 | PRIMARY      | t1          | ALL    | NULL          | NULL         | NULL    | NULL |    4 |   100.00 |                                                 |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |    1 |   100.00 | Using where                                     |
|    1 | PRIMARY      | t4          | ALL    | NULL          | NULL         | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (flat, BNL join) |
|    2 | MATERIALIZED | t2          | ALL    | NULL          | NULL         | NULL    | NULL |    3 |   100.00 | Using where                                     |
+------+--------------+-------------+--------+---------------+--------------+---------+------+------+----------+-------------------------------------------------+
MariaDB [test]> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                                          |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`t1`.`id` AS `id`,`test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b`,`test`.`t4`.`id` AS `id`,`test`.`t4`.`a` AS `a` from `test`.`t1` semi join (`test`.`t2`) left join `test`.`t4` on((`test`.`t4`.`id` = `test`.`t1`.`b`)) where ((`test`.`t2`.`x` = 1) and (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`)) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

From this output I see that the IN subquery in converted to a semi-join with a extra conjunct in
WHERE (`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`)) and that semi-join is materialized.
I debugger I see that the added conjunct is attached to the result of materialization of the semi-join as (`f1`(`test`.`t1`.`id`,'1980-01-01') = `sj-materialize`.`id`) and to the last table t4 as
(`f1`(`test`.`t1`.`id`,'1980-01-01') = `test`.`t2`.`id`).
The latter condition cannot be correctly evaluated as table t2 is used only during materialization and after this `t2`.`id` is set to NULL.

Comment by Sergei Petrunia [ 2019-06-05 ]

Review input provided

Comment by Sergei Petrunia [ 2019-06-07 ]

(Will reply to the second patch today)

Comment by Igor Babaev [ 2019-06-11 ]

A fix for this bug was pushed into 5.5.
It should be merged upstream as it is.

Generated at Thu Feb 08 08:52:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.