Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
None
-
None
-
None
Description
A query with IN subquery that can be converted to a semi-join may return a wrong result in maridb-5.3 if the where clause of the subquery contains OR condition.
The following test case provides such a query.
create table t1 (a int, b int);
insert into t1 values (7,5), (3,3), (5,4), (9,3);
create table t2 (a int, b int, index i_a(a));
insert into t2 values
(4,2), (7,9), (7,4), (3,1), (5,3), (3,1), (9,4), (8,1);
set optimizer_switch='semijoin=on,materialization=on';
select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
The query in from the test case returns a wrong result if the optimizer switch flags 'semijoin' and 'materialization' are set to 'on', a it returns the correct answer if these flags are set to 'off'.
MariaDB [test]> set optimizer_switch='semijoin=on,materialization=on';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
----------+
a | b |
----------+
7 | 5 |
3 | 3 |
5 | 4 |
9 | 3 |
----------+
4 rows in set (0.00 sec)
MariaDB [test]> set optimizer_switch='semijoin=off,materialization=off';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
----------+
a | b |
----------+
7 | 5 |
3 | 3 |
----------+
2 rows in set (0.00 sec)
The warning returned by EXPLAIN EXTENDED executed for the query with
optimizer_switch set to 'semijoin=on,materialization=on'
shows that it happens because in this case the optimizer generates an invalid execution plan:
MariaDB [test]> set optimizer_switch='semijoin=on,materialization=on';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> explain extended
-> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
--------------------------------------------------------------------------------------
id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
--------------------------------------------------------------------------------------
1 | PRIMARY | t1 | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | |
1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 5 | func | 1 | 100.00 | |
2 | MATERIALIZED | t2 | ALL | i_a | NULL | NULL | NULL | 8 | 100.00 |
--------------------------------------------------------------------------------------
3 rows in set, 1 warning (0.00 sec)
MariaDB [test]> show warnings;
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Level | Code | Message |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Note | 1003 | select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where (((`test`.`t1`.`a` = 7) or (`test`.`t2`.`b` <= 1))) |
------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)
Re: Query containing IN subquery with OR in the where clause returns a wrong result
== Analysis ==
Equality propagation converts the WHERE clause into this:
(multiple equal(7, t1.a, t2.a) or (t2.b <= 1))
and
multiple equal(t1.a, t2.a)
This is ok.
Then, equality substitution produces this WHERE clause:
(t1.a = 7) or (t2.b <= 1)
we dont expect this kind of WHERE clauses to be produced when
SJ-Materialization-Lookup strategy is used.
With that strategy, we expect that the WHERE clause can be broken into two
AND-parts:
The only thing joining the two parts is the IN-equality. IN-equality is not
put into the WHERE condition, it is generated and checked inside the
SJ-Materialization code.
However, make_join_select() gets this clause:
(t1.a = 7) or (t2.b <= 1)
which can only be checked when one has both t1.a and t2.b. This never happens,
so make_join_select() is unable to attach this condition anywhere, and it is
never checked.