[MDEV-3708] LP:928048 - Query containing IN subquery with OR in the where clause returns a wrong result Created: 2012-02-07  Updated: 2015-02-02  Resolved: 2012-10-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Igor Babaev Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug928048.xml    

 Description   

A query with IN subquery that can be converted to a semi-join may return a wrong result in maridb-5.3 if the where clause of the subquery contains OR condition.

The following test case provides such a query.

create table t1 (a int, b int);
insert into t1 values (7,5), (3,3), (5,4), (9,3);

create table t2 (a int, b int, index i_a(a));

insert into t2 values
(4,2), (7,9), (7,4), (3,1), (5,3), (3,1), (9,4), (8,1);

set optimizer_switch='semijoin=on,materialization=on';
select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);

The query in from the test case returns a wrong result if the optimizer switch flags 'semijoin' and 'materialization' are set to 'on', a it returns the correct answer if these flags are set to 'off'.

MariaDB [test]> set optimizer_switch='semijoin=on,materialization=on';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
----------+

a b

----------+

7 5
3 3
5 4
9 3

----------+
4 rows in set (0.00 sec)

MariaDB [test]> set optimizer_switch='semijoin=off,materialization=off';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
----------+

a b

----------+

7 5
3 3

----------+
2 rows in set (0.00 sec)

The warning returned by EXPLAIN EXTENDED executed for the query with
optimizer_switch set to 'semijoin=on,materialization=on'
shows that it happens because in this case the optimizer generates an invalid execution plan:

MariaDB [test]> set optimizer_switch='semijoin=on,materialization=on';
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> explain extended
-> select * from t1 where t1.a in (select a from t2 where t2.a=7 or t2.b<=1);
--------------------------------------------------------------------------------------

id select_type table type possible_keys key key_len ref rows filtered Extra

--------------------------------------------------------------------------------------

1 PRIMARY t1 ALL NULL NULL NULL NULL 4 100.00  
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 5 func 1 100.00  
2 MATERIALIZED t2 ALL i_a NULL NULL NULL 8 100.00  

--------------------------------------------------------------------------------------
3 rows in set, 1 warning (0.00 sec)

MariaDB [test]> show warnings;
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Level Code Message

------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note 1003 select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` semi join (`test`.`t2`) where (((`test`.`t1`.`a` = 7) or (`test`.`t2`.`b` <= 1)))

------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 row in set (0.00 sec)



 Comments   
Comment by Sergei Petrunia [ 2012-02-08 ]

Re: Query containing IN subquery with OR in the where clause returns a wrong result
== Analysis ==

Equality propagation converts the WHERE clause into this:

(multiple equal(7, t1.a, t2.a) or (t2.b <= 1))
and
multiple equal(t1.a, t2.a)

This is ok.

Then, equality substitution produces this WHERE clause:

(t1.a = 7) or (t2.b <= 1)

we dont expect this kind of WHERE clauses to be produced when
SJ-Materialization-Lookup strategy is used.

With that strategy, we expect that the WHERE clause can be broken into two
AND-parts:

  • Part#1. depend only on SJ-inner tables
  • Part#2. depend only on SJ-outer tables

The only thing joining the two parts is the IN-equality. IN-equality is not
put into the WHERE condition, it is generated and checked inside the
SJ-Materialization code.

However, make_join_select() gets this clause:

(t1.a = 7) or (t2.b <= 1)

which can only be checked when one has both t1.a and t2.b. This never happens,
so make_join_select() is unable to attach this condition anywhere, and it is
never checked.

Comment by Igor Babaev [ 2012-02-08 ]

Re: Query containing IN subquery with OR in the where clause returns a wrong result
Some fix of this bug was pushed into mysql-5.6 (see the fix for bug #13414014).
I did not review this fix, so I cannot say anything about about its validity.

Comment by Sergei Petrunia [ 2012-02-13 ]

Re: Query containing IN subquery with OR in the where clause returns a wrong result
== Analysis continued ==

Equality propagation assumes certain ordering of tables, and tries to
move make all substitutions to evaluate as early as possible.

Graphically, if we draw join order like this:

>tb1->-tbl2->-tbl3-...

equality propagation will try pushing conditions to the left. The problem
with semi-joins (or bushy joins in general) is that there is no global
ordering anymore.

If we take a query

select * from ot1,ot2,ot3,ot4... where col in (select ... from it1, it2)

and its SJM query plan:

>ot1-ot2>------------(sjm)>ot3--ot4-....

it1>it2>+

then one can push the condition left along the top line through tables otX, or
go down to tables itX, and follow to the left along the lower level.

What equality substitution currently does is to assume the ordering of
(ot1, ot2, it1, it2, sjm, ot3, ...) and mostly ignore the fact that there are
multiple ways to go to the left.

This almost works, because materialization is only applied to un-
correlated queries, which gives us warranty that there are no expressions
that refer to both an otX.col and an itY.col.
The only exception is the IN-equality, which usually has the form of

otX.col=itY.col

presense of equalities of this form changes a lot.

Comment by Sergei Petrunia [ 2012-02-13 ]

Re: Query containing IN subquery with OR in the where clause returns a wrong result
=== Deficiency ===

First, the approach "move each condition as much as possible to the left"
doesn't guarantee maximum possible filtering anymore. When one is following left
and he reaches the (sjm) split, he should not take one way. He should use both!

Consider the queries:

Q1 select * from t1 where t1.col in (select t2.col from t2 where cond(t2.col))
Q2 select * from t1 where t1.col in (select t2.col from t2) and cond(t1.col)

Suppose they both are executed with this query plan:

id select_type table type
1 PRIMARY t1 ALL
1 PRIMARY <subquery2> eq_ref
2 MATERIALIZED t2 ALL

it is apparent that, for both queries

1. cond(t1.col) should be attached to table t1
2. cond(t2.col) should be attached to table t2

yet, current equality propagation/substituion scheme will not do that, at least
not when 'cond' is a condition other than equality.

Fixing this is a non-trivial though, because the fix will need to change the
whole concept of equality substition from the

walk through the WHERE clause and substitute certain occurences of
"tblX.colA" for "tblY.colB"

to something else. (the above definition doesn't cover substitution of
Item_equal objects)

Comment by Sergei Petrunia [ 2012-02-13 ]

Re: Query containing IN subquery with OR in the where clause returns a wrong result
=== Fixing the wrong query result ===

Let's set a more modest goal of

  • not producing wrong query results
  • not generating regressions wrt non-semijoin materialization

The idea is that given this chart

>ot1-ot2>------------(sjm)>ot3--ot4-....

it1>it2>+

we

"must not attempt to move a condition from one line to another"

e.g. don't move the condition from the top line into the bottom or vice
versa. (This bug is a result of us moving one part of OR-clause from
bottom to the top).

The implementation of is:

  • Don't substitute SJ-inner columns for SJ-outer, or vice versa (however it
    is always okay to substitute with references to const tables)
  • When exploding multiple-equalities:

if (equality has a const item)

{ for each equality member tblX.colY produce "tblX.colY=const_item" }

else
{
for X in

{top_level, each semi-join nest}

{ FIRST= first member in X; for each other element OTHER produce "FIRST=OTHER" }

}

As for not producing equalities that were "fixed on the upper level":
don't produce them only as long as they would have been produced at this
level.

Comment by Rasmus Johansson (Inactive) [ 2012-02-20 ]

Launchpad bug id: 928048

Generated at Thu Feb 08 06:50:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.