[MDEV-2722] LP:951937 - Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off Created: 2012-03-11  Updated: 2015-02-02  Resolved: 2012-10-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug951937.xml    

 Description   

The following query

SELECT * FROM v
WHERE ( a, a ) IN (
SELECT alias2.b, alias2.a
FROM t1 AS alias1, t1 AS alias2
WHERE alias2.b = alias1.a
AND ( alias1.b >= alias1.a OR alias2.b = 'z' )

on the test data returns 6 rows if it's executed with semijoin=on and materialization=on, and 19 rows otherwise. The latter is correct.
On a variation of the test data, where column b is nullable, the query returns no rows at all. This variation is added to the test case as ALTER TABLE followed by the same query.

bzr version-info
revision-id: <email address hidden>
date: 2012-03-05 22:33:46 -0800
build-date: 2012-03-11 05:27:06 +0400
revno: 3455

Also reproducible on 5.5 (revno 3316).
Not reproducible on MySQL 5.6 (revno 3706).

EXPLAIN with semijoin=on, materialization=on (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 19 100.00
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 19 100.00 Using where; Using join buffer (flat, BNL join)
2 MATERIALIZED alias1 ALL a NULL NULL NULL 19 100.00 Using where
2 MATERIALIZED alias2 ref a a 4 test.alias1.a 1 100.00 Using where
3 DERIVED t1 ALL NULL NULL NULL NULL 19 100.00
Warnings:
Note 1003 select `v`.`a` AS `a`,`v`.`b` AS `b` from `test`.`v` semi join (`test`.`t1` `alias1` join `test`.`t1` `alias2`) where ((`test`.`alias2`.`a` = `test`.`alias1`.`a`) and (`test`.`alias2`.`b` = `test`.`alias1`.`a`) and (`v`.`a` = `test`.`alias1`.`a`) and ((`test`.`alias1`.`b` >= `test`.`alias1`.`a`) or ((`test`.`alias1`.`a` = 'z') and (`v`.`a` = 'z'))))

Minimal optimizer_switch: materialization=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case (with the variation):

--source include/have_innodb.inc
SET SESSION optimizer_switch = 'materialization=on,semijoin=on';

CREATE TABLE t1 (
a VARCHAR(1),
b VARCHAR(1) NOT NULL,
KEY(a)
) ENGINE=InnoDB;
INSERT INTO t1 VALUES
('j','j'),('v','v'),('c','c'),('m','m'),('d','d'),
('y','y'),('t','t'),('d','d'),('s','s'),('r','r'),
('m','m'),('b','b'),('x','x'),('g','g'),('p','p'),
('q','q'),('w','w'),('d','d'),('e','e');

CREATE ALGORITHM=TEMPTABLE VIEW v AS SELECT * FROM t1;

  1. This query returns 6 rows instead of 19

SELECT * FROM v
WHERE ( a, a ) IN (
SELECT alias2.b, alias2.a
FROM t1 AS alias1, t1 AS alias2
WHERE alias2.b = alias1.a
AND ( alias1.b >= alias1.a OR alias2.b = 'z' )
);

  1. End of the main part.
  2. The rest is the test case variation, where we make column b nullable

ALTER TABLE t1 MODIFY COLUMN b VARCHAR(1);

  1. This query returns an empty set

SELECT * FROM v
WHERE ( a, a ) IN (
SELECT alias2.b, alias2.a
FROM t1 AS alias1, t1 AS alias2
WHERE alias2.b = alias1.a
AND ( alias1.b >= alias1.a OR alias2.b = 'z' )
);

  1. End of test case


 Comments   
Comment by Sergei Petrunia [ 2012-03-19 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
The part of where condition attached to to the materialized VIEW seems to be wrong:

#0 evaluate_join_record (join=0x9f54a28, join_tab=0x9f5cdb8, error=0) at sql_select.cc:15531
#1 0x08372131 in sub_select (join=0x9f54a28, join_tab=0x9f5cdb8, end_of_records=false) at sql_select.cc:15436
#2 0x0836341b in evaluate_join_record (join=0x9f54a28, join_tab=0x9f5cbb0, error=0) at sql_select.cc:15633
#3 0x08372131 in sub_select (join=0x9f54a28, join_tab=0x9f5cbb0, end_of_records=false) at sql_select.cc:15436
#4 0x083733b2 in do_select (join=0x9f54a28, fields=0x9edb54c, table=0x0, procedure=0x0) at sql_select.cc:15097
#5 0x0838c2b7 in JOIN::exec (this=0x9f54a28) at sql_select.cc:2731
(gdb) p join_tab->table->alias.Ptr
$233 = 0x9f3f158 "v"
(gdb) p dbug_print_item(select_cond)
$234 = 0x8b9f560 "((`j2`.`t1a`.`b` >= `j2`.`t1a`.`a`) or ((`j2`.`t1a`.`a` = 'z') and (`v`.`a` = 'z')))"

Comment by Sergei Petrunia [ 2012-03-19 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
No, I am not sure whether it's the WHERE clause is wrong or something else is.

Comment by Sergei Petrunia [ 2012-03-19 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
the wrong result is caused by the following:

this part of WHERE condition:

"(`j2`.`t1a`.`b` >= `j2`.`t1a`.`a`)"

is attached to table v (i.e. <derived3>), and it evaluates to FALSE. This is wrong because the dataset has t1.a=t1.b for all rows.

Comment by Sergei Petrunia [ 2012-03-19 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
t1a.a has the right value, t1a.b has the wrong one.

Comment by Sergei Petrunia [ 2012-03-19 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
Sure, one can expect that t1a.b will have the wrong value: it is not in
subquery's select list, and we're referring to it from outside the SJM nest.

The question is why we're doing that. The clause

(t1a.b >= t1a.a OR t1b.b = 'z')

was originally inside the subquery, and has been attached to the subquery's
table. Why do we see parts of it attached to outside table...

Comment by Sergei Petrunia [ 2012-03-19 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
WHERE condition after equality substitution:

(t1b.a = t1a.a) and
(t1b.b = t1a.a) and
(v.a = t1a.a) and
(t1a.b >= t1a.a or (t1a.a = 'z' and v.a = 'z'))"

so, t1b.b = 'z' got converted into (t1a.a = 'z' and v.a = 'z') (**)

We got it, because we've had a multi-equality of:

t1a.a=t1b.a=t1b.b=v.a

and a join order of

{v, t1a, t1b }

. The first element is `v`, the first one inside the sjm nest is `t1a`. Substitution (**) is generally useful, but not when we get it inside an OR clause. When we do it inside an OR clause, that has other OR-part that can be evaluated only inside the subquery, we end up with un-evaluable clause.

Comment by Sergei Petrunia [ 2012-03-20 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
Before the equality substitutiuon, the WHERE clause has this form (both in maria-5.3 and mysql-5.6):

"(((t1a.b >= t1a.a) or multiple equal('z', t1b.b, t1a.a, v.a, t1b.a)) and multiple equal(t1b.b, t1a.a, v.a, t1b.a))"

the multiple-equal inside the OR clause is the same as the outer one, except that it also includes constant 'z'.

After equality substitution for the inner multiple equal('z', t1b.b, t1a.a, v.a, t1b.a) , we get

In MariaDB:
$537 = 0x8b9f560 "((t1a.a = 'z') and (v.a = 'z'))"

In MySQL:
$117 = 0x8f7bea0 "((t1a.a = 'z') and (t1b.b = 'z') and (t1b.a = 'z'))"

Comment by Sergei Petrunia [ 2012-03-20 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
The problem is related to fix for BUG#928048.

Comment by Sergei Petrunia [ 2012-03-21 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
Consider an example:

select * from ot
where (ot.a, ot.b) in (select it.a, it.b
from it
where (it.a=111 and func1(it.c)) or (it.b=222 and func2(it.c))
)
and (ot.a=333 or func3(ot.c))

convert_subq_to_sj(), followed by equality propagation, will produce this WHERE
clause:

01 multi-equal(ot.a, it.a) and
02 multi-equal(ot.b, it.b) and
03 (multi-equal(inherited(ot.a, it.a), 111) and func1(it.c) or
04 multi-equal(inherited(ot.b, it.b), 222) and func2(it.c)
05 ) and
06 (multi-equal(inherited(ot.a, it.a), 333) or func3(ot.c))

Suppose the chosen join order is
ot, sj-materialization(it)
or the other way around, but using materialization.

Now, we do equality substitution:

  • in line 03, we must not generate equalities that include ot.a, because the
    other part of OR uses func1(it.c) can only be checked when doing
    materialization.
  • in line 06, we must not generate qualities that include it.a, because the
    other part of uses "func3(ot.c)", which cannot be checked when doing
    materialization.

This shows the problem. Suppose we're doing equality substitution for an OR
clause and its first child element is:

multi-equal(inherited(ot.a, it.a), 111)

is this a case like in #03, or like in #06? There is no way to tell.

Comment by Sergei Petrunia [ 2012-03-23 ]

Re: Wrong result (missing rows) with semijoin+materialization, IN subquery, InnoDB, TEMPTABLE view or derived_merge=off
Patch committed, need review.

Comment by Rasmus Johansson (Inactive) [ 2012-04-03 ]

Launchpad bug id: 951937

Generated at Thu Feb 08 06:43:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.