[MDEV-3574] LP:951283 - Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery Created: 2012-03-10  Updated: 2015-02-02  Resolved: 2012-10-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug951283.xml     File LPexportBug951283_bug951283.tgz    

 Description   

The following query

SELECT * FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
WHERE alias3.d IN (
SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
WHERE alias5.b = alias4.b
AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
)

on test data returns 2940 rows if it is executed with semijoin=on, firstmatch=on, optimizer_prune_level=0, and 3724 rows otherwise. The latter is correct.
In the test case SELECT * is replaced with SELECT COUNT for convenience, it does not change the result.

bzr version-info
revision-id: <email address hidden>
date: 2012-03-05 22:33:46 -0800
build-date: 2012-03-10 05:45:32 +0400
revno: 3455

Also reproducible on MariaDB 5.5 (revno 3316). Not reproducible on MySQL 5.6 (revno 3706).
Could not reproduce on MyISAM or Aria.

EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=0 (wrong result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where
1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index; FirstMatch(alias3)
1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias2)
Warnings:
Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
Note 1003 select count(0) AS `COUNT` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

EXPLAIN with semijoin=on, firstmatch=on, optimizer_prune_level=1 (correct result):

id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY alias1 ALL NULL NULL NULL NULL 14 100.00
1 PRIMARY alias2 ALL NULL NULL NULL NULL 14 100.00 Using join buffer (flat, BNL join)
1 PRIMARY alias3 ALL PRIMARY NULL NULL NULL 19 100.00 Using where; Using join buffer (incremental, BNL join)
1 PRIMARY alias4 ref PRIMARY,c c 4 test.alias3.d 1 100.00 Using index
1 PRIMARY alias5 eq_ref PRIMARY PRIMARY 4 test.alias4.b 1 100.00 Using where; FirstMatch(alias3)
Warnings:
Note 1276 Field or reference 'test.alias3.b' of SELECT #2 was resolved in SELECT #1
Note 1276 Field or reference 'test.alias3.c' of SELECT #2 was resolved in SELECT #1
Note 1003 select count(0) AS `COUNT` from `test`.`t1` `alias1` semi join (`test`.`t2` `alias4` join `test`.`t2` `alias5`) join `test`.`t1` `alias2` join `test`.`t2` `alias3` where ((`test`.`alias4`.`c` = `test`.`alias3`.`d`) and (`test`.`alias5`.`b` = `test`.`alias4`.`b`) and ((`test`.`alias4`.`b` >= `test`.`alias3`.`b`) or (`test`.`alias5`.`c` <> `test`.`alias3`.`c`)))

Minimal optimizer_switch: firstmatch=on,semijoin=on
Full optimizer_switch (default): index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case:

--source include/have_innodb.inc

SET optimizer_switch='firstmatch=on,semijoin=on';
SET optimizer_prune_level=0;

CREATE TABLE t1 ( a INT ) ENGINE=InnoDB;
INSERT INTO t1 VALUES
(10),(11),(12),(13),(14),(15),(16),
(17),(18),(19),(20),(21),(22),(23);

CREATE TABLE t2 (
b INT PRIMARY KEY,
c VARCHAR(1),
d VARCHAR(1),
KEY(c)
) ENGINE=InnoDB;

INSERT INTO t2 VALUES
(1,'j','j'),(2,'v','v'),(3,'c','c'),(4,'m','m'),
(5,'d','d'),(6,'d','d'),(7,'y','y'),(8,'t','t'),
(9,'d','d'),(10,'s','s'),(11,'r','r'),(12,'m','m'),
(13,'b','b'),(14,'x','x'),(15,'g','g'),(16,'p','p'),
(17,'q','q'),(18,'w','w'),(19,'d','d');

SELECT COUNT FROM t1 AS alias1, t1 AS alias2, t2 AS alias3
WHERE alias3.d IN (
SELECT alias4.c FROM t2 AS alias4, t2 AS alias5
WHERE alias5.b = alias4.b
AND ( alias5.b >= alias3.b OR alias5.c != alias3.c )
);

  1. End of test case
  1. Expected result
  2. COUNT
  3. 3724
  1. Actual result
  2. COUNT
  3. 2940


 Comments   
Comment by Elena Stepanova [ 2012-03-10 ]

Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
While the provided test requires optimizer_prune_level=0 and InnoDB tables, later I had similar mismatches with MyISAM tables and default optimizer_prune_level, so I am removing these conditions from the bug summary.

Comment by Sergei Petrunia [ 2012-03-11 ]

Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
The query plan seems to be valid.

I've compared the query output resultsets:

  • semi-join's resultset is a proper subset of non-semijoin's resultset
  • whether a row is missing is a function of value of alias3.b: rows with values of 6,9,12,19 are missing. All others are present.
Comment by Sergei Petrunia [ 2012-03-11 ]

Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
The join order is:

O alias3
I alias4 FirstMatch(alias3)
n alias1
n alias2
I alias5 FirstMatch(alias2)

Here "O" means outer, "I" means inner, "n" - outer, not correlated with the subquery table.

When I debug to see why records with alias3.b=6 do not make it to the output, I see this scenario:

  • the code arrives at sub_select(...,alias4, .. ) call
  • it sets join->return_tab= {alias3}

    // this what FirstMatch is about.

  • then it continues execution. The code builds full record combinations, but neither of them matches the condition attached to table alias5.
  • at some point, execution will return into sub_select(...,alias4, .. ) function.
    At this point:
  • It should search for next matching record, because no match have been found yet.
  • It does not do that, though. Instead, it jumps out here:

if (join->return_tab < join_tab)
DBUG_RETURN(NESTED_LOOP_OK);

Apparently, FirstMatch code is malfunctioning when it is handling the case where inner tables are interleaved with outer tables.

Comment by Sergei Petrunia [ 2012-03-26 ]

Re: Wrong result (missing rows) with semijoin+firstmatch, IN/ANY subquery
Alternate proposal: unfinished patch and exploration

Comment by Sergei Petrunia [ 2012-03-26 ]

Alternate proposal: unfinished patch and exploration
bug951283.tgz
LPexportBug951283_bug951283.tgz

Comment by Rasmus Johansson (Inactive) [ 2012-04-03 ]

Launchpad bug id: 951283

Generated at Thu Feb 08 06:49:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.