[MDEV-4173]  Wrong result (extra row) with semijoin=on, joins in outer query, LEFT JOIN in the subquery Created: 2013-02-15  Updated: 2013-07-16  Resolved: 2013-07-16

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.1, 5.5.29
Fix Version/s: 10.0.4, 5.5.32

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates

 Description   

The following test case produces COUNT = 23712 when it is executed with exists_to_in=off, and 23713 when it is executed with exists_to_in=on.
Postgres, MySQL 5.6 and older versions of MariaDB all return 23712, so I assume it is the correct result.

Test case:

set optimizer_switch='exists_to_in=on';
 
CREATE TABLE t1 (i1 INT) ENGINE=MyISAM;
INSERT INTO t1 VALUES 
(4),(8),(0),(0),(0),(7),(7),(5),(3),(4),(9),(6),(1),(5),(6),(2),(4),(4),
(4),(3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),(2),(3),(8),(4),(1),(7),
(9),(4),(5),(5),(9),(3),(8),(0),(3),(1),(0),(8),(3),(3),(9),(6),(1),(0),
(8),(3),(9),(5),(9),(2),(5),(9),(1),(8),(7),(6),(2),(4),(7),(3),(8),(6);
 
CREATE TABLE t2 (i2 INT, j2 INT) ENGINE=MyISAM;
INSERT INTO t2 VALUES 
(7,1),(0,7),(9,4),(3,7),(4,0),(2,2),(5,9),(3,4),(1,0),(3,9),
(5,8),(1,8),(204,18),(224,84),(9,6),(5,3),(0,6),(6,1),(7,3);
 
CREATE TABLE t3 (i3 INT, KEY(i3)) ENGINE=MyISAM;
INSERT INTO t3 VALUES 
(0),(8),(1),(8),(9),(24),(6),(1),(6),
(2),(4),(8),(4),(4),(7),(4),(1),(9),(4);
 
SELECT COUNT(*) FROM t1 outer_t1, t2 outer_t2, t3 
WHERE EXISTS ( 
  SELECT 1 FROM t2 LEFT JOIN t3 ON ( i3 = j2 ) 
  WHERE j2 = outer_t1.i1 AND i2 <> outer_t2.j2 
);

Result with exists_to_in=off:

COUNT(*)
23712

Result with exists_to_in=on:

COUNT(*)
23713

EXPLAIN with exists_to_in=on, otherwise default optimizer_switch:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	outer_t2	ALL	NULL	NULL	NULL	NULL	19	100.00	
1	PRIMARY	t3	index	NULL	i3	5	NULL	19	100.00	Using index; Using join buffer (flat, BNL join)
1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	19	100.00	Using where; Start temporary; Using join buffer (incremental, BNL join)
1	PRIMARY	t3	ref	i3	i3	5	test.t2.j2	2	100.00	Using where; Using index
1	PRIMARY	outer_t1	ALL	NULL	NULL	NULL	NULL	72	100.00	Using where; End temporary; Using join buffer (flat, BNL join)
Warnings:
Note	1276	Field or reference 'test.outer_t1.i1' of SELECT #2 was resolved in SELECT #1
Note	1276	Field or reference 'test.outer_t2.j2' of SELECT #2 was resolved in SELECT #1
Note	1003	select count(0) AS `COUNT(*)` from `test`.`t1` `outer_t1` semi join (`test`.`t2` left join `test`.`t3` on(((`test`.`t3`.`i3` = `test`.`t2`.`j2`) and (`test`.`t2`.`j2` is not null)))) join `test`.`t2` `outer_t2` join `test`.`t3` where ((`test`.`outer_t1`.`i1` = `test`.`t2`.`j2`) and (`test`.`t2`.`i2` <> `test`.`outer_t2`.`j2`))



 Comments   
Comment by Oleksandr Byelkin [ 2013-02-18 ]

Plan chosen for exist2in then semi-join is 3-4 times worse then usual subquery execution.

Comment by Oleksandr Byelkin [ 2013-02-19 ]

It looks like a semi-join problem...

SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
COUNT
23713
explain
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY outer_t2 ALL NULL NULL NULL NULL 19
1 PRIMARY t3 index NULL i3 5 NULL 19 Using index; Using join buffer (flat, BNL join)
1 PRIMARY t2 ALL NULL NULL NULL NULL 19 Using where; Start temporary; Using join buffer (incremental, BNL join)
1 PRIMARY t3 ref i3 i3 5 test.t2.j2 2 Using where; Using index
1 PRIMARY outer_t1 ALL NULL NULL NULL NULL 72 Using where; End temporary; Using join buffer (flat, BNL join)
set optimizer_switch='semijoin=off';
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
COUNT
23712
explain
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY outer_t2 ALL NULL NULL NULL NULL 19
1 PRIMARY t3 index NULL i3 5 NULL 19 Using index; Using join buffer (flat, BNL join)
1 PRIMARY outer_t1 ALL NULL NULL NULL NULL 72 Using where; Using join buffer (incremental, BNL join)
2 DEPENDENT SUBQUERY t2 ALL NULL NULL NULL NULL 19 Using where
2 DEPENDENT SUBQUERY t3 ref i3 i3 5 test.t2.j2 2 Using where; Using index

Comment by Oleksandr Byelkin [ 2013-02-19 ]

new test suite:
CREATE TABLE t1 (i1 INT);
INSERT INTO t1 VALUES
(4),(8),(0),(0),(0),(7),(7),(5),(3),(4),(9),(6),(1),(5),(6),(2),(4),(4),
(4),(3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),(2),(3),(8),(4),(1),(7),
(9),(4),(5),(5),(9),(3),(8),(0),(3),(1),(0),(8),(3),(3),(9),(6),(1),(0),
(8),(3),(9),(5),(9),(2),(5),(9),(1),(8),(7),(6),(2),(4),(7),(3),(8),(6);

CREATE TABLE t2 (i2 INT, j2 INT);
INSERT INTO t2 VALUES
(7,1),(0,7),(9,4),(3,7),(4,0),(2,2),(5,9),(3,4),(1,0),(3,9),
(5,8),(1,8),(204,18),(224,84),(9,6),(5,3),(0,6),(6,1),(7,3);

CREATE TABLE t3 (i3 INT, KEY(i3));
INSERT INTO t3 VALUES
(0),(8),(1),(8),(9),(24),(6),(1),(6),
(2),(4),(8),(4),(4),(7),(4),(1),(9),(4);

SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
explain
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
set optimizer_switch='semijoin=off';
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
explain
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);

Comment by Patryk Pomykalski [ 2013-05-15 ]

I think its something with join cache.
The bug depends on values of join_buffer_size and max_heap_table_size, rev 3349 just changes which combinations might trigger it.

Comment by Patryk Pomykalski [ 2013-05-16 ]

Also on 5.3:

CREATE TABLE t1 (i1 INT) engine=myisam;
INSERT INTO t1 VALUES
(4),(8),(0),(0),(0),(7),(7),(5),(3),(4),(9),(6),(1),(5),(6),(2),(4),(4);

CREATE TABLE t2 (i2 INT, j2 INT) engine=myisam;
INSERT INTO t2 VALUES
(7,1),(0,7),(9,4),(3,7),(4,0),(2,2),(5,9),(3,4),(1,0),(3,9),(5,8),(1,8),(204,18),(224,84),(9,6);

CREATE TABLE t3 (i3 INT, KEY(i3)) engine=myisam;
INSERT INTO t3 VALUES
(0),(8),(1),(8),(9),(24),(6),(1),(6),(2),(4);

set max_heap_table_size=16*1024;
set optimizer_switch='semijoin=on';

--disable_query_log
let $n = 5;
while ($n) {
eval set join_buffer_size=128 + 128*$n;
SELECT COUNT FROM t1 outer_t1, t2 outer_t2, t3
WHERE outer_t1.i1 IN (
SELECT j2 FROM t2 LEFT JOIN t3 ON ( i3 = j2 )
WHERE i2 <> outer_t2.j2
);
dec $n;
}

Tested on 32bit machine outputs:
COUNT
2398
COUNT
2398
COUNT
2399
COUNT
2398
COUNT
2399

Comment by Patryk Pomykalski [ 2013-05-17 ]

This bug is since 5.3.0

Comment by Patryk Pomykalski [ 2013-05-19 ]

I have analyzed and fixed this. The problem was due to ignoring overflowing duplicate value when converting from heap to aria/myisam temp table.
It could lead to extra rows in semijoin or missing row in union query.

Patch for 5.3 branch:
http://bazaar.launchpad.net/~pomyks/maria/5.3-mdev-4173/revision/3658

Comment by Sergei Petrunia [ 2013-07-13 ]

Confirm Patryk's analysis. Apparently, DuplicateElimination strategy is the only part of SQL layer that cares about whether a write to temporary table resulted in HA_ERR_DUPP_KEY or not. The patch seems big, but most of the changes are trivial. MySQL 5.6 also has such change. They have introduced it as a part of some big patch.

Comment by Sergei Petrunia [ 2013-07-15 ]

Patryk, can I assume that the patch you've posted above is available to MariaDB under the New BSD licence?

Comment by Patryk Pomykalski [ 2013-07-15 ]

Yes, all my patches are available under New BSD license. This one is mostly from mysql code.

Comment by Sergei Petrunia [ 2013-07-16 ]

Thanks.

Generated at Thu Feb 08 06:54:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.