[MDEV-3913] Wrong result (NULLs instead of real values, missing rows) with semijoin+materialization, LEFT JOIN, key, IN subquery, Aria Created: 2012-12-03  Updated: 2013-02-25  Resolved: 2013-02-25

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.0, 5.5.28, 5.3.10
Fix Version/s: 10.0.2, 5.5.30, 5.3.13

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: optimizer

Issue Links:
Relates

 Description   

The following test case

SET optimizer_switch = 'materialization=on,semijoin=on';
 
CREATE TABLE t1 ( a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=Aria;
INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y');
 
CREATE TABLE t2 ( a2 CHAR(1), b2 CHAR(1) ) ENGINE=Aria;
INSERT INTO t2 VALUES ('y','y'),('y','y'),('w','w');
 
CREATE TABLE t3 (a3 INT) ENGINE=Aria;
INSERT INTO t3 VALUES (8),(6);
 
SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 ) WHERE ( b1, b1 )  IN ( SELECT a2, b2 FROM t2, t3 );

produces one row with NULLs instead of t1.* values:

a1	b1	a2	b2
------------------------------
g	y	NULL	NULL

The expected result is 2 rows without NULLs:

a1	b1	a2	b2
------------------------------
g	y	y	y
g	y	y	y

branch: maria/5.3
revision-id: sergii@pisem.net-20121123121131-p4nfv8j2cbh68dbg
date: 2012-11-23 13:11:31 +0100
revno: 3605

Also reproducible on older versions of 5.3 (I checked down to 5.3.4).
Reproducible on maria/5.5, maria/10.0.
With the given test case, not reproducible on MyISAM or InnoDB, only on Aria.

Reproducible with minimal optimizer_switch materialization=on,semijoin=on as well as with the default optimizer_switch:

index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

EXPLAIN with the minimal optimizer_switch (wrong result):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<subquery2>	ALL	distinct_key	NULL	NULL	NULL	3100.00	
1	PRIMARY	t1	ref	b1	b1	2	test.t2.a2	1	100.00	Using index
1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where
2	MATERIALIZED	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
2	MATERIALIZED	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where
Warnings:
Note	1003	select `test`.`t1`.`a1` AS `a1`,`test`.`t1`.`b1` AS `b1`,`test`.`t2`.`a2` AS `a2`,`test`.`t2`.`b2` AS `b2` from `test`.`t1` semi join (`test`.`t2` join `test`.`t3`) left join `test`.`t2` on(((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`) and (`test`.`t2`.`a2` = `test`.`t2`.`a2`))) where ((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`))

EXPLAIN with the default optimizer_switch (also wrong result):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	<subquery2>	ALL	distinct_key	NULL	NULL	NULL	3100.00	
1	PRIMARY	t1	ref	b1	b1	2	test.t2.a2	1	100.00	Using index
1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where; Using join buffer (flat, BNL join)
2	MATERIALIZED	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
2	MATERIALIZED	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where; Using join buffer (flat, BNL join)
Warnings:
Note	1003	select `test`.`t1`.`a1` AS `a1`,`test`.`t1`.`b1` AS `b1`,`test`.`t2`.`a2` AS `a2`,`test`.`t2`.`b2` AS `b2` from `test`.`t1` semi join (`test`.`t2` join `test`.`t3`) left join `test`.`t2` on(((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`) and (`test`.`t2`.`a2` = `test`.`t2`.`a2`))) where ((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`))



 Comments   
Comment by Timour Katchaounov (Inactive) [ 2013-01-09 ]

The wrong result is strictly specific to semi-join materialization.
Both if the query is run with non-SJ materialization, or with SJ-only the result is correct.

Comment by Timour Katchaounov (Inactive) [ 2013-01-09 ]

The following simplified test case compares the query plans chosen when MyISAM and Aria are used.
The Aria plan produces wrong result. It is not clear whether if the same plan is forced with MyISAM,
we will get the same wrong result. The reason is that it is not easy to force a semi-join plan.

– Aria tables
CREATE TABLE t1a (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=Aria;
INSERT INTO t1a VALUES ('f','c'),('d','m'),('g','y');
CREATE TABLE t2a (a2 CHAR(1), b2 CHAR(1) ) ENGINE=Aria;
INSERT INTO t2a VALUES ('y','y'),('y','y'),('w','w');
CREATE TABLE t3a (a3 CHAR(1), b3 CHAR(1) ) ENGINE=Aria;
INSERT INTO t3a VALUES ('y','y'),('y','y'),('w','w');

– MyISAM tables
CREATE TABLE t1m (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=MyISAM;
INSERT INTO t1m VALUES ('f','c'),('d','m'),('g','y');
CREATE TABLE t2m (a2 CHAR(1), b2 CHAR(1) ) ENGINE=MyISAM;
INSERT INTO t2m VALUES ('y','y'),('y','y'),('w','w');
CREATE TABLE t3m (a3 CHAR(1), b3 CHAR(1) ) ENGINE=MyISAM;
INSERT INTO t3m VALUES ('y','y'),('y','y'),('w','w');

EXPLAIN
SELECT *
FROM t1a LEFT JOIN t2a ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);

SELECT *
FROM t1a LEFT JOIN t2a ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);

EXPLAIN
SELECT *
FROM t1m LEFT JOIN t2m ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);

SELECT *
FROM t1m LEFT JOIN t2m ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);

MariaDB [test]> EXPLAIN SELECT *
-> FROM t1a LEFT JOIN t2a ON (b1 = a2)
-> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);
-----------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

-----------------------------------------------------------------------------------------------------------------------+

1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 3  
1 PRIMARY t1a ref b1 b1 2 test.t3a.a3 1 Using index
1 PRIMARY t2a ALL NULL NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join)
2 MATERIALIZED t3a ALL NULL NULL NULL NULL 3 Using where

-----------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [test]>
MariaDB [test]> SELECT *
-> FROM t1a LEFT JOIN t2a ON (b1 = a2)
-> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);
------------------+

a1 b1 a2 b2

------------------+

g y NULL NULL

------------------+
1 row in set (0.00 sec)

MariaDB [test]>
MariaDB [test]>
MariaDB [test]> EXPLAIN SELECT *
-> FROM t1m LEFT JOIN t2m ON (b1 = a2)
-> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);
-------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

-------------------------------------------------------------------------------------------------------------------------------+

1 PRIMARY t1m index b1 b1 4 NULL 3 Using index
1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 2 func,func 1  
1 PRIMARY t2m ALL NULL NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join)
2 MATERIALIZED t3m ALL NULL NULL NULL NULL 3 Using where

-------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [test]>
MariaDB [test]> SELECT *
-> FROM t1m LEFT JOIN t2m ON (b1 = a2)
-> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);
------------------+

a1 b1 a2 b2

------------------+

g y y y
g y y y

------------------+
2 rows in set (0.01 sec)

Comment by Timour Katchaounov (Inactive) [ 2013-01-14 ]

The following test with MyISAM tables demonstrates that the wrong result is not aria-specific, but it is a semi-join materialization bug:

– MyISAM tables
CREATE TABLE t1 (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) );
INSERT INTO t1 VALUES ('d','m'),('g','y');
CREATE TABLE t2 (a2 CHAR(1), b2 CHAR(1) );
INSERT INTO t2 VALUES ('y','y'),('w','w');
CREATE TABLE t3 (a3 CHAR(1), b3 CHAR(1) );
INSERT INTO t3 VALUES ('y','y'),('w','w');

set @@join_cache_level=0;

set @@optimizer_switch='semijoin=on';
EXPLAIN EXTENDED
SELECT *
FROM t1 LEFT JOIN t2 ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

SELECT *
FROM t1 LEFT JOIN t2 ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

set @@optimizer_switch='semijoin=off';
EXPLAIN EXTENDED
SELECT *
FROM t1 LEFT JOIN t2 ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

SELECT *
FROM t1 LEFT JOIN t2 ON (b1 = a2)
WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

Comment by Igor Babaev [ 2013-01-16 ]

Apparently this is a duplicate of mdev-3995 whose test case is much simpler though much bigger.

Comment by Timour Katchaounov (Inactive) [ 2013-01-16 ]

I am not sure this is a duplicate based on the following observations:

  • the latest test case contains columns of exactly the same type
  • the outer join produces a NULL-complemented row because the condition
    JOIN_TAB::on_precond attached to table t2 is evaluated to false.
    This condition on_precond is (t3.b3 = t3.a3). It is seems wrong to me that after the subquery has
    been materialized, there is a condition that still references columns from the subquery table t3.
    In addition, this conditions seems strange because it references only columns of a table that is
    a lot earlier in the join plan.
Comment by Timour Katchaounov (Inactive) [ 2013-01-16 ]

I am not sure this is a duplicate based on the following observations:

  • the latest test case contains columns of exactly the same type
  • the outer join produces a NULL-complemented row because the condition
    JOIN_TAB::on_precond attached to table t2 is evaluated to false.
    This condition on_precond is (t3.b3 = t3.a3). It is seems wrong to me that after the subquery has
    been materialized, there is a condition that still references columns from the subquery table t3.
    In addition, this conditions seems strange because it references only columns of a table that is
    a lot earlier in the join plan.
Comment by Igor Babaev [ 2013-02-25 ]

The original bug was not a duplicate of bug mdev-3995.
The wrong results of the MyISAM test case constructed by Timour is caused by bug mdev-3995 though.

The original bug can be reproduced with MyISAM table with the following test case:

CREATE TABLE t1 (a1 char(1), b1 char(1), index idx(b1,a1));
INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y');
INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y');

CREATE TABLE t2 (a2 char(1), b2 char(1));
INSERT INTO t2 VALUES ('y','y'),('y','y'),('w','w');

CREATE TABLE t3 (a3 int);
INSERT INTO t3 VALUES (8),(6);

CREATE TABLE t4 (a4 char(1), b4 char(1));
INSERT INTO t4 VALUES ('y','y'),('y','y'),('w','w');

set optimizer_switch='materialization=off';

EXPLAIN EXTENDED
SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);
SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);

set optimizer_switch='materialization=on';

EXPLAIN EXTENDED
SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);
SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);

DROP TABLE t1,t2,t3,t4;

Comment by Igor Babaev [ 2013-02-25 ]

The fix for the bug was pushed into the 5.3 tree on 20-02-2013.

Generated at Thu Feb 08 06:52:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.