Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-3913

Wrong result (NULLs instead of real values, missing rows) with semijoin+materialization, LEFT JOIN, key, IN subquery, Aria

Details

    Description

      The following test case

      SET optimizer_switch = 'materialization=on,semijoin=on';
       
      CREATE TABLE t1 ( a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=Aria;
      INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y');
       
      CREATE TABLE t2 ( a2 CHAR(1), b2 CHAR(1) ) ENGINE=Aria;
      INSERT INTO t2 VALUES ('y','y'),('y','y'),('w','w');
       
      CREATE TABLE t3 (a3 INT) ENGINE=Aria;
      INSERT INTO t3 VALUES (8),(6);
       
      SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 ) WHERE ( b1, b1 )  IN ( SELECT a2, b2 FROM t2, t3 );

      produces one row with NULLs instead of t1.* values:

      a1	b1	a2	b2
      ------------------------------
      g	y	NULL	NULL

      The expected result is 2 rows without NULLs:

      a1	b1	a2	b2
      ------------------------------
      g	y	y	y
      g	y	y	y

      branch: maria/5.3
      revision-id: sergii@pisem.net-20121123121131-p4nfv8j2cbh68dbg
      date: 2012-11-23 13:11:31 +0100
      revno: 3605

      Also reproducible on older versions of 5.3 (I checked down to 5.3.4).
      Reproducible on maria/5.5, maria/10.0.
      With the given test case, not reproducible on MyISAM or InnoDB, only on Aria.

      Reproducible with minimal optimizer_switch materialization=on,semijoin=on as well as with the default optimizer_switch:

      index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

      EXPLAIN with the minimal optimizer_switch (wrong result):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	<subquery2>	ALL	distinct_key	NULL	NULL	NULL	3100.00	
      1	PRIMARY	t1	ref	b1	b1	2	test.t2.a2	1	100.00	Using index
      1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where
      2	MATERIALIZED	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
      2	MATERIALIZED	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where
      Warnings:
      Note	1003	select `test`.`t1`.`a1` AS `a1`,`test`.`t1`.`b1` AS `b1`,`test`.`t2`.`a2` AS `a2`,`test`.`t2`.`b2` AS `b2` from `test`.`t1` semi join (`test`.`t2` join `test`.`t3`) left join `test`.`t2` on(((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`) and (`test`.`t2`.`a2` = `test`.`t2`.`a2`))) where ((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`))

      EXPLAIN with the default optimizer_switch (also wrong result):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	<subquery2>	ALL	distinct_key	NULL	NULL	NULL	3100.00	
      1	PRIMARY	t1	ref	b1	b1	2	test.t2.a2	1	100.00	Using index
      1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where; Using join buffer (flat, BNL join)
      2	MATERIALIZED	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
      2	MATERIALIZED	t2	ALL	NULL	NULL	NULL	NULL	3	100.00	Using where; Using join buffer (flat, BNL join)
      Warnings:
      Note	1003	select `test`.`t1`.`a1` AS `a1`,`test`.`t1`.`b1` AS `b1`,`test`.`t2`.`a2` AS `a2`,`test`.`t2`.`b2` AS `b2` from `test`.`t1` semi join (`test`.`t2` join `test`.`t3`) left join `test`.`t2` on(((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`) and (`test`.`t2`.`a2` = `test`.`t2`.`a2`))) where ((`test`.`t2`.`b2` = `test`.`t2`.`a2`) and (`test`.`t1`.`b1` = `test`.`t2`.`a2`))

      Attachments

        Activity

          The wrong result is strictly specific to semi-join materialization.
          Both if the query is run with non-SJ materialization, or with SJ-only the result is correct.

          timour Timour Katchaounov (Inactive) added a comment - The wrong result is strictly specific to semi-join materialization. Both if the query is run with non-SJ materialization, or with SJ-only the result is correct.

          The following simplified test case compares the query plans chosen when MyISAM and Aria are used.
          The Aria plan produces wrong result. It is not clear whether if the same plan is forced with MyISAM,
          we will get the same wrong result. The reason is that it is not easy to force a semi-join plan.

          – Aria tables
          CREATE TABLE t1a (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=Aria;
          INSERT INTO t1a VALUES ('f','c'),('d','m'),('g','y');
          CREATE TABLE t2a (a2 CHAR(1), b2 CHAR(1) ) ENGINE=Aria;
          INSERT INTO t2a VALUES ('y','y'),('y','y'),('w','w');
          CREATE TABLE t3a (a3 CHAR(1), b3 CHAR(1) ) ENGINE=Aria;
          INSERT INTO t3a VALUES ('y','y'),('y','y'),('w','w');

          – MyISAM tables
          CREATE TABLE t1m (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=MyISAM;
          INSERT INTO t1m VALUES ('f','c'),('d','m'),('g','y');
          CREATE TABLE t2m (a2 CHAR(1), b2 CHAR(1) ) ENGINE=MyISAM;
          INSERT INTO t2m VALUES ('y','y'),('y','y'),('w','w');
          CREATE TABLE t3m (a3 CHAR(1), b3 CHAR(1) ) ENGINE=MyISAM;
          INSERT INTO t3m VALUES ('y','y'),('y','y'),('w','w');

          EXPLAIN
          SELECT *
          FROM t1a LEFT JOIN t2a ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);

          SELECT *
          FROM t1a LEFT JOIN t2a ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);

          EXPLAIN
          SELECT *
          FROM t1m LEFT JOIN t2m ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);

          SELECT *
          FROM t1m LEFT JOIN t2m ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);

          MariaDB [test]> EXPLAIN SELECT *
          -> FROM t1a LEFT JOIN t2a ON (b1 = a2)
          -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);
          -----------------------------------------------------------------------------------------------------------------------+

          id select_type table type possible_keys key key_len ref rows Extra

          -----------------------------------------------------------------------------------------------------------------------+

          1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 3  
          1 PRIMARY t1a ref b1 b1 2 test.t3a.a3 1 Using index
          1 PRIMARY t2a ALL NULL NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join)
          2 MATERIALIZED t3a ALL NULL NULL NULL NULL 3 Using where

          -----------------------------------------------------------------------------------------------------------------------+
          4 rows in set (0.00 sec)

          MariaDB [test]>
          MariaDB [test]> SELECT *
          -> FROM t1a LEFT JOIN t2a ON (b1 = a2)
          -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a);
          ------------------+

          a1 b1 a2 b2

          ------------------+

          g y NULL NULL

          ------------------+
          1 row in set (0.00 sec)

          MariaDB [test]>
          MariaDB [test]>
          MariaDB [test]> EXPLAIN SELECT *
          -> FROM t1m LEFT JOIN t2m ON (b1 = a2)
          -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);
          -------------------------------------------------------------------------------------------------------------------------------+

          id select_type table type possible_keys key key_len ref rows Extra

          -------------------------------------------------------------------------------------------------------------------------------+

          1 PRIMARY t1m index b1 b1 4 NULL 3 Using index
          1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 2 func,func 1  
          1 PRIMARY t2m ALL NULL NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join)
          2 MATERIALIZED t3m ALL NULL NULL NULL NULL 3 Using where

          -------------------------------------------------------------------------------------------------------------------------------+
          4 rows in set (0.00 sec)

          MariaDB [test]>
          MariaDB [test]> SELECT *
          -> FROM t1m LEFT JOIN t2m ON (b1 = a2)
          -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m);
          ------------------+

          a1 b1 a2 b2

          ------------------+

          g y y y
          g y y y

          ------------------+
          2 rows in set (0.01 sec)

          timour Timour Katchaounov (Inactive) added a comment - - edited The following simplified test case compares the query plans chosen when MyISAM and Aria are used. The Aria plan produces wrong result. It is not clear whether if the same plan is forced with MyISAM, we will get the same wrong result. The reason is that it is not easy to force a semi-join plan. – Aria tables CREATE TABLE t1a (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=Aria; INSERT INTO t1a VALUES ('f','c'),('d','m'),('g','y'); CREATE TABLE t2a (a2 CHAR(1), b2 CHAR(1) ) ENGINE=Aria; INSERT INTO t2a VALUES ('y','y'),('y','y'),('w','w'); CREATE TABLE t3a (a3 CHAR(1), b3 CHAR(1) ) ENGINE=Aria; INSERT INTO t3a VALUES ('y','y'),('y','y'),('w','w'); – MyISAM tables CREATE TABLE t1m (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ) ENGINE=MyISAM; INSERT INTO t1m VALUES ('f','c'),('d','m'),('g','y'); CREATE TABLE t2m (a2 CHAR(1), b2 CHAR(1) ) ENGINE=MyISAM; INSERT INTO t2m VALUES ('y','y'),('y','y'),('w','w'); CREATE TABLE t3m (a3 CHAR(1), b3 CHAR(1) ) ENGINE=MyISAM; INSERT INTO t3m VALUES ('y','y'),('y','y'),('w','w'); EXPLAIN SELECT * FROM t1a LEFT JOIN t2a ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a); SELECT * FROM t1a LEFT JOIN t2a ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a); EXPLAIN SELECT * FROM t1m LEFT JOIN t2m ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m); SELECT * FROM t1m LEFT JOIN t2m ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m); MariaDB [test] > EXPLAIN SELECT * -> FROM t1a LEFT JOIN t2a ON (b1 = a2) -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a); ----- ------------ ----------- ---- ------------- ---- ------- ----------- ---- ------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ------------ ----------- ---- ------------- ---- ------- ----------- ---- ------------------------------------------------+ 1 PRIMARY <subquery2> ALL distinct_key NULL NULL NULL 3   1 PRIMARY t1a ref b1 b1 2 test.t3a.a3 1 Using index 1 PRIMARY t2a ALL NULL NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join) 2 MATERIALIZED t3a ALL NULL NULL NULL NULL 3 Using where ----- ------------ ----------- ---- ------------- ---- ------- ----------- ---- ------------------------------------------------+ 4 rows in set (0.00 sec) MariaDB [test] > MariaDB [test] > SELECT * -> FROM t1a LEFT JOIN t2a ON (b1 = a2) -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3a); ----- ---- ---- -----+ a1 b1 a2 b2 ----- ---- ---- -----+ g y NULL NULL ----- ---- ---- -----+ 1 row in set (0.00 sec) MariaDB [test] > MariaDB [test] > MariaDB [test] > EXPLAIN SELECT * -> FROM t1m LEFT JOIN t2m ON (b1 = a2) -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m); ----- ------------ ----------- ------ ------------- ------------ ------- --------- ---- ------------------------------------------------+ id select_type table type possible_keys key key_len ref rows Extra ----- ------------ ----------- ------ ------------- ------------ ------- --------- ---- ------------------------------------------------+ 1 PRIMARY t1m index b1 b1 4 NULL 3 Using index 1 PRIMARY <subquery2> eq_ref distinct_key distinct_key 2 func,func 1   1 PRIMARY t2m ALL NULL NULL NULL NULL 3 Using where; Using join buffer (flat, BNL join) 2 MATERIALIZED t3m ALL NULL NULL NULL NULL 3 Using where ----- ------------ ----------- ------ ------------- ------------ ------- --------- ---- ------------------------------------------------+ 4 rows in set (0.00 sec) MariaDB [test] > MariaDB [test] > SELECT * -> FROM t1m LEFT JOIN t2m ON (b1 = a2) -> WHERE (b1, b1) IN (SELECT a3, b3 FROM t3m); ----- ---- ---- -----+ a1 b1 a2 b2 ----- ---- ---- -----+ g y y y g y y y ----- ---- ---- -----+ 2 rows in set (0.01 sec)

          The following test with MyISAM tables demonstrates that the wrong result is not aria-specific, but it is a semi-join materialization bug:

          – MyISAM tables
          CREATE TABLE t1 (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) );
          INSERT INTO t1 VALUES ('d','m'),('g','y');
          CREATE TABLE t2 (a2 CHAR(1), b2 CHAR(1) );
          INSERT INTO t2 VALUES ('y','y'),('w','w');
          CREATE TABLE t3 (a3 CHAR(1), b3 CHAR(1) );
          INSERT INTO t3 VALUES ('y','y'),('w','w');

          set @@join_cache_level=0;

          set @@optimizer_switch='semijoin=on';
          EXPLAIN EXTENDED
          SELECT *
          FROM t1 LEFT JOIN t2 ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

          SELECT *
          FROM t1 LEFT JOIN t2 ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

          set @@optimizer_switch='semijoin=off';
          EXPLAIN EXTENDED
          SELECT *
          FROM t1 LEFT JOIN t2 ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

          SELECT *
          FROM t1 LEFT JOIN t2 ON (b1 = a2)
          WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

          timour Timour Katchaounov (Inactive) added a comment - The following test with MyISAM tables demonstrates that the wrong result is not aria-specific, but it is a semi-join materialization bug: – MyISAM tables CREATE TABLE t1 (a1 CHAR(1), b1 CHAR(1), KEY(b1,a1) ); INSERT INTO t1 VALUES ('d','m'),('g','y'); CREATE TABLE t2 (a2 CHAR(1), b2 CHAR(1) ); INSERT INTO t2 VALUES ('y','y'),('w','w'); CREATE TABLE t3 (a3 CHAR(1), b3 CHAR(1) ); INSERT INTO t3 VALUES ('y','y'),('w','w'); set @@join_cache_level=0; set @@optimizer_switch='semijoin=on'; EXPLAIN EXTENDED SELECT * FROM t1 LEFT JOIN t2 ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3); SELECT * FROM t1 LEFT JOIN t2 ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3); set @@optimizer_switch='semijoin=off'; EXPLAIN EXTENDED SELECT * FROM t1 LEFT JOIN t2 ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3); SELECT * FROM t1 LEFT JOIN t2 ON (b1 = a2) WHERE (b1, b1) IN (SELECT a3, b3 FROM t3);

          Apparently this is a duplicate of mdev-3995 whose test case is much simpler though much bigger.

          igor Igor Babaev (Inactive) added a comment - Apparently this is a duplicate of mdev-3995 whose test case is much simpler though much bigger.

          I am not sure this is a duplicate based on the following observations:

          • the latest test case contains columns of exactly the same type
          • the outer join produces a NULL-complemented row because the condition
            JOIN_TAB::on_precond attached to table t2 is evaluated to false.
            This condition on_precond is (t3.b3 = t3.a3). It is seems wrong to me that after the subquery has
            been materialized, there is a condition that still references columns from the subquery table t3.
            In addition, this conditions seems strange because it references only columns of a table that is
            a lot earlier in the join plan.
          timour Timour Katchaounov (Inactive) added a comment - I am not sure this is a duplicate based on the following observations: the latest test case contains columns of exactly the same type the outer join produces a NULL-complemented row because the condition JOIN_TAB::on_precond attached to table t2 is evaluated to false. This condition on_precond is (t3.b3 = t3.a3). It is seems wrong to me that after the subquery has been materialized, there is a condition that still references columns from the subquery table t3. In addition, this conditions seems strange because it references only columns of a table that is a lot earlier in the join plan.

          I am not sure this is a duplicate based on the following observations:

          • the latest test case contains columns of exactly the same type
          • the outer join produces a NULL-complemented row because the condition
            JOIN_TAB::on_precond attached to table t2 is evaluated to false.
            This condition on_precond is (t3.b3 = t3.a3). It is seems wrong to me that after the subquery has
            been materialized, there is a condition that still references columns from the subquery table t3.
            In addition, this conditions seems strange because it references only columns of a table that is
            a lot earlier in the join plan.
          timour Timour Katchaounov (Inactive) added a comment - I am not sure this is a duplicate based on the following observations: the latest test case contains columns of exactly the same type the outer join produces a NULL-complemented row because the condition JOIN_TAB::on_precond attached to table t2 is evaluated to false. This condition on_precond is (t3.b3 = t3.a3). It is seems wrong to me that after the subquery has been materialized, there is a condition that still references columns from the subquery table t3. In addition, this conditions seems strange because it references only columns of a table that is a lot earlier in the join plan.
          igor Igor Babaev (Inactive) added a comment - - edited

          The original bug was not a duplicate of bug mdev-3995.
          The wrong results of the MyISAM test case constructed by Timour is caused by bug mdev-3995 though.

          The original bug can be reproduced with MyISAM table with the following test case:

          CREATE TABLE t1 (a1 char(1), b1 char(1), index idx(b1,a1));
          INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y');
          INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y');

          CREATE TABLE t2 (a2 char(1), b2 char(1));
          INSERT INTO t2 VALUES ('y','y'),('y','y'),('w','w');

          CREATE TABLE t3 (a3 int);
          INSERT INTO t3 VALUES (8),(6);

          CREATE TABLE t4 (a4 char(1), b4 char(1));
          INSERT INTO t4 VALUES ('y','y'),('y','y'),('w','w');

          set optimizer_switch='materialization=off';

          EXPLAIN EXTENDED
          SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
          WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);
          SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
          WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);

          set optimizer_switch='materialization=on';

          EXPLAIN EXTENDED
          SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
          WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);
          SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 )
          WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4);

          DROP TABLE t1,t2,t3,t4;

          igor Igor Babaev (Inactive) added a comment - - edited The original bug was not a duplicate of bug mdev-3995. The wrong results of the MyISAM test case constructed by Timour is caused by bug mdev-3995 though. The original bug can be reproduced with MyISAM table with the following test case: CREATE TABLE t1 (a1 char(1), b1 char(1), index idx(b1,a1)); INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y'); INSERT INTO t1 VALUES ('f','c'),('d','m'),('g','y'); CREATE TABLE t2 (a2 char(1), b2 char(1)); INSERT INTO t2 VALUES ('y','y'),('y','y'),('w','w'); CREATE TABLE t3 (a3 int); INSERT INTO t3 VALUES (8),(6); CREATE TABLE t4 (a4 char(1), b4 char(1)); INSERT INTO t4 VALUES ('y','y'),('y','y'),('w','w'); set optimizer_switch='materialization=off'; EXPLAIN EXTENDED SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 ) WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4); SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 ) WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4); set optimizer_switch='materialization=on'; EXPLAIN EXTENDED SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 ) WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4); SELECT * FROM t1 LEFT JOIN t2 ON ( b1 = a2 ) WHERE ( b1, b1 ) IN ( SELECT a4, b4 FROM t3, t4); DROP TABLE t1,t2,t3,t4;
          igor Igor Babaev (Inactive) added a comment - - edited

          The fix for the bug was pushed into the 5.3 tree on 20-02-2013.

          igor Igor Babaev (Inactive) added a comment - - edited The fix for the bug was pushed into the 5.3 tree on 20-02-2013.

          People

            igor Igor Babaev (Inactive)
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.