Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4173

Wrong result (extra row) with semijoin=on, joins in outer query, LEFT JOIN in the subquery

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.1, 5.5.29
    • 10.0.4, 5.5.32
    • None
    • None

    Description

      The following test case produces COUNT = 23712 when it is executed with exists_to_in=off, and 23713 when it is executed with exists_to_in=on.
      Postgres, MySQL 5.6 and older versions of MariaDB all return 23712, so I assume it is the correct result.

      Test case:

      set optimizer_switch='exists_to_in=on';
       
      CREATE TABLE t1 (i1 INT) ENGINE=MyISAM;
      INSERT INTO t1 VALUES 
      (4),(8),(0),(0),(0),(7),(7),(5),(3),(4),(9),(6),(1),(5),(6),(2),(4),(4),
      (4),(3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),(2),(3),(8),(4),(1),(7),
      (9),(4),(5),(5),(9),(3),(8),(0),(3),(1),(0),(8),(3),(3),(9),(6),(1),(0),
      (8),(3),(9),(5),(9),(2),(5),(9),(1),(8),(7),(6),(2),(4),(7),(3),(8),(6);
       
      CREATE TABLE t2 (i2 INT, j2 INT) ENGINE=MyISAM;
      INSERT INTO t2 VALUES 
      (7,1),(0,7),(9,4),(3,7),(4,0),(2,2),(5,9),(3,4),(1,0),(3,9),
      (5,8),(1,8),(204,18),(224,84),(9,6),(5,3),(0,6),(6,1),(7,3);
       
      CREATE TABLE t3 (i3 INT, KEY(i3)) ENGINE=MyISAM;
      INSERT INTO t3 VALUES 
      (0),(8),(1),(8),(9),(24),(6),(1),(6),
      (2),(4),(8),(4),(4),(7),(4),(1),(9),(4);
       
      SELECT COUNT(*) FROM t1 outer_t1, t2 outer_t2, t3 
      WHERE EXISTS ( 
        SELECT 1 FROM t2 LEFT JOIN t3 ON ( i3 = j2 ) 
        WHERE j2 = outer_t1.i1 AND i2 <> outer_t2.j2 
      );

      Result with exists_to_in=off:

      COUNT(*)
      23712

      Result with exists_to_in=on:

      COUNT(*)
      23713

      EXPLAIN with exists_to_in=on, otherwise default optimizer_switch:

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	outer_t2	ALL	NULL	NULL	NULL	NULL	19	100.00	
      1	PRIMARY	t3	index	NULL	i3	5	NULL	19	100.00	Using index; Using join buffer (flat, BNL join)
      1	PRIMARY	t2	ALL	NULL	NULL	NULL	NULL	19	100.00	Using where; Start temporary; Using join buffer (incremental, BNL join)
      1	PRIMARY	t3	ref	i3	i3	5	test.t2.j2	2	100.00	Using where; Using index
      1	PRIMARY	outer_t1	ALL	NULL	NULL	NULL	NULL	72	100.00	Using where; End temporary; Using join buffer (flat, BNL join)
      Warnings:
      Note	1276	Field or reference 'test.outer_t1.i1' of SELECT #2 was resolved in SELECT #1
      Note	1276	Field or reference 'test.outer_t2.j2' of SELECT #2 was resolved in SELECT #1
      Note	1003	select count(0) AS `COUNT(*)` from `test`.`t1` `outer_t1` semi join (`test`.`t2` left join `test`.`t3` on(((`test`.`t3`.`i3` = `test`.`t2`.`j2`) and (`test`.`t2`.`j2` is not null)))) join `test`.`t2` `outer_t2` join `test`.`t3` where ((`test`.`outer_t1`.`i1` = `test`.`t2`.`j2`) and (`test`.`t2`.`i2` <> `test`.`outer_t2`.`j2`))

      Attachments

        Activity

          I have analyzed and fixed this. The problem was due to ignoring overflowing duplicate value when converting from heap to aria/myisam temp table.
          It could lead to extra rows in semijoin or missing row in union query.

          Patch for 5.3 branch:
          http://bazaar.launchpad.net/~pomyks/maria/5.3-mdev-4173/revision/3658

          pomyk Patryk Pomykalski added a comment - I have analyzed and fixed this. The problem was due to ignoring overflowing duplicate value when converting from heap to aria/myisam temp table. It could lead to extra rows in semijoin or missing row in union query. Patch for 5.3 branch: http://bazaar.launchpad.net/~pomyks/maria/5.3-mdev-4173/revision/3658

          Confirm Patryk's analysis. Apparently, DuplicateElimination strategy is the only part of SQL layer that cares about whether a write to temporary table resulted in HA_ERR_DUPP_KEY or not. The patch seems big, but most of the changes are trivial. MySQL 5.6 also has such change. They have introduced it as a part of some big patch.

          psergei Sergei Petrunia added a comment - Confirm Patryk's analysis. Apparently, DuplicateElimination strategy is the only part of SQL layer that cares about whether a write to temporary table resulted in HA_ERR_DUPP_KEY or not. The patch seems big, but most of the changes are trivial. MySQL 5.6 also has such change. They have introduced it as a part of some big patch.
          psergei Sergei Petrunia added a comment - - edited

          Patryk, can I assume that the patch you've posted above is available to MariaDB under the New BSD licence?

          psergei Sergei Petrunia added a comment - - edited Patryk, can I assume that the patch you've posted above is available to MariaDB under the New BSD licence?

          Yes, all my patches are available under New BSD license. This one is mostly from mysql code.

          pomyk Patryk Pomykalski added a comment - Yes, all my patches are available under New BSD license. This one is mostly from mysql code.

          Thanks.

          psergei Sergei Petrunia added a comment - Thanks.

          People

            psergei Sergei Petrunia
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.