Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5161

Wrong result (missing rows) with semijoin, LEFT JOIN, ORDER BY, constant table

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.0.4, 5.3.12, 5.5.33a
    • 5.5.35, 10.0.7
    • None
    • None

    Description

      The test case is nearly identical to the one provided in the comment to MDEV-5059 (https://mariadb.atlassian.net/browse/MDEV-5059?focusedCommentId=35418&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-35418); the only difference is that table t3 lost the index, and the wrong result now affects the default join_cache_level=2.

      Test case:

      SET optimizer_switch = 'semijoin=on';
       
      CREATE TABLE t1 (pk INT PRIMARY KEY, c1 VARCHAR(1)) ENGINE=MyISAM;
      INSERT INTO t1 VALUES (1,'v'),(2,'v'),(3,'c'),(4,NULL),(5,'x');
       
      CREATE TABLE t2 (c2 VARCHAR(1)) ENGINE=MyISAM;
      INSERT INTO t2 VALUES ('x');
       
      CREATE TABLE t3 (c3 VARCHAR(1)) ENGINE=MyISAM;
      INSERT INTO t3 VALUES ('x'),('d');
       
      SELECT * FROM t1, t2 WHERE pk IN ( SELECT pk FROM t1 LEFT JOIN t3 ON (c1 = c3 ) ) ORDER BY c2, c1;

      Actual result:

      pk	c1	c2
      4	NULL	x

      Expected result:

      pk	c1	c2
      4	NULL	x
      3	c	x
      1	v	x
      2	v	x
      5	x	x

      revision-id: bar@mnogosearch.org-20131016141313-63jt6geakz1e6dg2
      revno: 3708
      branch-nick: 5.3

      EXPLAIN EXTENDED
      SELECT * FROM t1, t2 WHERE pk IN ( SELECT pk FROM t1 LEFT JOIN t3 ON (c1 = c3 ) ) ORDER BY c2, c1;
      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	t2	system	NULL	NULL	NULL	NULL	1	100.00	Using filesort
      1	PRIMARY	t1	ALL	PRIMARY	NULL	NULL	NULL	5	100.00	
      1	PRIMARY	t1	eq_ref	PRIMARY	PRIMARY	4	test.t1.pk	1	100.00	
      1	PRIMARY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	Using where; Start temporary; End temporary
      Warnings:
      Note	1003	select `test`.`t1`.`pk` AS `pk`,`test`.`t1`.`c1` AS `c1`,'x' AS `c2` from `test`.`t1` semi join (`test`.`t3`) join `test`.`t1` join `test`.`t2` where (`test`.`t1`.`pk` = `test`.`t1`.`pk`) order by 'x',`test`.`t1`.`c1`

      Attachments

        Issue Links

          Activity

            Now I have a patch for MDEV-5293, continuing to work on this bug.

            psergei Sergei Petrunia added a comment - Now I have a patch for MDEV-5293 , continuing to work on this bug.

            Interesting.. there is code in setup_semijoin_dups_elimination() which disables "Using filesort":

            Fixed LP bug #901478.
            If the duplicate elimination strategy is used for a semi-join and potentially
            one of the block-based join algorithms can be employed to join the inner
            tables of the semi-join then sorting of the head (first non-constant) table
            for a query with ORDER BY / GROUP BY cannot be used.

            It seems, a problem similar to this one was encountered before?

            psergei Sergei Petrunia added a comment - Interesting.. there is code in setup_semijoin_dups_elimination() which disables "Using filesort": Fixed LP bug #901478. If the duplicate elimination strategy is used for a semi-join and potentially one of the block-based join algorithms can be employed to join the inner tables of the semi-join then sorting of the head (first non-constant) table for a query with ORDER BY / GROUP BY cannot be used. It seems, a problem similar to this one was encountered before?

            The fix looks like this:

            Make sure that possible sorting of rows from the head table
            is not to be employed.
            */
            if (join->get_sort_by_join_tab())

            { join->simple_order= 0; join->simple_group= 0; join->need_tmp= join->test_if_need_tmp_table(); }

            and it doesn't work for our testcase, because join->get_sort_by_join_tab()=NULL, which is caused by JOIN::sort_by_table being NULL.

            psergei Sergei Petrunia added a comment - The fix looks like this: Make sure that possible sorting of rows from the head table is not to be employed. */ if (join->get_sort_by_join_tab()) { join->simple_order= 0; join->simple_group= 0; join->need_tmp= join->test_if_need_tmp_table(); } and it doesn't work for our testcase, because join->get_sort_by_join_tab()=NULL, which is caused by JOIN::sort_by_table being NULL.

            I've made a fix that runs get_sort_by_table() after const. tables have been detected and read. This caused get_sort_by_join_tab() to return the first JOIN_TAB in this example.

            Tests pass, including valgrind.

            psergei Sergei Petrunia added a comment - I've made a fix that runs get_sort_by_table() after const. tables have been detected and read. This caused get_sort_by_join_tab() to return the first JOIN_TAB in this example. Tests pass, including valgrind.
            dbart Daniel Bartholomew added a comment - http://bazaar.launchpad.net/~maria-captains/maria/5.5/revision/3963.1.2

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.