Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30408

Spider test using semijoin=off returns wrong result (zero rows)

Details

    Description

      I only tested this on 10.11. The bug could also be in earlier versions.

      Applying this diff:

      diff --git a/storage/spider/mysql-test/spider/bg/t/basic_sql.test b/storage/spider/mysql-test/spider/bg/t/basic_sql.test
      index 9cf4ce99dc3..6a35fe8dd70 100644
      --- a/storage/spider/mysql-test/spider/bg/t/basic_sql.test
      +++ b/storage/spider/mysql-test/spider/bg/t/basic_sql.test
      @@ -156,6 +156,7 @@ if ($USE_CHILD_GROUP2)
         }
       }
       --connection master_1
      +set @@optimizer_switch="semijoin=off";
       SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_l ORDER BY a;
       
       --echo
      

      Causes the following result change:
      mtr spider/bg.basic_sql

      +++ /home/my/maria-10.11/storage/spider/mysql-test/spider/bg/r/basic_sql.reject2023-01-13 20:10:47.205034143 +0200
      @@ -49,6 +49,7 @@
       ) MASTER_1_ENGINE MASTER_1_CHARSET MASTER_1_COMMENT_2_1
       SELECT a, b, c FROM tb_l
       connection master_1;
      +set @@optimizer_switch="semijoin=off";
       SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_l ORDER BY a;
       a      b       date_format(c, '%Y-%m-%d %H:%i:%s')
       1      a       2008-08-01 10:21:39
      @@ -279,11 +280,6 @@
       SELECT a.a, a.b, date_format(a.c, '%Y-%m-%d %H:%i:%s') FROM tb_l a WHERE
       EXISTS (SELECT * FROM ta_l b WHERE b.b = a.b) ORDER BY a.a;
       a      b       date_format(a.c, '%Y-%m-%d %H:%i:%s')
      -1      f       2008-07-01 10:21:39
      -2      g       2000-02-01 00:00:00
      -3      j       2007-05-04 20:03:11
      -4      i       2003-10-30 05:01:03
      -5      h       2001-10-31 23:59:59
       

      select using pushdown

      Attachments

        Activity

          ycp Yuchen Pei added a comment -

          Thanks for the review - updated and pushed 03a350378ad627d509a31a788896067b02f80e53 to 10.5

          ycp Yuchen Pei added a comment - Thanks for the review - updated and pushed 03a350378ad627d509a31a788896067b02f80e53 to 10.5

          ycp, looks good to me.
          A cosmetic comment is that I would write "= false" instead of "= 0" which is what we used to write in the old code.

          psergei Sergei Petrunia added a comment - ycp , looks good to me. A cosmetic comment is that I would write "= false" instead of "= 0" which is what we used to write in the old code.
          ycp Yuchen Pei added a comment - - edited

          Hi psergei, ptal this one-liner thanks:

          upstream/bb-10.5-mdev-30408 upstream/bb-10.5-ycp 3e93edb6758cd93ef99319173946a6ef7369c741
          MDEV-30408 Reset explicit_limit in exists2in
           
          Item_exists_subselect::fix_length_and_dec() sets explicit_limit to 1.
          In the exists2in transformation it resets select_limit to NULL. For
          consistency we should reset explicity_limit too.
           
          This fixes a bug where spider table returns wrong results for queries
          that gets through exists2in transformation when semijoin is off.
          

          ycp Yuchen Pei added a comment - - edited Hi psergei , ptal this one-liner thanks: upstream/bb-10.5-mdev-30408 upstream/bb-10.5-ycp 3e93edb6758cd93ef99319173946a6ef7369c741 MDEV-30408 Reset explicit_limit in exists2in   Item_exists_subselect::fix_length_and_dec() sets explicit_limit to 1. In the exists2in transformation it resets select_limit to NULL. For consistency we should reset explicity_limit too.   This fixes a bug where spider table returns wrong results for queries that gets through exists2in transformation when semijoin is off.
          ycp Yuchen Pei added a comment -

          Somehow with the semijoin=off optimizer switch, it uses the group by
          handler (gbh) for the EXISTS subquery. Because such subqueries set
          explicit_limit to true (see
          Item_exists_subselect::fix_length_and_dec()), the gbh uses the default
          limit value 0, which causes the subquery to return empty results.

          Either we make gbh use the correct limit 1 in this case, or ban gbh
          from being used. The latter requires checking why the optimizer switch
          causes gbh to be used for the subquery.

          ycp Yuchen Pei added a comment - Somehow with the semijoin=off optimizer switch, it uses the group by handler (gbh) for the EXISTS subquery. Because such subqueries set explicit_limit to true (see Item_exists_subselect::fix_length_and_dec()), the gbh uses the default limit value 0, which causes the subquery to return empty results. Either we make gbh use the correct limit 1 in this case, or ban gbh from being used. The latter requires checking why the optimizer switch causes gbh to be used for the subquery.

          People

            ycp Yuchen Pei
            monty Michael Widenius
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.