Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30408

Spider test using semijoin=off returns wrong result (zero rows)

Details

    Description

      I only tested this on 10.11. The bug could also be in earlier versions.

      Applying this diff:

      diff --git a/storage/spider/mysql-test/spider/bg/t/basic_sql.test b/storage/spider/mysql-test/spider/bg/t/basic_sql.test
      index 9cf4ce99dc3..6a35fe8dd70 100644
      --- a/storage/spider/mysql-test/spider/bg/t/basic_sql.test
      +++ b/storage/spider/mysql-test/spider/bg/t/basic_sql.test
      @@ -156,6 +156,7 @@ if ($USE_CHILD_GROUP2)
         }
       }
       --connection master_1
      +set @@optimizer_switch="semijoin=off";
       SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_l ORDER BY a;
       
       --echo
      

      Causes the following result change:
      mtr spider/bg.basic_sql

      +++ /home/my/maria-10.11/storage/spider/mysql-test/spider/bg/r/basic_sql.reject2023-01-13 20:10:47.205034143 +0200
      @@ -49,6 +49,7 @@
       ) MASTER_1_ENGINE MASTER_1_CHARSET MASTER_1_COMMENT_2_1
       SELECT a, b, c FROM tb_l
       connection master_1;
      +set @@optimizer_switch="semijoin=off";
       SELECT a, b, date_format(c, '%Y-%m-%d %H:%i:%s') FROM ta_l ORDER BY a;
       a      b       date_format(c, '%Y-%m-%d %H:%i:%s')
       1      a       2008-08-01 10:21:39
      @@ -279,11 +280,6 @@
       SELECT a.a, a.b, date_format(a.c, '%Y-%m-%d %H:%i:%s') FROM tb_l a WHERE
       EXISTS (SELECT * FROM ta_l b WHERE b.b = a.b) ORDER BY a.a;
       a      b       date_format(a.c, '%Y-%m-%d %H:%i:%s')
      -1      f       2008-07-01 10:21:39
      -2      g       2000-02-01 00:00:00
      -3      j       2007-05-04 20:03:11
      -4      i       2003-10-30 05:01:03
      -5      h       2001-10-31 23:59:59
       

      select using pushdown

      Attachments

        Activity

          ycp Yuchen Pei added a comment -

          Somehow with the semijoin=off optimizer switch, it uses the group by
          handler (gbh) for the EXISTS subquery. Because such subqueries set
          explicit_limit to true (see
          Item_exists_subselect::fix_length_and_dec()), the gbh uses the default
          limit value 0, which causes the subquery to return empty results.

          Either we make gbh use the correct limit 1 in this case, or ban gbh
          from being used. The latter requires checking why the optimizer switch
          causes gbh to be used for the subquery.

          ycp Yuchen Pei added a comment - Somehow with the semijoin=off optimizer switch, it uses the group by handler (gbh) for the EXISTS subquery. Because such subqueries set explicit_limit to true (see Item_exists_subselect::fix_length_and_dec()), the gbh uses the default limit value 0, which causes the subquery to return empty results. Either we make gbh use the correct limit 1 in this case, or ban gbh from being used. The latter requires checking why the optimizer switch causes gbh to be used for the subquery.
          ycp Yuchen Pei added a comment - - edited

          Hi psergei, ptal this one-liner thanks:

          upstream/bb-10.5-mdev-30408 upstream/bb-10.5-ycp 3e93edb6758cd93ef99319173946a6ef7369c741
          MDEV-30408 Reset explicit_limit in exists2in
           
          Item_exists_subselect::fix_length_and_dec() sets explicit_limit to 1.
          In the exists2in transformation it resets select_limit to NULL. For
          consistency we should reset explicity_limit too.
           
          This fixes a bug where spider table returns wrong results for queries
          that gets through exists2in transformation when semijoin is off.
          

          ycp Yuchen Pei added a comment - - edited Hi psergei , ptal this one-liner thanks: upstream/bb-10.5-mdev-30408 upstream/bb-10.5-ycp 3e93edb6758cd93ef99319173946a6ef7369c741 MDEV-30408 Reset explicit_limit in exists2in   Item_exists_subselect::fix_length_and_dec() sets explicit_limit to 1. In the exists2in transformation it resets select_limit to NULL. For consistency we should reset explicity_limit too.   This fixes a bug where spider table returns wrong results for queries that gets through exists2in transformation when semijoin is off.

          ycp, looks good to me.
          A cosmetic comment is that I would write "= false" instead of "= 0" which is what we used to write in the old code.

          psergei Sergei Petrunia added a comment - ycp , looks good to me. A cosmetic comment is that I would write "= false" instead of "= 0" which is what we used to write in the old code.
          ycp Yuchen Pei added a comment -

          Thanks for the review - updated and pushed 03a350378ad627d509a31a788896067b02f80e53 to 10.5

          ycp Yuchen Pei added a comment - Thanks for the review - updated and pushed 03a350378ad627d509a31a788896067b02f80e53 to 10.5

          People

            ycp Yuchen Pei
            monty Michael Widenius
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.