Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32868

Wrong result for SELECT NULL,NULL IN (SUBQUERY)

Details

    • Bug
    • Status: Stalled (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.4(EOL)
    • 10.5
    • None
    • None

    Description

      Those SELECTs must return NULL but they return 0.

      CREATE TABLE s (s INT);
      INSERT INTO s VALUES(1);
       
      SELECT (SELECT NULL, NULL) IN (SELECT 1, 2 FROM s) FROM s;
      0
      

      CREATE TABLE u(a INT, b INT);
      INSERT INTO u VALUES (NULL, NULL);
      SELECT (SELECT * FROM u) IN (SELECT 1, 2 FROM s) FROM s;
      0
      

      Attachments

        Activity

          oleg.smirnov Oleg Smirnov added a comment -

          Could this fix from MySQL be applicable?

          commit 985e788ed645e8282f586150d45d0cdd1172849d
          Author: Knut Anders Hatlen <knut.hatlen@oracle.com>
          Date:   Wed Dec 2 17:34:58 2015 +0100
           
              Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY
                            ON LEFT SIDE OF IN
              
              Problem:
              
              If the left expression of an IN expression is a row subquery that does
              not access any tables, an assertion could be raised in debug builds,
              and wrong results could be returned in release builds.
              
              The assert failure was introduced by the fix for bug#20729351, but the
              wrong results were seen even before that fix.
              
              Analysis:
              
              The root cause for these problems is an inconsistency in how
              nullability is represented. For Item_row, the maybe_null flag tells if
              any of the columns in the row could be NULL. For scalar subqueries and
              row subqueries represented by Item_singlerow_subselect, maybe_null has
              this meaning:
              
              - If the subquery is scalar, maybe_null == true means that the
                returned column could be NULL, or that the subquery could return an
                empty result (which would make the scalar subquery evaluate to
                NULL). This is consistent with how maybe_null is used in Item_row.
              
              - If the subquery returns multiple columns (a row subquery) and it is
                not a UNION, maybe_null == true means that the subquery could return
                empty results. And, conversely, maybe_null == false means that it
                could not return empty results. It does not say anything about the
                nullability of the columns. This is not consistent with how
                maybe_null is used in Item_row and in scalar subqueries.
              
              - If the subquery is a row query that is a UNION, maybe_null == true
                means that it accesses tables, whereas maybe_null == false means
                that it does not access any tables.
              
              Take these two equivalent statements:
              
                SELECT (NULL, NULL) IN ...
              
                SELECT (SELECT NULL, NULL) IN ...
              
              In the former, the Item_row that represents the left operand of the IN
              expression, has maybe_null == true. In the latter, the
              Item_singlerow_subselect that represents the left operand, has
              maybe_null == false.
              
              Or take these two equivalent statements:
              
                SELECT (SELECT 1, 2 WHERE FALSE) IN ...
              
                SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ...
              
              In the former of these, the Item_singlerow_subselect has maybe_null ==
              true. In the latter, maybe_null is false.
              
              These inconsistencies confuse the code that transforms IN expressions
              to EXISTS expressions in the following way:
           
          Item_in_subselect::row_value_transformer() sets up a data structure
              (pushed_cond_guards) that might be needed in the synthetic predicates
              that are created later. It only does this if left_expr->maybe_null is
              true. However, the data structure is needed if any of the values in
              the left expression could be NULL, and this is not what
              left_expr->maybe_null tells us if left_expr is an
              Item_singlerow_subselect, only if it is an Item_row.
              
              Because of this, it might skip setting up pushed_cond_guards even
              though it will be needed later. When it is attempted used later, and
              it is found that it is NULL, it results in an assertion failure or a
              wrong result.
              
              Before bug#20729351, all the columns returned by the row subquery
              (represented by Item_cache objects) had maybe_null set to false, even
              if the value could be NULL. This was clearly wrong, and it caused
              wrong results in some cases. The assert wasn't seen, though.
              
              After bug#20729351, all the columns returned by the row subquery have
              maybe_null pessimistically set to true. This made the queries take a
              slightly different code path, which still produced wrong results, but
              now also raise an assert failure.
              
              Solution:
              
              This patch fixes these problems by making the maybe_null flag in
              Item_singlerow_subselect have a value that is more consistent. Now it
              is false if and only if we know for sure that it is never going to
              return empty results, and none of the returned columns are nullable.
              
              It also makes the maybe_null flag of the individual columns in the row
              subquery more accurate. Instead of being always false/true as they
              were before/after 20729351, the maybe_null flag now tells the actual
              nullability of the column (if the column could have the value NULL, or
              if the subquery could return an empty result, maybe_null is true).
              
              Additionally, testing of this fix exposed another, earlier wrong
              result regression, caused by "WL#6369: EXPLAIN for other thread." A
              query such as
              
                SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2)
              
              started returning 0 instead of 1 after WL#6369, and it started raising
              an assert failure after bug#20729351.
              
              The problem is very similar to bug#22090717, in that the optimization
              of the IN expression happens earlier now than before. It tries to read
              the values returned from the UNION subquery before the subquery has
              been evaluated. The patch enhances the fix for bug#22090717 with an
              extra check in Item_in_subselect::mark_as_outer() to prevent reading
              of data from a subquery before the subquery has been evaluated, except
              in the case where the subquery returns a basic constant (a literal).
          

          oleg.smirnov Oleg Smirnov added a comment - Could this fix from MySQL be applicable? commit 985e788ed645e8282f586150d45d0cdd1172849d Author: Knut Anders Hatlen <knut.hatlen@oracle.com> Date: Wed Dec 2 17:34:58 2015 +0100   Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY ON LEFT SIDE OF IN Problem: If the left expression of an IN expression is a row subquery that does not access any tables, an assertion could be raised in debug builds, and wrong results could be returned in release builds. The assert failure was introduced by the fix for bug#20729351, but the wrong results were seen even before that fix. Analysis: The root cause for these problems is an inconsistency in how nullability is represented. For Item_row, the maybe_null flag tells if any of the columns in the row could be NULL. For scalar subqueries and row subqueries represented by Item_singlerow_subselect, maybe_null has this meaning: - If the subquery is scalar, maybe_null == true means that the returned column could be NULL, or that the subquery could return an empty result (which would make the scalar subquery evaluate to NULL). This is consistent with how maybe_null is used in Item_row. - If the subquery returns multiple columns (a row subquery) and it is not a UNION, maybe_null == true means that the subquery could return empty results. And, conversely, maybe_null == false means that it could not return empty results. It does not say anything about the nullability of the columns. This is not consistent with how maybe_null is used in Item_row and in scalar subqueries. - If the subquery is a row query that is a UNION, maybe_null == true means that it accesses tables, whereas maybe_null == false means that it does not access any tables. Take these two equivalent statements: SELECT (NULL, NULL) IN ... SELECT (SELECT NULL, NULL) IN ... In the former, the Item_row that represents the left operand of the IN expression, has maybe_null == true. In the latter, the Item_singlerow_subselect that represents the left operand, has maybe_null == false. Or take these two equivalent statements: SELECT (SELECT 1, 2 WHERE FALSE) IN ... SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ... In the former of these, the Item_singlerow_subselect has maybe_null == true. In the latter, maybe_null is false. These inconsistencies confuse the code that transforms IN expressions to EXISTS expressions in the following way:   Item_in_subselect::row_value_transformer() sets up a data structure (pushed_cond_guards) that might be needed in the synthetic predicates that are created later. It only does this if left_expr->maybe_null is true. However, the data structure is needed if any of the values in the left expression could be NULL, and this is not what left_expr->maybe_null tells us if left_expr is an Item_singlerow_subselect, only if it is an Item_row. Because of this, it might skip setting up pushed_cond_guards even though it will be needed later. When it is attempted used later, and it is found that it is NULL, it results in an assertion failure or a wrong result. Before bug#20729351, all the columns returned by the row subquery (represented by Item_cache objects) had maybe_null set to false, even if the value could be NULL. This was clearly wrong, and it caused wrong results in some cases. The assert wasn't seen, though. After bug#20729351, all the columns returned by the row subquery have maybe_null pessimistically set to true. This made the queries take a slightly different code path, which still produced wrong results, but now also raise an assert failure. Solution: This patch fixes these problems by making the maybe_null flag in Item_singlerow_subselect have a value that is more consistent. Now it is false if and only if we know for sure that it is never going to return empty results, and none of the returned columns are nullable. It also makes the maybe_null flag of the individual columns in the row subquery more accurate. Instead of being always false/true as they were before/after 20729351, the maybe_null flag now tells the actual nullability of the column (if the column could have the value NULL, or if the subquery could return an empty result, maybe_null is true). Additionally, testing of this fix exposed another, earlier wrong result regression, caused by "WL#6369: EXPLAIN for other thread." A query such as SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2) started returning 0 instead of 1 after WL#6369, and it started raising an assert failure after bug#20729351. The problem is very similar to bug#22090717, in that the optimization of the IN expression happens earlier now than before. It tries to read the values returned from the UNION subquery before the subquery has been evaluated. The patch enhances the fix for bug#22090717 with an extra check in Item_in_subselect::mark_as_outer() to prevent reading of data from a subquery before the subquery has been evaluated, except in the case where the subquery returns a basic constant (a literal).
          oleg.smirnov Oleg Smirnov added a comment -

          This commit might also be applicable:

          commit 477eb9aedcf0e8ee722f2968eb3faf3f33c2c9c4
          Author: Guilhem Bichot <guilhem.bichot@oracle.com>, Wed Feb 8 16:25:17 2012 +0100 (13 years ago)
           
          WL#6094 Allow subquery materialization in NOT IN if all columns are not nullable
          WL#6095 Allow subquery materialization in NOT IN if single-column subquery.
          And related refactorings: uniquesubquery_engine is deleted, it's just a particular
          case of indexsubquery_engine.
          Bug#13495157 - SUBQUERY MATERIALIZATION NOT USED FOR CERTAIN STATEMENTS.
           
          Subquery materialization has problems identifying partial matches
          (if an outer or inner expression is NULL, IN(materialized-subq) may return
          FALSE when UNKNOWN would be the correct answer).
          Due to this, subq-mat was so far confined to use cases where FALSE
          and UNKNOWN are equivalent answers, for example
            WHERE x IN (subq)
          but not
            WHERE x NOT IN (subq).
          This restriction is here lifted in some cases:
          - if all columns are nullable (WL#6094), in which case NULLs can't happen
          so problem can't happen
          - if there is a single outer expression (WL#6095), in which case it's easy
          to know what the correct answer is (see comment of item_subselect.cc
          for the logic)
           
          Subquery materialization is now usable by multi-table UPDATE and DELETE.
          

          oleg.smirnov Oleg Smirnov added a comment - This commit might also be applicable: commit 477eb9aedcf0e8ee722f2968eb3faf3f33c2c9c4 Author: Guilhem Bichot <guilhem.bichot@oracle.com>, Wed Feb 8 16:25:17 2012 +0100 (13 years ago)   WL#6094 Allow subquery materialization in NOT IN if all columns are not nullable WL#6095 Allow subquery materialization in NOT IN if single-column subquery. And related refactorings: uniquesubquery_engine is deleted, it's just a particular case of indexsubquery_engine. Bug#13495157 - SUBQUERY MATERIALIZATION NOT USED FOR CERTAIN STATEMENTS.   Subquery materialization has problems identifying partial matches (if an outer or inner expression is NULL, IN(materialized-subq) may return FALSE when UNKNOWN would be the correct answer). Due to this, subq-mat was so far confined to use cases where FALSE and UNKNOWN are equivalent answers, for example WHERE x IN (subq) but not WHERE x NOT IN (subq). This restriction is here lifted in some cases: - if all columns are nullable (WL#6094), in which case NULLs can't happen so problem can't happen - if there is a single outer expression (WL#6095), in which case it's easy to know what the correct answer is (see comment of item_subselect.cc for the logic)   Subquery materialization is now usable by multi-table UPDATE and DELETE.
          oleg.smirnov Oleg Smirnov added a comment -

          Simplified test case:

          CREATE TABLE t1 (a INT);
          INSERT INTO t1 VALUES (1);
          SELECT (NULL, NULL) IN (SELECT 1, 2 from t1);
          DROP TABLE t1;
          

          oleg.smirnov Oleg Smirnov added a comment - Simplified test case: CREATE TABLE t1 (a INT ); INSERT INTO t1 VALUES (1); SELECT ( NULL , NULL ) IN ( SELECT 1, 2 from t1); DROP TABLE t1;

          People

            oleg.smirnov Oleg Smirnov
            oleg.smirnov Oleg Smirnov
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.