[MDEV-32868] Wrong result for SELECT NULL,NULL IN (SUBQUERY) - Jira

Details

Type: Bug
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4(EOL)
Fix Version/s: 10.5
Component/s: None
Labels:
None

Description

Those SELECTs must return NULL but they return 0.

CREATE TABLE s (s INT);

INSERT INTO s VALUES(1);

SELECT (SELECT NULL, NULL) IN (SELECT 1, 2 FROM s) FROM s;

CREATE TABLE u(a INT, b INT);

INSERT INTO u VALUES (NULL, NULL);

SELECT (SELECT * FROM u) IN (SELECT 1, 2 FROM s) FROM s;

Attachments

Activity

Ascending order - Click to sort in descending order

Oleg Smirnov added a comment - 2023-11-24 05:00

Could this fix from MySQL be applicable?

commit 985e788ed645e8282f586150d45d0cdd1172849d

Author: Knut Anders Hatlen <knut.hatlen@oracle.com>

Date:   Wed Dec 2 17:34:58 2015 +0100

    Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY

                  ON LEFT SIDE OF IN

    Problem:

    If the left expression of an IN expression is a row subquery that does

    not access any tables, an assertion could be raised in debug builds,

    and wrong results could be returned in release builds.

    The assert failure was introduced by the fix for bug#20729351, but the

    wrong results were seen even before that fix.

    Analysis:

    The root cause for these problems is an inconsistency in how

    nullability is represented. For Item_row, the maybe_null flag tells if

    any of the columns in the row could be NULL. For scalar subqueries and

    row subqueries represented by Item_singlerow_subselect, maybe_null has

    this meaning:

    - If the subquery is scalar, maybe_null == true means that the

      returned column could be NULL, or that the subquery could return an

      empty result (which would make the scalar subquery evaluate to

      NULL). This is consistent with how maybe_null is used in Item_row.

    - If the subquery returns multiple columns (a row subquery) and it is

      not a UNION, maybe_null == true means that the subquery could return

      empty results. And, conversely, maybe_null == false means that it

      could not return empty results. It does not say anything about the

      nullability of the columns. This is not consistent with how

      maybe_null is used in Item_row and in scalar subqueries.

    - If the subquery is a row query that is a UNION, maybe_null == true

      means that it accesses tables, whereas maybe_null == false means

      that it does not access any tables.

    Take these two equivalent statements:

      SELECT (NULL, NULL) IN ...

      SELECT (SELECT NULL, NULL) IN ...

    In the former, the Item_row that represents the left operand of the IN

    expression, has maybe_null == true. In the latter, the

    Item_singlerow_subselect that represents the left operand, has

    maybe_null == false.

    Or take these two equivalent statements:

      SELECT (SELECT 1, 2 WHERE FALSE) IN ...

      SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ...

    In the former of these, the Item_singlerow_subselect has maybe_null ==

    true. In the latter, maybe_null is false.

    These inconsistencies confuse the code that transforms IN expressions

    to EXISTS expressions in the following way:

Item_in_subselect::row_value_transformer() sets up a data structure

    (pushed_cond_guards) that might be needed in the synthetic predicates

    that are created later. It only does this if left_expr->maybe_null is

    true. However, the data structure is needed if any of the values in

    the left expression could be NULL, and this is not what

    left_expr->maybe_null tells us if left_expr is an

    Item_singlerow_subselect, only if it is an Item_row.

    Because of this, it might skip setting up pushed_cond_guards even

    though it will be needed later. When it is attempted used later, and

    it is found that it is NULL, it results in an assertion failure or a

    wrong result.

    Before bug#20729351, all the columns returned by the row subquery

    (represented by Item_cache objects) had maybe_null set to false, even

    if the value could be NULL. This was clearly wrong, and it caused

    wrong results in some cases. The assert wasn't seen, though.

    After bug#20729351, all the columns returned by the row subquery have

    maybe_null pessimistically set to true. This made the queries take a

    slightly different code path, which still produced wrong results, but

    now also raise an assert failure.

    Solution:

    This patch fixes these problems by making the maybe_null flag in

    Item_singlerow_subselect have a value that is more consistent. Now it

    is false if and only if we know for sure that it is never going to

    return empty results, and none of the returned columns are nullable.

    It also makes the maybe_null flag of the individual columns in the row

    subquery more accurate. Instead of being always false/true as they

    were before/after 20729351, the maybe_null flag now tells the actual

    nullability of the column (if the column could have the value NULL, or

    if the subquery could return an empty result, maybe_null is true).

    Additionally, testing of this fix exposed another, earlier wrong

    result regression, caused by "WL#6369: EXPLAIN for other thread." A

    query such as

      SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2)

    started returning 0 instead of 1 after WL#6369, and it started raising

    an assert failure after bug#20729351.

    The problem is very similar to bug#22090717, in that the optimization

    of the IN expression happens earlier now than before. It tries to read

    the values returned from the UNION subquery before the subquery has

    been evaluated. The patch enhances the fix for bug#22090717 with an

    extra check in Item_in_subselect::mark_as_outer() to prevent reading

    of data from a subquery before the subquery has been evaluated, except

    in the case where the subquery returns a basic constant (a literal).

Oleg Smirnov added a comment - 2023-11-24 05:00 Could this fix from MySQL be applicable? commit 985e788ed645e8282f586150d45d0cdd1172849d Author: Knut Anders Hatlen <knut.hatlen@oracle.com> Date: Wed Dec 2 17:34:58 2015 +0100 Bug#22089623: ASSERTION IN ITEM_FUNC_TRIG_COND::VAL_INT() WITH SUBQUERY ON LEFT SIDE OF IN Problem: If the left expression of an IN expression is a row subquery that does not access any tables, an assertion could be raised in debug builds, and wrong results could be returned in release builds. The assert failure was introduced by the fix for bug#20729351, but the wrong results were seen even before that fix. Analysis: The root cause for these problems is an inconsistency in how nullability is represented. For Item_row, the maybe_null flag tells if any of the columns in the row could be NULL. For scalar subqueries and row subqueries represented by Item_singlerow_subselect, maybe_null has this meaning: - If the subquery is scalar, maybe_null == true means that the returned column could be NULL, or that the subquery could return an empty result (which would make the scalar subquery evaluate to NULL). This is consistent with how maybe_null is used in Item_row. - If the subquery returns multiple columns (a row subquery) and it is not a UNION, maybe_null == true means that the subquery could return empty results. And, conversely, maybe_null == false means that it could not return empty results. It does not say anything about the nullability of the columns. This is not consistent with how maybe_null is used in Item_row and in scalar subqueries. - If the subquery is a row query that is a UNION, maybe_null == true means that it accesses tables, whereas maybe_null == false means that it does not access any tables. Take these two equivalent statements: SELECT (NULL, NULL) IN ... SELECT (SELECT NULL, NULL) IN ... In the former, the Item_row that represents the left operand of the IN expression, has maybe_null == true. In the latter, the Item_singlerow_subselect that represents the left operand, has maybe_null == false. Or take these two equivalent statements: SELECT (SELECT 1, 2 WHERE FALSE) IN ... SELECT (SELECT 1, 2 WHERE FALSE UNION SELECT 1, 2 WHERE FALSE) IN ... In the former of these, the Item_singlerow_subselect has maybe_null == true. In the latter, maybe_null is false. These inconsistencies confuse the code that transforms IN expressions to EXISTS expressions in the following way: Item_in_subselect::row_value_transformer() sets up a data structure (pushed_cond_guards) that might be needed in the synthetic predicates that are created later. It only does this if left_expr->maybe_null is true. However, the data structure is needed if any of the values in the left expression could be NULL, and this is not what left_expr->maybe_null tells us if left_expr is an Item_singlerow_subselect, only if it is an Item_row. Because of this, it might skip setting up pushed_cond_guards even though it will be needed later. When it is attempted used later, and it is found that it is NULL, it results in an assertion failure or a wrong result. Before bug#20729351, all the columns returned by the row subquery (represented by Item_cache objects) had maybe_null set to false, even if the value could be NULL. This was clearly wrong, and it caused wrong results in some cases. The assert wasn't seen, though. After bug#20729351, all the columns returned by the row subquery have maybe_null pessimistically set to true. This made the queries take a slightly different code path, which still produced wrong results, but now also raise an assert failure. Solution: This patch fixes these problems by making the maybe_null flag in Item_singlerow_subselect have a value that is more consistent. Now it is false if and only if we know for sure that it is never going to return empty results, and none of the returned columns are nullable. It also makes the maybe_null flag of the individual columns in the row subquery more accurate. Instead of being always false/true as they were before/after 20729351, the maybe_null flag now tells the actual nullability of the column (if the column could have the value NULL, or if the subquery could return an empty result, maybe_null is true). Additionally, testing of this fix exposed another, earlier wrong result regression, caused by "WL#6369: EXPLAIN for other thread." A query such as SELECT (SELECT 1, 2 UNION SELECT 1, 2) IN (SELECT 1, 2) started returning 0 instead of 1 after WL#6369, and it started raising an assert failure after bug#20729351. The problem is very similar to bug#22090717, in that the optimization of the IN expression happens earlier now than before. It tries to read the values returned from the UNION subquery before the subquery has been evaluated. The patch enhances the fix for bug#22090717 with an extra check in Item_in_subselect::mark_as_outer() to prevent reading of data from a subquery before the subquery has been evaluated, except in the case where the subquery returns a basic constant (a literal).

Oleg Smirnov added a comment - 2024-12-22 06:12

This commit might also be applicable:

commit 477eb9aedcf0e8ee722f2968eb3faf3f33c2c9c4

Author: Guilhem Bichot <guilhem.bichot@oracle.com>, Wed Feb 8 16:25:17 2012 +0100 (13 years ago)

WL#6094 Allow subquery materialization in NOT IN if all columns are not nullable

WL#6095 Allow subquery materialization in NOT IN if single-column subquery.

And related refactorings: uniquesubquery_engine is deleted, it's just a particular

case of indexsubquery_engine.

Bug#13495157 - SUBQUERY MATERIALIZATION NOT USED FOR CERTAIN STATEMENTS.

Subquery materialization has problems identifying partial matches

(if an outer or inner expression is NULL, IN(materialized-subq) may return

FALSE when UNKNOWN would be the correct answer).

Due to this, subq-mat was so far confined to use cases where FALSE

and UNKNOWN are equivalent answers, for example

  WHERE x IN (subq)

but not

  WHERE x NOT IN (subq).

This restriction is here lifted in some cases:

- if all columns are nullable (WL#6094), in which case NULLs can't happen

so problem can't happen

- if there is a single outer expression (WL#6095), in which case it's easy

to know what the correct answer is (see comment of item_subselect.cc

for the logic)

Subquery materialization is now usable by multi-table UPDATE and DELETE.

Oleg Smirnov added a comment - 2024-12-22 06:12 This commit might also be applicable: commit 477eb9aedcf0e8ee722f2968eb3faf3f33c2c9c4 Author: Guilhem Bichot <guilhem.bichot@oracle.com>, Wed Feb 8 16:25:17 2012 +0100 (13 years ago) WL#6094 Allow subquery materialization in NOT IN if all columns are not nullable WL#6095 Allow subquery materialization in NOT IN if single-column subquery. And related refactorings: uniquesubquery_engine is deleted, it's just a particular case of indexsubquery_engine. Bug#13495157 - SUBQUERY MATERIALIZATION NOT USED FOR CERTAIN STATEMENTS. Subquery materialization has problems identifying partial matches (if an outer or inner expression is NULL, IN(materialized-subq) may return FALSE when UNKNOWN would be the correct answer). Due to this, subq-mat was so far confined to use cases where FALSE and UNKNOWN are equivalent answers, for example WHERE x IN (subq) but not WHERE x NOT IN (subq). This restriction is here lifted in some cases: - if all columns are nullable (WL#6094), in which case NULLs can't happen so problem can't happen - if there is a single outer expression (WL#6095), in which case it's easy to know what the correct answer is (see comment of item_subselect.cc for the logic) Subquery materialization is now usable by multi-table UPDATE and DELETE.

Oleg Smirnov added a comment - 2024-12-23 15:09

Simplified test case:

CREATE TABLE t1 (a INT);

INSERT INTO t1 VALUES (1);

SELECT (NULL, NULL) IN (SELECT 1, 2 from t1);

DROP TABLE t1;

Oleg Smirnov added a comment - 2024-12-23 15:09 Simplified test case: CREATE TABLE t1 (a INT ); INSERT INTO t1 VALUES (1); SELECT ( NULL , NULL ) IN ( SELECT 1, 2 from t1); DROP TABLE t1;

MariaDB Server

Wrong result for SELECT NULL,NULL IN (SUBQUERY)

Details

Description

Attachments

Activity

People

Dates

Git Integration