Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11078

NULL NOT IN (non-empty subquery) should never return results

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1.18, 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • 5.5.55
    • None
    • OS X El Capitan, MariaDB installed through Homebrew
    • 10.0.30

    Description

      I'm confused by the behaviour of `NULL` in a `NOT IN` expression. (MariaDB 10.1.18):

          MariaDB [(none)]> select 1 from dual where null not in (1);
          Empty set (0.00 sec)
       
          MariaDB [(none)]> select 1 from dual where null not in (select 1);
          Empty set (0.00 sec)
       
          MariaDB [(none)]> select 1 from dual where null not in (select 1 from dual);
          Empty set (0.00 sec)
      

      This is as I expected, since `NULL` is not equal or unequal to anything.

          MariaDB [(none)]> select 1 from dual where null not in (select 1 from mysql.user);
          +---+
          | 1 |
          +---+
          | 1 |
          +---+
          1 row in set (0.00 sec)
      

      Huh? How can that be? Maybe it's because I can't read the mysql.user table?

          MariaDB [(none)]> select distinct 1 from mysql.user;
          +---+
          | 1 |
          +---+
          | 1 |
          +---+
          1 row in set (0.00 sec)
      

      No, it returns a value. When selecting the one from a actual table, not `dual`, or an inline list expression, the behaviour is different.

      Is this a bug in MariaDB? An old MySQL server (5.5.40) I tried this with gave the expected empty set for all queries.

      Attachments

        Activity

          Thanks for the report and test case.

          elenst Elena Stepanova added a comment - Thanks for the report and test case.
          varun Varun Gupta (Inactive) added a comment - - edited

          create table t1(a int);
          insert into t1 value (1);
           
          EXPLAIN
          select 1 from t1 where  null not in (select 1 from t1);
          id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
          1	PRIMARY	t1	system	NULL	NULL	NULL	NULL	1	
          2	SUBQUERY	NULL	NULL	NULL	NULL	NULL	NULL	NULL	Impossible WHERE noticed after reading const tables
          
          

          varun Varun Gupta (Inactive) added a comment - - edited create table t1(a int ); insert into t1 value (1);   EXPLAIN select 1 from t1 where null not in ( select 1 from t1); id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY t1 system NULL NULL NULL NULL 1 2 SUBQUERY NULL NULL NULL NULL NULL NULL NULL Impossible WHERE noticed after reading const tables

          Results of investigation discussion:
          We get to the wrong query result, because the optimizer gets Impossible where after reading const tables for the subquery.

          This should not happen (as the subquery produces rows). However it does happen: IN->EXISTS rewrite changes the subquery to be

          exists (select 1 from t1 where trig_cond(null=1))
          

          The trig_cond is "enabled" at optimization phase, the optimizer sees it as constant, evaluates it, finds it to be NULL, and concludes that the subquery has "Impossible WHERE".

          psergei Sergei Petrunia added a comment - Results of investigation discussion: We get to the wrong query result, because the optimizer gets Impossible where after reading const tables for the subquery. This should not happen (as the subquery produces rows). However it does happen: IN->EXISTS rewrite changes the subquery to be exists (select 1 from t1 where trig_cond(null=1)) The trig_cond is "enabled" at optimization phase, the optimizer sees it as constant, evaluates it, finds it to be NULL, and concludes that the subquery has "Impossible WHERE".

          That is,

          Subquery's left expression is NULL. This means that trig_cond should be disabled when we evaluate the subquery. However, subquery optimization code assumes it to be enabled, and produces a wrong query result.

          psergei Sergei Petrunia added a comment - That is, Subquery's left expression is NULL. This means that trig_cond should be disabled when we evaluate the subquery. However, subquery optimization code assumes it to be enabled, and produces a wrong query result.

          More details:

          I stop here in the subquery:

            Breakpoint 5, make_join_select (join=0x7fffcdcf15c0,...)
          (gdb) p join->select_lex->select_number
            $45 = 2
          

          (gdb) p dbug_print_item(cond)
            $48 = 0x15674c0 <dbug_item_print_buf> "trigcond(<cache>((NULL = 1)))"
          (gdb) p cond->val_int()
            $49 = 0
          (gdb) p cond->const_item()
            $50 = false
          (gdb) p cond->used_tables()
            $51 = 0
          

          Then, the trigcond is moved into join->exec_const_cond, and we reach this code:

                  if (join->exec_const_cond && !join->exec_const_cond->is_expensive() &&
                      !join->exec_const_cond->val_int())
                  {
                    DBUG_PRINT("info",("Found impossible WHERE condition"));
          

          where is_expensive()=false, and cond->val_int()==0.

          psergei Sergei Petrunia added a comment - More details: I stop here in the subquery: Breakpoint 5, make_join_select (join=0x7fffcdcf15c0,...) (gdb) p join->select_lex->select_number $45 = 2 (gdb) p dbug_print_item(cond) $48 = 0x15674c0 <dbug_item_print_buf> "trigcond(<cache>((NULL = 1)))" (gdb) p cond->val_int() $49 = 0 (gdb) p cond->const_item() $50 = false (gdb) p cond->used_tables() $51 = 0 Then, the trigcond is moved into join->exec_const_cond, and we reach this code: if (join->exec_const_cond && !join->exec_const_cond->is_expensive() && !join->exec_const_cond->val_int()) { DBUG_PRINT("info",("Found impossible WHERE condition")); where is_expensive()=false, and cond->val_int()==0.

          So, for trigcond(const_expr), we have

          1. const_item() = FALSE
          2. used_tables() = 0

          #1 seems to be correct (as the value may change during the statement execution, depending on whether trigger is ON or OFF)

          #2 seems to be incorrect. Should it be OUTER_REF_TABLE_BIT instead?

          psergei Sergei Petrunia added a comment - So, for trigcond(const_expr) , we have 1. const_item() = FALSE 2. used_tables() = 0 #1 seems to be correct (as the value may change during the statement execution, depending on whether trigger is ON or OFF) #2 seems to be incorrect. Should it be OUTER_REF_TABLE_BIT instead?

          According to documentation:
          trigcond(X) := TRUE when the "linked" outer expression oe_i is NULL

          For our query the above condition holds "SELECT 1 from dual where NULL in (select t1 from t1)"
          but the condition join->exec_const_cond->val_int() return 0 which is wrong.

          varun Varun Gupta (Inactive) added a comment - According to documentation: trigcond(X) := TRUE when the "linked" outer expression oe_i is NULL For our query the above condition holds "SELECT 1 from dual where NULL in (select t1 from t1)" but the condition join->exec_const_cond->val_int() return 0 which is wrong.

          People

            varun Varun Gupta (Inactive)
            ComaVN Roel Harbers
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.