Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27624

Wrong result for nested left join using not_exists optimization

Details

    Description

      The following query returns wrong result:

      CREATE TABLE t1(
      K1 INT PRIMARY KEY,
      Name VARCHAR(15)
      );
      INSERT INTO t1 VALUES
      (1,'T1Row1'), (2,'T1Row2');
      CREATE TABLE t2(
      K2 INT PRIMARY KEY,
      K1r INT,
      rowTimestamp DATETIME,
      Event VARCHAR(15)
      );
       
      INSERT INTO t2 VALUES
      (1, 1, '2015-04-13 10:42:11' ,'T1Row1Event1'),
      (2, 1, '2015-04-13 10:42:12' ,'T1Row1Event2'),
      (3, 1, '2015-04-13 10:42:12' ,'T1Row1Event3');
       
      SELECT t1a.*, t2a.*,
      t2i.K2 AS K2B, t2i.K1r AS K1rB,
      t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
      FROM
      t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
      LEFT JOIN
      ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
      ON (t1i.K1 = 1) AND
      (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
      (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
      OR (t2i.K2 IS NULL))
      WHERE
      t2a.K1r = 1 AND t2i.K2 IS NULL;
      MariaDB [test]> SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i.K2 IS NULL;
      +----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
      | K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
      +----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
      |  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
      +----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
      1 row in set (0.000 sec)
      

      it should be:

      mysql> SELECT t1a.*, t2a.*,
          -> t2i.K2 AS K2B, t2i.K1r AS K1rB,
          -> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
          -> FROM
          -> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
          -> LEFT JOIN
          -> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
          -> ON (t1i.K1 = 1) AND
          -> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
          -> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
          -> OR (t2i.K2 IS NULL))
          -> WHERE
          -> t2a.K1r = 1 AND t2i.K2 IS NULL;
      +----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
      | K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
      +----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
      |  1 | T1Row1 |  1 |    1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL          | NULL   |
      |  1 | T1Row1 |  2 |    1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL          | NULL   |
      |  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
      +----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
      3 rows in set (0.01 sec)
      
      

      Attachments

        Issue Links

          Activity

            oleg.smirnov Oleg Smirnov added a comment -

            An attempt to construct a dataset to demonstrate a non-legitimate usage of 'not exists'.

            CREATE TABLE t1 (a INT NOT NULL, b INT NOT NULL);
            CREATE TABLE t2 (a INT NOT NULL, b INT NOT NULL);
            CREATE TABLE t3 (a INT NOT NULL, b INT NOT NULL);
             
            INSERT INTO t1 VALUES (1,1), (2,1);
            INSERT INTO t2 VALUES (1,1), (1,2);
            INSERT INTO t3 VALUES (1,2), (2,1);
             
            SET join_cache_level=0;
            

            We expect the 2nd line of this dataset will be skipped if condition t2.a IS NULL will be applied due to the 'not exists' optimization:

            MariaDB [mdev27624_2]> SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a;
            +---+---+------+------+
            | a | b | a    | b    |
            +---+---+------+------+
            | 1 | 1 |    1 |    1 |
            | 1 | 1 |    1 |    2 |
            | 2 | 1 | NULL | NULL |
            +---+---+------+------+
            

            So the t3 row

            {1,2}

            will not match the ON condition and will be NULL-complemented:

            MariaDB [mdev27624_2]> SELECT * FROM t3 LEFT JOIN (t1 LEFT JOIN t2 ON t1.a=t2.a)
                ->   ON t3.a = t2.a AND t3.b=t2.b
                ->    WHERE t2.a IS NULL;
            +---+---+------+------+------+------+
            | a | b | a    | b    | a    | b    |
            +---+---+------+------+------+------+
            | 2 | 1 | NULL | NULL | NULL | NULL |
            +---+---+------+------+------+------+
            

            But this does not happen, the result is correct.

            oleg.smirnov Oleg Smirnov added a comment - An attempt to construct a dataset to demonstrate a non-legitimate usage of 'not exists'. CREATE TABLE t1 (a INT NOT NULL , b INT NOT NULL ); CREATE TABLE t2 (a INT NOT NULL , b INT NOT NULL ); CREATE TABLE t3 (a INT NOT NULL , b INT NOT NULL );   INSERT INTO t1 VALUES (1,1), (2,1); INSERT INTO t2 VALUES (1,1), (1,2); INSERT INTO t3 VALUES (1,2), (2,1);   SET join_cache_level=0; We expect the 2nd line of this dataset will be skipped if condition t2.a IS NULL will be applied due to the 'not exists' optimization: MariaDB [mdev27624_2]> SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a; + ---+---+------+------+ | a | b | a | b | + ---+---+------+------+ | 1 | 1 | 1 | 1 | | 1 | 1 | 1 | 2 | | 2 | 1 | NULL | NULL | + ---+---+------+------+ So the t3 row {1,2} will not match the ON condition and will be NULL-complemented: MariaDB [mdev27624_2]> SELECT * FROM t3 LEFT JOIN (t1 LEFT JOIN t2 ON t1.a=t2.a) -> ON t3.a = t2.a AND t3.b=t2.b -> WHERE t2.a IS NULL ; + ---+---+------+------+------+------+ | a | b | a | b | a | b | + ---+---+------+------+------+------+ | 2 | 1 | NULL | NULL | NULL | NULL | + ---+---+------+------+------+------+ But this does not happen, the result is correct.

            It seems my yesterday statement that Not Exists doesn't shortcut anything for nested outer joins was incorrect.

            Here's an example:

            set join_cache_level=0;
            create table t21(t21_a int);
            insert into t21 values (1);
             
            create table t22 (t22_a int, t22_b int);
            insert into t22 values (1,10);
            insert into t22 values (1,11);
             
            create table t23 (t23_b int not null, t23_c int not null);
            insert into t23 values
            (10, 100),
            (10, 101),
            (10, 102),
            (10, 103),
            (10, 104),
            (10, 105),
            (10, 106),
            (10, 107),
            (10, 108);
             
             
            create table t23null(t23_b int, t23_c int);
            insert into t23null select * from t23;
             
            create or replace temporary table t_initial as select table_schema, table_name, rows_fetched from sys.schema_table_statistics;
             
            select *  from       t21 left join      (t22 left join t23 on t22_b=t23_b)     on (t21_a= t22_a)  where    t23_b is null;
             
            select NEW.table_name, NEW.rows_fetched - ORIG.rows_fetched as DIFF 
            from sys.schema_table_statistics NEW, t_initial ORIG 
            where NEW.table_name=ORIG.table_name and NEW.table_schema=ORIG.table_schema and NEW.table_schema=database() having DIFF>0;
             
            create or replace temporary table t_initial as select table_schema, table_name, rows_fetched from sys.schema_table_statistics;
            select *  from       t21 left join      (t22 left join t23null on t22_b=t23_b)     on (t21_a= t22_a)  where    t23_b is null;
            select NEW.table_name, NEW.rows_fetched - ORIG.rows_fetched as DIFF
            from sys.schema_table_statistics NEW, t_initial ORIG 
            where NEW.table_name=ORIG.table_name and NEW.table_schema=ORIG.table_schema and NEW.table_schema=database() having DIFF>0;
            
            

            This shows

            +------------+------+
            | table_name | DIFF |
            +------------+------+
            | t21        |    2 |
            | t22        |    3 |
            | t23        |   10 |
            +------------+------+
            

            and

            +------------+------+
            | table_name | DIFF |
            +------------+------+
            | t21        |    2 |
            | t22        |    3 |
            | t23null    |   18 |
            +------------+------+
            

            which shows shortcutting is done. Perhaps it's different from what I expected it to be.

            psergei Sergei Petrunia added a comment - It seems my yesterday statement that Not Exists doesn't shortcut anything for nested outer joins was incorrect. Here's an example: set join_cache_level=0; create table t21(t21_a int); insert into t21 values (1);   create table t22 (t22_a int, t22_b int); insert into t22 values (1,10); insert into t22 values (1,11);   create table t23 (t23_b int not null, t23_c int not null); insert into t23 values (10, 100), (10, 101), (10, 102), (10, 103), (10, 104), (10, 105), (10, 106), (10, 107), (10, 108);     create table t23null(t23_b int, t23_c int); insert into t23null select * from t23;   create or replace temporary table t_initial as select table_schema, table_name, rows_fetched from sys.schema_table_statistics;   select * from t21 left join (t22 left join t23 on t22_b=t23_b) on (t21_a= t22_a) where t23_b is null;   select NEW.table_name, NEW.rows_fetched - ORIG.rows_fetched as DIFF from sys.schema_table_statistics NEW, t_initial ORIG where NEW.table_name=ORIG.table_name and NEW.table_schema=ORIG.table_schema and NEW.table_schema=database() having DIFF>0;   create or replace temporary table t_initial as select table_schema, table_name, rows_fetched from sys.schema_table_statistics; select * from t21 left join (t22 left join t23null on t22_b=t23_b) on (t21_a= t22_a) where t23_b is null; select NEW.table_name, NEW.rows_fetched - ORIG.rows_fetched as DIFF from sys.schema_table_statistics NEW, t_initial ORIG where NEW.table_name=ORIG.table_name and NEW.table_schema=ORIG.table_schema and NEW.table_schema=database() having DIFF>0; This shows +------------+------+ | table_name | DIFF | +------------+------+ | t21 | 2 | | t22 | 3 | | t23 | 10 | +------------+------+ and +------------+------+ | table_name | DIFF | +------------+------+ | t21 | 2 | | t22 | 3 | | t23null | 18 | +------------+------+ which shows shortcutting is done. Perhaps it's different from what I expected it to be.

            I cannot approve the fix as it just prohibits using not_exists optimization together with join buffer. I'm assigning this bug to myself for a proper resolution.

            igor Igor Babaev (Inactive) added a comment - I cannot approve the fix as it just prohibits using not_exists optimization together with join buffer. I'm assigning this bug to myself for a proper resolution.

            OK to push

            sanja Oleksandr Byelkin added a comment - OK to push

            The patch for this bug was pushed into 10.3. I should be merged upstream as it is.

            igor Igor Babaev (Inactive) added a comment - The patch for this bug was pushed into 10.3. I should be merged upstream as it is.

            People

              igor Igor Babaev (Inactive)
              yutellite yuweixing
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.