[MDEV-27624] Wrong result for nested left join using not_exists optimization Created: 2022-01-26  Updated: 2023-01-17  Resolved: 2023-01-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.4.22, 10.5.13, 10.6.5, 10.7.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: yuweixing Assignee: Igor Babaev
Resolution: Fixed Votes: 0
Labels: 11.0-sel, upstream

Issue Links:
Duplicate
is duplicated by MDEV-27630 CLONE - wrong result when doing left ... Closed
Relates
relates to MDEV-7992 LEFT JOINed result of subquery with L... Closed
relates to MDEV-29125 'Not exists' optimization isn't appli... Closed

 Description   

The following query returns wrong result:

CREATE TABLE t1(
K1 INT PRIMARY KEY,
Name VARCHAR(15)
);
INSERT INTO t1 VALUES
(1,'T1Row1'), (2,'T1Row2');
CREATE TABLE t2(
K2 INT PRIMARY KEY,
K1r INT,
rowTimestamp DATETIME,
Event VARCHAR(15)
);
 
INSERT INTO t2 VALUES
(1, 1, '2015-04-13 10:42:11' ,'T1Row1Event1'),
(2, 1, '2015-04-13 10:42:12' ,'T1Row1Event2'),
(3, 1, '2015-04-13 10:42:12' ,'T1Row1Event3');
 
SELECT t1a.*, t2a.*,
t2i.K2 AS K2B, t2i.K1r AS K1rB,
t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
FROM
t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
LEFT JOIN
( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
ON (t1i.K1 = 1) AND
(((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
(t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
OR (t2i.K2 IS NULL))
WHERE
t2a.K1r = 1 AND t2i.K2 IS NULL;
MariaDB [test]> SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
|  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
1 row in set (0.000 sec)

it should be:

mysql> SELECT t1a.*, t2a.*,
    -> t2i.K2 AS K2B, t2i.K1r AS K1rB,
    -> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
    -> FROM
    -> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
    -> LEFT JOIN
    -> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
    -> ON (t1i.K1 = 1) AND
    -> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
    -> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
    -> OR (t2i.K2 IS NULL))
    -> WHERE
    -> t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
|  1 | T1Row1 |  1 |    1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL          | NULL   |
|  1 | T1Row1 |  2 |    1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL          | NULL   |
|  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
3 rows in set (0.01 sec)



 Comments   
Comment by Alice Sherepa [ 2022-01-26 ]

Mysql behaves inconsistently here, if there is no primary key in tables, then the result is the same as in MariaDb.
MariaDB returns the correct result

Comment by yuweixing [ 2022-01-26 ]

No, I drop the primary key of t1 and t2, you can see the plan has changed to table scan, the result are still 3 rows in MySQL.
And the result comes wright to 3 rows in MariaDB when I drop all the primary keys.

mysql> SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K
1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i
.K2 IS desc SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON 
t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 ANN
D t2i.K2 IS NULL;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------------------+
|  1 | SIMPLE      | t1a   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where                                                    |
|  1 | SIMPLE      | t2a   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where; Using join buffer (Block Nested Loop)             |
|  1 | SIMPLE      | t1i   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (Block Nested Loop)             |
|  1 | SIMPLE      | t2i   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    3 |    33.33 | Using where; Not exists; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
 
mysql> SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K
1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2ii
.K2 IS NULL;     
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
|  1 | T1Row1 |  1 |    1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL          | NULL   |
|  1 | T1Row1 |  2 |    1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL          | NULL   |
|  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
3 rows in set (0.00 sec)

Comment by yuweixing [ 2022-01-26 ]

MariaDB [test]> alter table t1 drop primary key;
Query OK, 2 rows affected (0.046 sec)              
Records: 2  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table t2 drop primary key;
Query OK, 3 rows affected (0.022 sec)              
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> desc SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i.K2 IS NULL;
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                                              |
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------------------+
|    1 | SIMPLE      | t1a   | ALL  | NULL          | NULL | NULL    | NULL | 2    | Using where                                                        |
|    1 | SIMPLE      | t2a   | ALL  | NULL          | NULL | NULL    | NULL | 3    | Using where; Using join buffer (flat, BNL join)                    |
|    1 | SIMPLE      | t1i   | ALL  | NULL          | NULL | NULL    | NULL | 2    | Using where; Using join buffer (incremental, BNL join)             |
|    1 | SIMPLE      | t2i   | ALL  | NULL          | NULL | NULL    | NULL | 3    | Using where; Not exists; Using join buffer (incremental, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------------------+
4 rows in set (0.016 sec)
 
MariaDB [test]> SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
|  1 | T1Row1 |  1 |    1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL          | NULL   |
|  1 | T1Row1 |  2 |    1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL          | NULL   |
|  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
3 rows in set (0.010 sec)
 
MariaDB [test]> 

Comment by yuweixing [ 2022-01-26 ]

MariaDB [test]> alter table t1 add primary key(k1);
Query OK, 0 rows affected (0.023 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table t2 add primary key(k2);
Query OK, 0 rows affected (0.007 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> desc SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i.K2 IS NULL;
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                   |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------------------------+
|    1 | SIMPLE      | t1a   | const | PRIMARY       | PRIMARY | 4       | const | 1    |                         |
|    1 | SIMPLE      | t2a   | ALL   | NULL          | NULL    | NULL    | NULL  | 3    | Using where             |
|    1 | SIMPLE      | t1i   | const | PRIMARY       | PRIMARY | 4       | const | 1    | Using index             |
|    1 | SIMPLE      | t2i   | ALL   | NULL          | NULL    | NULL    | NULL  | 3    | Using where; Not exists |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------------------------+
4 rows in set (0.001 sec)
 
MariaDB [test]> SELECT t1a.*, t2a.*, t2i.K2 AS K2B, t2i.K1r AS K1rB, t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB FROM t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1 LEFT JOIN ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1) ON (t1i.K1 = 1) AND (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2)) OR (t2i.K2 IS NULL)) WHERE t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name   | K2 | K1r  | rowTimestamp        | Event        | K2B  | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
|  1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
1 row in set (0.001 sec)

Comment by Alice Sherepa [ 2022-01-26 ]

yes, you are right, there is a bug, when k2 column in table t2 is defined as not null (or has primary key)
Mysql I tried only in db fiddle, https://www.db-fiddle.com/f/3KnfUJ81K23cCgpUvD8Fbb/0 (8.0.12, 5.7.22), it seems they have the same bug
(the correct result should be 1 row)

CREATE TABLE t1(k1 INT, name varchar(15)) engine=innodb;
INSERT INTO t1 VALUES (1, 'T1Row1'), (2, 'T1Row2');
 
CREATE TABLE t2(k2 INT not null, k1r INT, rowtimestamp datetime, event varchar(15)) engine=innodb;
INSERT INTO t2 VALUES
(1, 1, '2015-04-13 10:42:11' ,'T1Row1Event1'),
(2, 1, '2015-04-13 10:42:12' ,'T1Row1Event2'),
(3, 1, '2015-04-13 10:42:12' ,'T1Row1Event3');
 
MariaDB [test]> SELECT sql_no_cache t1.*,        t2.*,        t2i.k2 AS k2b,        t2i.k1r AS k1rb,        t2i.rowtimestamp AS rowtimestampb,        t2i.event AS eventb FROM t1 JOIN t2 ON t2.k1r = t1.k1 LEFT JOIN t1 t1i  LEFT JOIN t2 t2i ON t2i.k1r = t1i.k1  ON (t1i.k1 = 1) AND  (((t2i.k1r = t1.k1 AND t2i.rowtimestamp > t2.rowtimestamp) OR (t2i.rowtimestamp = t2.rowtimestamp AND t2i.k2 > t2.k2)) OR (t2i.k2 IS NULL)) WHERE t2.k1r = 1 AND t2i.k2 IS NULL;
+------+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| k1   | name   | k2 | k1r  | rowtimestamp        | event        | k2b  | k1rb | rowtimestampb | eventb |
+------+--------+----+------+---------------------+--------------+------+------+---------------+--------+
|    1 | T1Row1 |  1 |    1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL          | NULL   |
|    1 | T1Row1 |  2 |    1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL          | NULL   |
|    1 | T1Row1 |  3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+------+--------+----+------+---------------------+--------------+------+------+---------------+--------+
3 rows in set (0.00 sec)
 
MariaDB [test]> explain extended SELECT sql_no_cache t1.*,        t2.*,        t2i.k2 AS k2b,        t2i.k1r AS k1rb,        t2i.rowtimestamp AS rowtimestampb,        t2i.event AS eventb FROM t1 JOIN t2 ON t2.k1r = t1.k1 LEFT JOIN t1 t1i  LEFT JOIN t2 t2i ON t2i.k1r = t1i.k1  ON (t1i.k1 = 1) AND  (((t2i.k1r = t1.k1 AND t2i.rowtimestamp > t2.rowtimestamp) OR (t2i.rowtimestamp = t2.rowtimestamp AND t2i.k2 > t2.k2)) OR (t2i.k2 IS NULL)) WHERE t2.k1r = 1 AND t2i.k2 IS NULL;
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                                                              |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where                                                        |
|    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Using where; Using join buffer (flat, BNL join)                    |
|    1 | SIMPLE      | t1i   | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where; Using join buffer (incremental, BNL join)             |
|    1 | SIMPLE      | t2i   | ALL  | NULL          | NULL | NULL    | NULL |    3 |   100.00 | Using where; Not exists; Using join buffer (incremental, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+----------+--------------------------------------------------------------------+
4 rows in set, 1 warning (0.00 sec)
 
Note (Code 1003): select sql_no_cache `test`.`t1`.`k1` AS `k1`,`test`.`t1`.`name` AS `name`,`test`.`t2`.`k2` AS `k2`,`test`.`t2`.`k1r` AS `k1r`,`test`.`t2`.`rowtimestamp` AS `rowtimestamp`,`test`.`t2`.`event` AS `event`,`test`.`t2i`.`k2` AS `k2b`,`test`.`t2i`.`k1r` AS `k1rb`,`test`.`t2i`.`rowtimestamp` AS `rowtimestampb`,`test`.`t2i`.`event` AS `eventb` from `test`.`t1` join `test`.`t2` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(`test`.`t2i`.`k1r` = 1)) on(`test`.`t1i`.`k1` = 1 and (`test`.`t2i`.`k1r` = 1 and `test`.`t2i`.`rowtimestamp` > `test`.`t2`.`rowtimestamp` or `test`.`t2i`.`rowtimestamp` = `test`.`t2`.`rowtimestamp` and `test`.`t2i`.`k2` > `test`.`t2`.`k2` or `test`.`t2i`.`k2` is null)) where `test`.`t1`.`k1` = 1 and `test`.`t2`.`k1r` = 1 and `test`.`t2i`.`k2` is null
 
MariaDB [test]> alter table t2 modify column k2 int;
Query OK, 0 rows affected (0.12 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> SELECT  t1.*,        t2.*,        t2i.k2 AS k2b,        t2i.k1r AS k1rb,        t2i.rowtimestamp AS rowtimestampb,        t2i.event AS eventb FROM t1 JOIN t2 ON t2.k1r = t1.k1 LEFT JOIN t1 t1i  LEFT JOIN t2 t2i ON t2i.k1r = t1i.k1  ON (t1i.k1 = 1) AND  (((t2i.k1r = t1.k1 AND t2i.rowtimestamp > t2.rowtimestamp) OR (t2i.rowtimestamp = t2.rowtimestamp AND t2i.k2 > t2.k2)) OR (t2i.k2 IS NULL)) WHERE t2.k1r = 1 AND t2i.k2 IS NULL;
+------+--------+------+------+---------------------+--------------+------+------+---------------+--------+
| k1   | name   | k2   | k1r  | rowtimestamp        | event        | k2b  | k1rb | rowtimestampb | eventb |
+------+--------+------+------+---------------------+--------------+------+------+---------------+--------+
|    1 | T1Row1 |    3 |    1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL          | NULL   |
+------+--------+------+------+---------------------+--------------+------+------+---------------+--------+
1 row in set (0.00 sec)

Comment by Oleg Smirnov [ 2022-06-08 ]

This case is very similar to MDEV-7992: the query and sample data are the same but the tables structure differ. MDEV-7992 tables have PRIMARY KEYs while these task's tables do not.

/* MDEV-7992 */
CREATE TABLE T1(
  K1 INT PRIMARY KEY,
  Name VARCHAR(15)
) ENGINE=InnoDB;
 
CREATE TABLE T2(
  K2 INT PRIMARY KEY,
  K1r INT,
  rowTimestamp DATETIME,
  Event VARCHAR(15)
) ENGINE=InnoDB;

/* MDEV-27624 */
CREATE TABLE T1(
  K1 INT NOT NULL,
  Name VARCHAR(15)
) ENGINE=InnoDB;
 
CREATE TABLE T2(
  K2 INT NOT NULL,
  K1r INT,
  rowTimestamp DATETIME,
  Event VARCHAR(15)
) ENGINE=InnoDB;

That's why execution plans differ:

/* MDEV-7992 */
MariaDB [mdev7992]> EXPLAIN
    -> SELECT t1.*, t2.*, t1i.*, t2i.*
    -> FROM ((t1 JOIN t2 ON (t2.k1r = t1.k1))
    ->     LEFT JOIN (t1 t1i LEFT JOIN t2 t2i ON (t2i.k1r = t1i.k1))
    ->         ON (
    ->           t1i.k1 = 1 AND (
    ->             t2i.k1r = t1.k1 AND t2i.rowtimestamp > t2.rowtimestamp OR t2i.rowtimestamp = t2.rowtimestamp
    ->             AND t2i.k2 > t2.k2 OR t2i.k2 is NULL
    ->           )
    ->         )
    ->      )
    -> WHERE t2.k1r = 1 AND t2i.k2 is NULL;
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------------------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref   | rows | Extra                   |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------------------------+
|    1 | SIMPLE      | t1    | const | PRIMARY       | PRIMARY | 4       | const | 1    |                         |
|    1 | SIMPLE      | t2    | ALL   | NULL          | NULL    | NULL    | NULL  | 3    | Using where             |
|    1 | SIMPLE      | t1i   | const | PRIMARY       | PRIMARY | 4       | const | 1    |                         |
|    1 | SIMPLE      | t2i   | ALL   | NULL          | NULL    | NULL    | NULL  | 3    | Using where; Not exists |
+------+-------------+-------+-------+---------------+---------+---------+-------+------+-------------------------+

/* MDEV-27624 */
MariaDB [test]> EXPLAIN
    -> SELECT t1.*, t2.*, t1i.*, t2i.*
    -> FROM ((t1 JOIN t2 ON (t2.k1r = t1.k1))
    ->     LEFT JOIN (t1 t1i LEFT JOIN t2 t2i ON (t2i.k1r = t1i.k1))
    ->         ON (
    ->           t1i.k1 = 1 AND (
    ->             t2i.k1r = t1.k1 AND t2i.rowtimestamp > t2.rowtimestamp OR t2i.rowtimestamp = t2.rowtimestamp
    ->             AND t2i.k2 > t2.k2 OR t2i.k2 is NULL
    ->           )
    ->         )
    ->      )
    -> WHERE t2.k1r = 1 AND t2i.k2 is NULL;
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                                              |
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------------------+
|    1 | SIMPLE      | t1    | ALL  | NULL          | NULL | NULL    | NULL | 1    | Using where                                                        |
|    1 | SIMPLE      | t2    | ALL  | NULL          | NULL | NULL    | NULL | 3    | Using where; Using join buffer (flat, BNL join)                    |
|    1 | SIMPLE      | t1i   | ALL  | NULL          | NULL | NULL    | NULL | 1    | Using where; Using join buffer (incremental, BNL join)             |
|    1 | SIMPLE      | t2i   | ALL  | NULL          | NULL | NULL    | NULL | 3    | Using where; Not exists; Using join buffer (incremental, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+--------------------------------------------------------------------+

MDEV-7992 has added the following changes with commit that restricts applying "not_exists" optimization in this case:

static enum_nested_loop_state
evaluate_join_record(JOIN *join, JOIN_TAB *join_tab,
                     int error)
...
       for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
       {
+        /*
+          Check whether 'not exists' optimization can be used here.
+          If  tab->table->reginfo.not_exists_optimize is set to true
+          then WHERE contains a conjunctive predicate IS NULL over
+          a non-nullable field of tab. When activated this predicate
+          will filter out all records with matches for the left part
+          of the outer join whose inner tables start from the
+          first_unmatched table and include table tab. To safely use
+          'not exists' optimization we have to check that the
+          IS NULL predicate is really activated, i.e. all guards
+          that wrap it are in the 'open' state.
+       */
+       bool not_exists_opt_is_applicable=
+               tab->table->reginfo.not_exists_optimize;
+       for (JOIN_TAB *first_upper= first_unmatched->first_upper;
+             not_exists_opt_is_applicable && first_upper;
+             first_upper= first_upper->first_upper)
+        {
+          if (!first_upper->found)
+            not_exists_opt_is_applicable= false;
+        }

I suggest this could be a solution for this task as well but the problem is the server doesn't hit this code when executing query for tables without primary keys.

Comment by Oleg Smirnov [ 2022-06-09 ]

The bug doesn't reveal itself with join_cache_level=0 and 1 (flat BLN buffers), only starting from join_cache_level=2 (incremental BNL buffers)

Comment by Oleg Smirnov [ 2022-07-15 ]

Minimal example (unnecessary fields and conditions removed).

CREATE TABLE t1 (a INT NOT NULL, b INT, c INT);
INSERT INTO t1 VALUES (1,1,1), (1,2,1), (1,3,1);
 
CREATE TABLE t2(a INT NOT NULL);
INSERT INTO t2 VALUES (1), (2);
 
CREATE TABLE t3(a INT not null, b INT);
INSERT INTO t3 VALUES (1, 1), (2, 1), (3, 1);
 
SELECT *
  FROM (t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.a = t3.b)
             ON (t2.a = 1 AND (
                 t3.b = t1.a AND t3.a > t1.b OR t3.a is NULL)
         )
      )
WHERE t1.c = 1 AND t3.a is NULL;

Comment by Oleg Smirnov [ 2022-07-20 ]

An attempt to construct a dataset to demonstrate a non-legitimate usage of 'not exists'.

CREATE TABLE t1 (a INT NOT NULL, b INT NOT NULL);
CREATE TABLE t2 (a INT NOT NULL, b INT NOT NULL);
CREATE TABLE t3 (a INT NOT NULL, b INT NOT NULL);
 
INSERT INTO t1 VALUES (1,1), (2,1);
INSERT INTO t2 VALUES (1,1), (1,2);
INSERT INTO t3 VALUES (1,2), (2,1);
 
SET join_cache_level=0;

We expect the 2nd line of this dataset will be skipped if condition t2.a IS NULL will be applied due to the 'not exists' optimization:

MariaDB [mdev27624_2]> SELECT * FROM t1 LEFT JOIN t2 ON t1.a=t2.a;
+---+---+------+------+
| a | b | a    | b    |
+---+---+------+------+
| 1 | 1 |    1 |    1 |
| 1 | 1 |    1 |    2 |
| 2 | 1 | NULL | NULL |
+---+---+------+------+

So the t3 row

{1,2}

will not match the ON condition and will be NULL-complemented:

MariaDB [mdev27624_2]> SELECT * FROM t3 LEFT JOIN (t1 LEFT JOIN t2 ON t1.a=t2.a)
    ->   ON t3.a = t2.a AND t3.b=t2.b
    ->    WHERE t2.a IS NULL;
+---+---+------+------+------+------+
| a | b | a    | b    | a    | b    |
+---+---+------+------+------+------+
| 2 | 1 | NULL | NULL | NULL | NULL |
+---+---+------+------+------+------+

But this does not happen, the result is correct.

Comment by Sergei Petrunia [ 2022-07-20 ]

It seems my yesterday statement that Not Exists doesn't shortcut anything for nested outer joins was incorrect.

Here's an example:

set join_cache_level=0;
create table t21(t21_a int);
insert into t21 values (1);
 
create table t22 (t22_a int, t22_b int);
insert into t22 values (1,10);
insert into t22 values (1,11);
 
create table t23 (t23_b int not null, t23_c int not null);
insert into t23 values
(10, 100),
(10, 101),
(10, 102),
(10, 103),
(10, 104),
(10, 105),
(10, 106),
(10, 107),
(10, 108);
 
 
create table t23null(t23_b int, t23_c int);
insert into t23null select * from t23;
 
create or replace temporary table t_initial as select table_schema, table_name, rows_fetched from sys.schema_table_statistics;
 
select *  from       t21 left join      (t22 left join t23 on t22_b=t23_b)     on (t21_a= t22_a)  where    t23_b is null;
 
select NEW.table_name, NEW.rows_fetched - ORIG.rows_fetched as DIFF 
from sys.schema_table_statistics NEW, t_initial ORIG 
where NEW.table_name=ORIG.table_name and NEW.table_schema=ORIG.table_schema and NEW.table_schema=database() having DIFF>0;
 
create or replace temporary table t_initial as select table_schema, table_name, rows_fetched from sys.schema_table_statistics;
select *  from       t21 left join      (t22 left join t23null on t22_b=t23_b)     on (t21_a= t22_a)  where    t23_b is null;
select NEW.table_name, NEW.rows_fetched - ORIG.rows_fetched as DIFF
from sys.schema_table_statistics NEW, t_initial ORIG 
where NEW.table_name=ORIG.table_name and NEW.table_schema=ORIG.table_schema and NEW.table_schema=database() having DIFF>0;

This shows

+------------+------+
| table_name | DIFF |
+------------+------+
| t21        |    2 |
| t22        |    3 |
| t23        |   10 |
+------------+------+

and

+------------+------+
| table_name | DIFF |
+------------+------+
| t21        |    2 |
| t22        |    3 |
| t23null    |   18 |
+------------+------+

which shows shortcutting is done. Perhaps it's different from what I expected it to be.

Comment by Igor Babaev [ 2022-10-20 ]

I cannot approve the fix as it just prohibits using not_exists optimization together with join buffer. I'm assigning this bug to myself for a proper resolution.

Comment by Oleksandr Byelkin [ 2023-01-12 ]

OK to push

Comment by Igor Babaev [ 2023-01-17 ]

The patch for this bug was pushed into 10.3. I should be merged upstream as it is.

Generated at Thu Feb 08 09:54:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.