A table T1 is inner joined to a table T2 containing multiple timestamped rows with a FK reference to T1. Left joined to this is a subquery returning rows of T2 referenceing the same row of T1 but with newer timestamps or with identical timestamps and larger integer primary keys.
Now a filter is added to the where clause:
and K2B IS NULL
and instead of returning just the last row of the above result set the results are as follows:
This behavior is not exhibited by MySql 5.5 on the same test data and queries.
A detailed procedure for reproducing the issue is attached.
The actual query is:
SELECT
t1a.*,
t2a.*,
t2b.K2 as K2B,
t2b.K1r as K1rB,
t2b.rowTimestamp as rowTimestampB,
t2b.Event as EventB
FROM
T1 as t1a
JOIN T2 as t2a
ON t2a.K1r = t1a.K1
LEFTJOIN (
SELECT
t2i.*
FROM
T1 as t1i
LEFTJOIN T2 as t2i
ON t2i.K1r = t1i.K1
WHERE
t1i.K1 = 1
and t2i.K2 ISNOTNULL
) as t2b
ON t2b.K1r = t1a.K1
AND t2b.rowTimestamp > t2a.rowTimestamp
OR (t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2)
WHERE
t1a.K1 = 1
and t2b.K2 ISNULL
The subquery used does appear peculiar to me:
SELECT
t2i.*
FROM
T1 as t1i
LEFTJOIN T2 as t2i
ON t2i.K1r = t1i.K1
WHERE
t1i.K1 = 1
and t2i.K2 ISNOTNULL
in that it uses a left join and a subsequent filter to verify non-null joined data. A more natural form might be:
SELECT
t2i.*
FROM
T1 as t1i
JOIN T2 as t2i
ON t2i.K1r = t1i.K1
WHERE
t1i.K1 = 1
I mention this because, in fact, using the second form of the subquery eliminates the unexpected behavior, a fact which may help in the diagnosis. However, I am testing MariaDB as a "drop-in replacement" for an existing MySql deployment and I don't have the option of modifying the SQL produced by the application.
I apologize for the complexity of the test query. It was the simplest form I could find that demonstrated the behavior exhibited by the problematic application queries.
I've added MDEV-12099 "LEFT JOIN merged from view with null rejected WHERE is not converted into INNER JOIN" to fix the above problem
Yet without the transformation of the embedding LEFT JOIN into an INNER JOIN the server must return the
right result set.
To factor out this problem I've changed the testcase removed the predicate (t2i.K2 IS NOT NULL) from the WHERE condition of the view v1 and ORed the predicate (t2b.K2 IS NULL) to the ON condition of the query
CREATE VIEW v1 AS
SELECT t2i.*
FROM t1 as t1i LEFT JOIN t2 as t2i ON t2i.K1r = t1i.K1
WHERE t1i.K1 = 1 ;
SELECT
t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
FROM
t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
LEFT JOIN
v1 as t2b
ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
(t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
OR (t2b.K2 IS NULL)
WHERE
t1a.K1 = 1 AND
t2b.K2 IS NULL;
The result of the manual merge of v1 into the main query now looks like this:
SELECT t1a.*, t2a.*,
t2i.K2 AS K2B, t2i.K1r AS K1rB,
t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
FROM
t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
LEFT JOIN
( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
ON (t1i.K1 = 1) AND
(((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
(t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
OR (t2i.K2 IS NULL))
WHERE
t2a.K1r = 1 AND t2i.K2 IS NULL;
Now the optimizer does not convert the embedded OUTER JOIN into INNER JOIN for
both queries:
MariaDB [test]> explain extended
-> SELECT
-> t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
-> t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
-> FROM
-> t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> v1 as t2b
-> ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
-> (t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
| Note | 1003 | select '1' AS `K1`,'T1Row1' AS `Name`,`test`.`t2a`.`K2` AS `K2`,`test`.`t2a`.`K1r` AS `K1r`,`test`.`t2a`.`rowTimestamp` AS `rowTimestamp`,`test`.`t2a`.`Event` AS `Event`,`test`.`t2i`.`K2` AS `K2B`,`test`.`t2i`.`K1r` AS `K1rB`,`test`.`t2i`.`rowTimestamp` AS `rowTimestampB`,`test`.`t2i`.`Event` AS `EventB` from `test`.`t1` `t1a` join `test`.`t2` `t2a` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(((`test`.`t1i`.`K1` = 1) and (`test`.`t2i`.`K1r` = 1)))) on(((((`test`.`t2i`.`rowTimestamp` > `test`.`t2a`.`rowTimestamp`) and ((`test`.`t2a`.`K1r` = 1) and (`test`.`t2i`.`K1r` = 1))) or ((`test`.`t2i`.`K2` > `test`.`t2a`.`K2`) and (`test`.`t2i`.`rowTimestamp` = `test`.`t2a`.`rowTimestamp`)) or isnull(`test`.`t2i`.`K2`)) and (`test`.`t1i`.`K1` = 1))) where ((`test`.`t2a`.`K1r` = 1) and isnull(`test`.`t2i`.`K2`)) |
| Note | 1003 | select '1' AS `K1`,'T1Row1' AS `Name`,`test`.`t2a`.`K2` AS `K2`,`test`.`t2a`.`K1r` AS `K1r`,`test`.`t2a`.`rowTimestamp` AS `rowTimestamp`,`test`.`t2a`.`Event` AS `Event`,`test`.`t2i`.`K2` AS `K2B`,`test`.`t2i`.`K1r` AS `K1rB`,`test`.`t2i`.`rowTimestamp` AS `rowTimestampB`,`test`.`t2i`.`Event` AS `EventB` from `test`.`t1` `t1a` join `test`.`t2` `t2a` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(((`test`.`t1i`.`K1` = 1) and (`test`.`t2i`.`K1r` = 1)))) on(((((`test`.`t2i`.`rowTimestamp` > `test`.`t2a`.`rowTimestamp`) and ((`test`.`t2a`.`K1r` = 1) and (`test`.`t2i`.`K1r` = 1))) or ((`test`.`t2i`.`K2` > `test`.`t2a`.`K2`) and (`test`.`t2i`.`rowTimestamp` = `test`.`t2a`.`rowTimestamp`)) or isnull(`test`.`t2i`.`K2`)) and (`test`.`t1i`.`K1` = 1))) where ((`test`.`t2a`.`K1r` = 1) and isnull(`test`.`t2i`.`K2`)) |
(All results in these comments I've got on maridb-5.2. For all other version we get similar outputs).
Igor Babaev (Inactive)
added a comment - - edited I've added MDEV-12099 "LEFT JOIN merged from view with null rejected WHERE is not converted into INNER JOIN" to fix the above problem
Yet without the transformation of the embedding LEFT JOIN into an INNER JOIN the server must return the
right result set.
To factor out this problem I've changed the testcase removed the predicate (t2i.K2 IS NOT NULL) from the WHERE condition of the view v1 and ORed the predicate (t2b.K2 IS NULL) to the ON condition of the query
CREATE VIEW v1 AS
SELECT t2i.*
FROM t1 as t1i LEFT JOIN t2 as t2i ON t2i.K1r = t1i.K1
WHERE t1i.K1 = 1 ;
SELECT
t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
FROM
t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
LEFT JOIN
v1 as t2b
ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
(t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
OR (t2b.K2 IS NULL)
WHERE
t1a.K1 = 1 AND
t2b.K2 IS NULL;
The result of the manual merge of v1 into the main query now looks like this:
SELECT t1a.*, t2a.*,
t2i.K2 AS K2B, t2i.K1r AS K1rB,
t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
FROM
t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
LEFT JOIN
( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
ON (t1i.K1 = 1) AND
(((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
(t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
OR (t2i.K2 IS NULL))
WHERE
t2a.K1r = 1 AND t2i.K2 IS NULL;
Now the optimizer does not convert the embedded OUTER JOIN into INNER JOIN for
both queries:
MariaDB [test]> explain extended
-> SELECT
-> t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
-> t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
-> FROM
-> t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> v1 as t2b
-> ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
-> (t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
-> OR (t2b.K2 IS NULL)
-> WHERE
-> t1a.K1 = 1 AND
-> t2b.K2 IS NULL;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| 1 | SIMPLE | t1a | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | |
| 1 | SIMPLE | t2a | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where |
| 1 | SIMPLE | t1i | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 1 | SIMPLE | t2i | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
MariaDB [test]> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select '1' AS `K1`,'T1Row1' AS `Name`,`test`.`t2a`.`K2` AS `K2`,`test`.`t2a`.`K1r` AS `K1r`,`test`.`t2a`.`rowTimestamp` AS `rowTimestamp`,`test`.`t2a`.`Event` AS `Event`,`test`.`t2i`.`K2` AS `K2B`,`test`.`t2i`.`K1r` AS `K1rB`,`test`.`t2i`.`rowTimestamp` AS `rowTimestampB`,`test`.`t2i`.`Event` AS `EventB` from `test`.`t1` `t1a` join `test`.`t2` `t2a` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(((`test`.`t1i`.`K1` = 1) and (`test`.`t2i`.`K1r` = 1)))) on(((((`test`.`t2i`.`rowTimestamp` > `test`.`t2a`.`rowTimestamp`) and ((`test`.`t2a`.`K1r` = 1) and (`test`.`t2i`.`K1r` = 1))) or ((`test`.`t2i`.`K2` > `test`.`t2a`.`K2`) and (`test`.`t2i`.`rowTimestamp` = `test`.`t2a`.`rowTimestamp`)) or isnull(`test`.`t2i`.`K2`)) and (`test`.`t1i`.`K1` = 1))) where ((`test`.`t2a`.`K1r` = 1) and isnull(`test`.`t2i`.`K2`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
MariaDB [test]> explain extended
-> SELECT t1a.*, t2a.*,
-> t2i.K2 AS K2B, t2i.K1r AS K1rB,
-> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
-> FROM
-> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
-> ON (t1i.K1 = 1) AND
-> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
-> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
-> OR (t2i.K2 IS NULL))
-> WHERE
-> t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| 1 | SIMPLE | t1a | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | |
| 1 | SIMPLE | t2a | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where |
| 1 | SIMPLE | t1i | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 1 | SIMPLE | t2i | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
MariaDB [test]> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select '1' AS `K1`,'T1Row1' AS `Name`,`test`.`t2a`.`K2` AS `K2`,`test`.`t2a`.`K1r` AS `K1r`,`test`.`t2a`.`rowTimestamp` AS `rowTimestamp`,`test`.`t2a`.`Event` AS `Event`,`test`.`t2i`.`K2` AS `K2B`,`test`.`t2i`.`K1r` AS `K1rB`,`test`.`t2i`.`rowTimestamp` AS `rowTimestampB`,`test`.`t2i`.`Event` AS `EventB` from `test`.`t1` `t1a` join `test`.`t2` `t2a` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(((`test`.`t1i`.`K1` = 1) and (`test`.`t2i`.`K1r` = 1)))) on(((((`test`.`t2i`.`rowTimestamp` > `test`.`t2a`.`rowTimestamp`) and ((`test`.`t2a`.`K1r` = 1) and (`test`.`t2i`.`K1r` = 1))) or ((`test`.`t2i`.`K2` > `test`.`t2a`.`K2`) and (`test`.`t2i`.`rowTimestamp` = `test`.`t2a`.`rowTimestamp`)) or isnull(`test`.`t2i`.`K2`)) and (`test`.`t1i`.`K1` = 1))) where ((`test`.`t2a`.`K1r` = 1) and isnull(`test`.`t2i`.`K2`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
And both queries return wrong result sets:
MariaDB [test]> SELECT
-> t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
-> t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
-> FROM
-> t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> v1 as t2b
-> ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
-> (t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
-> OR (t2b.K2 IS NULL)
-> WHERE
-> t1a.K1 = 1 AND
-> t2b.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name | K2 | K1r | rowTimestamp | Event | K2B | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| 1 | T1Row1 | 1 | 1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 2 | 1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 3 | 1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL | NULL |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
MariaDB [test]>
MariaDB [test]> SELECT t1a.*, t2a.*,
-> t2i.K2 AS K2B, t2i.K1r AS K1rB,
-> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
-> FROM
-> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
-> ON (t1i.K1 = 1) AND
-> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
-> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
-> OR (t2i.K2 IS NULL))
-> WHERE
-> t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name | K2 | K1r | rowTimestamp | Event | K2B | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| 1 | T1Row1 | 1 | 1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 2 | 1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 3 | 1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL | NULL |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
(All results in these comments I've got on maridb-5.2. For all other version we get similar outputs).
t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
FROM
t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
LEFT JOIN
( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
ON (t1i.K1 = 1) AND
(((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
(t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
OR (t2i.K2 IS NULL))
WHERE
t2a.K1r = 1 AND t2i.K2 IS NULL;
and find out why it returns a wrong result set.
Most probably it's because the optimizer tries to use 'Not exists' optimization shortcut for the last table.
Not exists optimization was originally introduced for simple, not nested outer joins. When nested
outer joins were supported no special measures were taken to support this optimization for them.
So the following code of get_best_combination() tried to block usage of this optimization for nested
outer joins:
if (!*j->on_expr_ref)
form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN
This code erroneously allowed to use this optimization for simple nested outer joins.
The proper code should block usage of this optimization for any nested outer joins:
if (!*j->on_expr_ref || j->table->pos_in_table_list->embedding)
form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN
With this code applied to 5.2 the query returns the right result set:
ariaDB [test]> SELECT t1a.*, t2a.*,
-> t2i.K2 AS K2B, t2i.K1r AS K1rB,
-> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
-> FROM
-> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
-> ON (t1i.K1 = 1) AND
-> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
-> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
Igor Babaev (Inactive)
added a comment - Now we can focus exclusively on the query:
SELECT t1a.*, t2a.*,
t2i.K2 AS K2B, t2i.K1r AS K1rB,
t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
FROM
t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
LEFT JOIN
( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
ON (t1i.K1 = 1) AND
(((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
(t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
OR (t2i.K2 IS NULL))
WHERE
t2a.K1r = 1 AND t2i.K2 IS NULL;
and find out why it returns a wrong result set.
Most probably it's because the optimizer tries to use 'Not exists' optimization shortcut for the last table.
Not exists optimization was originally introduced for simple, not nested outer joins. When nested
outer joins were supported no special measures were taken to support this optimization for them.
So the following code of get_best_combination() tried to block usage of this optimization for nested
outer joins:
if (!*j->on_expr_ref)
form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN
This code erroneously allowed to use this optimization for simple nested outer joins.
The proper code should block usage of this optimization for any nested outer joins:
if (!*j->on_expr_ref || j->table->pos_in_table_list->embedding)
form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN
With this code applied to 5.2 the query returns the right result set:
ariaDB [test]> SELECT t1a.*, t2a.*,
-> t2i.K2 AS K2B, t2i.K1r AS K1rB,
-> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
-> FROM
-> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
-> ON (t1i.K1 = 1) AND
-> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
-> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
-> OR (t2i.K2 IS NULL))
-> WHERE
-> t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name | K2 | K1r | rowTimestamp | Event | K2B | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| 1 | T1Row1 | 3 | 1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL | NULL |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
if (!*j->on_expr_ref || j->table->pos_in_table_list->embedding)
form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN
to 5.5 the last query, of course, returns the correct result set, but EXPLAIN show that 'Not exists" optimization is not applied to the query anymore. Also some a test from join_nested fails because EXPLAIN does not show
'Not exists' anymore.
To prohibit applying 'Not exists' optimization to nested outer joins is, probably, not a bright idea.
Let's try to support this optimization for any nested outer join.
To figure out why the current code does not support it properly, let's first revert the patch merged from the mysql code line that tried to fix bug #58490 that concerns exacly 'Not exists' optimization applied to nested outer joins. (I'll explain later why this patch does not fix the problem in general.)
The mariadb-5.3 still has the old variant of the code where this optimization is applied
for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
{
/* Check all predicates that has just been activated. */
/*
Actually all predicates non-guarded by first_unmatched->found
will be re-evaluated again. It could be fixed, but, probably,
it's not worth doing now.
*/
if (tab->select_cond && !tab->select_cond->val_int())
{
/* The condition attached to table tab is false */
if (tab == join_tab)
{
found= 0;
if (tab->table->reginfo.not_exists_optimize)
DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
}
else
{
/*
Set a return point if rejected predicate is attached
not to the last table of the current nest level.
*/
join->return_tab= tab;
if (tab->table->reginfo.not_exists_optimize)
DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
else
DBUG_RETURN(NESTED_LOOP_OK);
}
}
}
Let's figure out what's wrong with this code.
The problem with the above code is that it actually does not check that the IS NULL predicate is
actually active at the moment when we decide to skip records. This check should be done like this:
bool not_exists_opt_is_applicable=
tab->table->reginfo.not_exists_optimize;
for (JOIN_TAB *first_upper= first_unmatched->first_upper;
not_exists_opt_is_applicable && first_upper;
first_upper= first_upper->first_upper)
{
if (!first_upper->found)
not_exists_opt_is_applicable= false;
}
The fix for bug #58490 missed this point as well.
Igor Babaev (Inactive)
added a comment - - edited If I apply the change
if (!*j->on_expr_ref || j->table->pos_in_table_list->embedding)
form->reginfo.not_exists_optimize=0; // Only with LEFT JOIN
to 5.5 the last query, of course, returns the correct result set, but EXPLAIN show that 'Not exists" optimization is not applied to the query anymore. Also some a test from join_nested fails because EXPLAIN does not show
'Not exists' anymore.
To prohibit applying 'Not exists' optimization to nested outer joins is, probably, not a bright idea.
Let's try to support this optimization for any nested outer join.
To figure out why the current code does not support it properly, let's first revert the patch merged from the mysql code line that tried to fix bug #58490 that concerns exacly 'Not exists' optimization applied to nested outer joins. (I'll explain later why this patch does not fix the problem in general.)
The mariadb-5.3 still has the old variant of the code where this optimization is applied
for (JOIN_TAB *tab= first_unmatched; tab <= join_tab; tab++)
{
/* Check all predicates that has just been activated. */
/*
Actually all predicates non-guarded by first_unmatched->found
will be re-evaluated again. It could be fixed, but, probably,
it's not worth doing now.
*/
if (tab->select_cond && !tab->select_cond->val_int())
{
/* The condition attached to table tab is false */
if (tab == join_tab)
{
found= 0;
if (tab->table->reginfo.not_exists_optimize)
DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
}
else
{
/*
Set a return point if rejected predicate is attached
not to the last table of the current nest level.
*/
join->return_tab= tab;
if (tab->table->reginfo.not_exists_optimize)
DBUG_RETURN(NESTED_LOOP_NO_MORE_ROWS);
else
DBUG_RETURN(NESTED_LOOP_OK);
}
}
}
Let's figure out what's wrong with this code.
The problem with the above code is that it actually does not check that the IS NULL predicate is
actually active at the moment when we decide to skip records. This check should be done like this:
bool not_exists_opt_is_applicable=
tab->table->reginfo.not_exists_optimize;
for (JOIN_TAB *first_upper= first_unmatched->first_upper;
not_exists_opt_is_applicable && first_upper;
first_upper= first_upper->first_upper)
{
if (!first_upper->found)
not_exists_opt_is_applicable= false;
}
The fix for bug #58490 missed this point as well.
I've added
MDEV-12099"LEFT JOIN merged from view with null rejected WHERE is not converted into INNER JOIN" to fix the above problemYet without the transformation of the embedding LEFT JOIN into an INNER JOIN the server must return the
right result set.
To factor out this problem I've changed the testcase removed the predicate (t2i.K2 IS NOT NULL) from the WHERE condition of the view v1 and ORed the predicate (t2b.K2 IS NULL) to the ON condition of the query
CREATE VIEW v1 AS
SELECT t2i.*
FROM t1 as t1i LEFT JOIN t2 as t2i ON t2i.K1r = t1i.K1
WHERE t1i.K1 = 1 ;
SELECT
t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
FROM
t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
LEFT JOIN
v1 as t2b
ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
(t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
OR (t2b.K2 IS NULL)
WHERE
t1a.K1 = 1 AND
t2b.K2 IS NULL;
The result of the manual merge of v1 into the main query now looks like this:
SELECT t1a.*, t2a.*,
t2i.K2 AS K2B, t2i.K1r AS K1rB,
t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
FROM
t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
LEFT JOIN
( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
ON (t1i.K1 = 1) AND
(((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
(t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
OR (t2i.K2 IS NULL))
WHERE
t2a.K1r = 1 AND t2i.K2 IS NULL;
Now the optimizer does not convert the embedded OUTER JOIN into INNER JOIN for
both queries:
MariaDB [test]> explain extended
-> SELECT
-> t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
-> t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
-> FROM
-> t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> v1 as t2b
-> ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
-> (t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
-> OR (t2b.K2 IS NULL)
-> WHERE
-> t1a.K1 = 1 AND
-> t2b.K2 IS NULL;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| 1 | SIMPLE | t1a | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | |
| 1 | SIMPLE | t2a | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where |
| 1 | SIMPLE | t1i | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 1 | SIMPLE | t2i | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
MariaDB [test]> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select '1' AS `K1`,'T1Row1' AS `Name`,`test`.`t2a`.`K2` AS `K2`,`test`.`t2a`.`K1r` AS `K1r`,`test`.`t2a`.`rowTimestamp` AS `rowTimestamp`,`test`.`t2a`.`Event` AS `Event`,`test`.`t2i`.`K2` AS `K2B`,`test`.`t2i`.`K1r` AS `K1rB`,`test`.`t2i`.`rowTimestamp` AS `rowTimestampB`,`test`.`t2i`.`Event` AS `EventB` from `test`.`t1` `t1a` join `test`.`t2` `t2a` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(((`test`.`t1i`.`K1` = 1) and (`test`.`t2i`.`K1r` = 1)))) on(((((`test`.`t2i`.`rowTimestamp` > `test`.`t2a`.`rowTimestamp`) and ((`test`.`t2a`.`K1r` = 1) and (`test`.`t2i`.`K1r` = 1))) or ((`test`.`t2i`.`K2` > `test`.`t2a`.`K2`) and (`test`.`t2i`.`rowTimestamp` = `test`.`t2a`.`rowTimestamp`)) or isnull(`test`.`t2i`.`K2`)) and (`test`.`t1i`.`K1` = 1))) where ((`test`.`t2a`.`K1r` = 1) and isnull(`test`.`t2i`.`K2`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
MariaDB [test]> explain extended
-> SELECT t1a.*, t2a.*,
-> t2i.K2 AS K2B, t2i.K1r AS K1rB,
-> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
-> FROM
-> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
-> ON (t1i.K1 = 1) AND
-> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
-> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
-> OR (t2i.K2 IS NULL))
-> WHERE
-> t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
| 1 | SIMPLE | t1a | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | |
| 1 | SIMPLE | t2a | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where |
| 1 | SIMPLE | t1i | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 1 | SIMPLE | t2i | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Not exists |
+----+-------------+-------+-------+---------------+---------+---------+-------+------+----------+-------------------------+
MariaDB [test]> show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | select '1' AS `K1`,'T1Row1' AS `Name`,`test`.`t2a`.`K2` AS `K2`,`test`.`t2a`.`K1r` AS `K1r`,`test`.`t2a`.`rowTimestamp` AS `rowTimestamp`,`test`.`t2a`.`Event` AS `Event`,`test`.`t2i`.`K2` AS `K2B`,`test`.`t2i`.`K1r` AS `K1rB`,`test`.`t2i`.`rowTimestamp` AS `rowTimestampB`,`test`.`t2i`.`Event` AS `EventB` from `test`.`t1` `t1a` join `test`.`t2` `t2a` left join (`test`.`t1` `t1i` left join `test`.`t2` `t2i` on(((`test`.`t1i`.`K1` = 1) and (`test`.`t2i`.`K1r` = 1)))) on(((((`test`.`t2i`.`rowTimestamp` > `test`.`t2a`.`rowTimestamp`) and ((`test`.`t2a`.`K1r` = 1) and (`test`.`t2i`.`K1r` = 1))) or ((`test`.`t2i`.`K2` > `test`.`t2a`.`K2`) and (`test`.`t2i`.`rowTimestamp` = `test`.`t2a`.`rowTimestamp`)) or isnull(`test`.`t2i`.`K2`)) and (`test`.`t1i`.`K1` = 1))) where ((`test`.`t2a`.`K1r` = 1) and isnull(`test`.`t2i`.`K2`)) |
+-------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
And both queries return wrong result sets:
MariaDB [test]> SELECT
-> t1a.*, t2a.*, t2b.K2 as K2B, t2b.K1r as K1rB,
-> t2b.rowTimestamp as rowTimestampB, t2b.Event as EventB
-> FROM
-> t1 as t1a JOIN t2 as t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> v1 as t2b
-> ON ((t2b.K1r = t1a.K1 AND t2b.rowTimestamp > t2a.rowTimestamp) OR
-> (t2b.rowTimestamp = t2a.rowTimestamp AND t2b.K2 > t2a.K2))
-> OR (t2b.K2 IS NULL)
-> WHERE
-> t1a.K1 = 1 AND
-> t2b.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name | K2 | K1r | rowTimestamp | Event | K2B | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| 1 | T1Row1 | 1 | 1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 2 | 1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 3 | 1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL | NULL |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
MariaDB [test]>
MariaDB [test]> SELECT t1a.*, t2a.*,
-> t2i.K2 AS K2B, t2i.K1r AS K1rB,
-> t2i.rowTimestamp AS rowTimestampB, t2i.Event AS EventB
-> FROM
-> t1 t1a JOIN t2 t2a ON t2a.K1r = t1a.K1
-> LEFT JOIN
-> ( t1 t1i LEFT JOIN t2 t2i ON t2i.K1r = t1i.K1)
-> ON (t1i.K1 = 1) AND
-> (((t2i.K1r = t1a.K1 AND t2i.rowTimestamp > t2a.rowTimestamp ) OR
-> (t2i.rowTimestamp = t2a.rowTimestamp AND t2i.K2 > t2a.K2))
-> OR (t2i.K2 IS NULL))
-> WHERE
-> t2a.K1r = 1 AND t2i.K2 IS NULL;
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| K1 | Name | K2 | K1r | rowTimestamp | Event | K2B | K1rB | rowTimestampB | EventB |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
| 1 | T1Row1 | 1 | 1 | 2015-04-13 10:42:11 | T1Row1Event1 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 2 | 1 | 2015-04-13 10:42:12 | T1Row1Event2 | NULL | NULL | NULL | NULL |
| 1 | T1Row1 | 3 | 1 | 2015-04-13 10:42:12 | T1Row1Event3 | NULL | NULL | NULL | NULL |
+----+--------+----+------+---------------------+--------------+------+------+---------------+--------+
(All results in these comments I've got on maridb-5.2. For all other version we get similar outputs).