SELECT n_nationkey FROM nation LEFTJOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187;
CREATETABLE t (a INT);
INSERTINTO t VALUES (1),(17),(2),(17);
SELECT * FROM t WHERE a IN (
SELECT n_nationkey FROM nation LEFTJOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
);
# Cleanup
DROPDATABASE dbt3;
SELECT n_nationkey FROM nation LEFTJOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187;
n_nationkey
17
17
17
CREATETABLE t (a INT);
INSERTINTO t VALUES (1),(17),(2),(17);
SELECT * FROM t WHERE a IN (
SELECT n_nationkey FROM nation LEFTJOIN region ON ( r_regionkey = n_regionkey ) JOIN supplier ON ( s_nationkey = n_nationkey ) JOIN partsupp ON ( s_suppkey = ps_suppkey ) JOIN lineitem ON ( ps_partkey = l_partkey ) JOIN orders ON ( l_orderkey = o_orderkey ) JOIN customer WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
);
a
So, the first SELECT above returns rows with value 17.
The table t contains values 17.
However, when the same SELECT is used as IN subquery, the resulting query doesn't return anything.
Plan:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 PRIMARY supplier system PRIMARYNULLNULLNULL 1 100.00
I played with the query a bit.
If I change the t1 table to have only 3 values 1,17 and 2 then things works. Adding a row with value '17' changes the plan to
use materialization of <subquery2> and in this case the result is 0 rows.
The same happens if the last entry in t is '18.
Adding a key on 't' changes the plan to not use materialization or join_buffer and then things works.
I also tried with disabling join_cache. In this case materialization still happens and the bug is still there.
This indicates something is wrong in materialization
Michael Widenius
added a comment - - edited I played with the query a bit.
If I change the t1 table to have only 3 values 1,17 and 2 then things works. Adding a row with value '17' changes the plan to
use materialization of <subquery2> and in this case the result is 0 rows.
The same happens if the last entry in t is '18.
Adding a key on 't' changes the plan to not use materialization or join_buffer and then things works.
I also tried with disabling join_cache. In this case materialization still happens and the bug is still there.
This indicates something is wrong in materialization
The problem can be seen here in EXPLAIN FORMAT=JSON output:
"table_name": "<subquery2>",
"access_type": "ALL",
"possible_keys": ["distinct_key"],
"rows": 1,
"filtered": 100,
"attached_condition": "orders.o_totalprice = 151 or customer.c_nationkey = 17",
Here, the attached_condition refers to columns of tables that are inside the materialization nest.
One could argue that _nationkey is a part of IN-equality and could be accessible (but it's a part of a multiple-equality so we'll need to check which table's _nationkey the data is unpacked to).
But it's clear that o_totalprice should NOT be accessed when materialization is already done...
Sergei Petrunia
added a comment -
Take-aways from discussion with Monty:
The problem can be seen here in EXPLAIN FORMAT=JSON output:
"table_name": "<subquery2>",
"access_type": "ALL",
"possible_keys": ["distinct_key"],
"rows": 1,
"filtered": 100,
"attached_condition": "orders.o_totalprice = 151 or customer.c_nationkey = 17",
Here, the attached_condition refers to columns of tables that are inside the materialization nest.
One could argue that _nationkey is a part of IN-equality and could be accessible (but it's a part of a multiple-equality so we'll need to check which table's _nationkey the data is unpacked to).
But it's clear that o_totalprice should NOT be accessed when materialization is already done...
LEFTJOIN region ON ( r_regionkey = n_regionkey ) --- const
JOIN supplier ON ( s_nationkey = n_nationkey ) --- const
JOIN partsupp ON ( s_suppkey = ps_suppkey )
JOIN lineitem ON ( ps_partkey = l_partkey )
JOIN orders ON ( l_orderkey = o_orderkey )
JOIN customer
WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
);
Sergei Petrunia
added a comment -
Query:
SELECT * FROM t WHERE a IN (
SELECT n_nationkey
FROM
nation --- const
LEFT JOIN region ON ( r_regionkey = n_regionkey ) --- const
JOIN supplier ON ( s_nationkey = n_nationkey ) --- const
JOIN partsupp ON ( s_suppkey = ps_suppkey )
JOIN lineitem ON ( ps_partkey = l_partkey )
JOIN orders ON ( l_orderkey = o_orderkey )
JOIN customer
WHERE o_totalprice = 151 OR c_nationkey = s_nationkey AND s_acctbal > 187
);
Sergei Petrunia
added a comment - Where does
"orders.o_totalprice = 151 or customer.c_nationkey = 17"
come from?
Query's WHERE after rewrites:
"resulting_condition":
"(
orders.o_totalprice = 151 or
(
supplier.s_acctbal > 187 and
multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey, customer.c_nationkey)
)
)
and
multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey) and
multiple equal(lineitem.l_orderkey, orders.o_orderkey) and
multiple equal(partsupp.ps_partkey, lineitem.l_partkey) and
multiple equal(supplier.s_suppkey, partsupp.ps_suppkey)
"
The original
o_totalprice = 151 OR (c_nationkey = s_nationkey AND s_acctbal > 187)
became:
orders.o_totalprice = 151 or
(
supplier.s_acctbal > 187 and
multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey, customer.c_nationkey)
)
That is, "customer.c_nationkey = supplier.s_nationkey" became
multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey, customer.c_nationkey)
The two new members of the multiple equality are "inherited" from this top-level multiple-equality:
multiple equal(t.a, nation.n_nationkey, supplier.s_nationkey)
But why are pair-wise equalities generated from inherited members?
I think they should not be.
Consider this example (further TRIVIAL-EXAMPLE):
createtable t (
c1 int,
c2 int,
c3 int,
c4 int,
c10 int
);
explain format=json
select * from t
where
c1=c2 and c2=c3 and (c10 < 3333 or c3=c4);
shows:
"attached_condition":
"t.c2 = t.c1 and t.c3 = t.c1 and (t.c10 < 3333 or t.c4 = t.c1)"
Note that the right side of the (... or ...) only includes "c4=c1" and doesn't include "c1=c2" or "c2=c3"....
Sergei Petrunia
added a comment - - edited But why are pair-wise equalities generated from inherited members?
I think they should not be.
Consider this example (further TRIVIAL-EXAMPLE):
create table t (
c1 int ,
c2 int ,
c3 int ,
c4 int ,
c10 int
);
explain format=json
select * from t
where
c1=c2 and c2=c3 and (c10 < 3333 or c3=c4);
shows:
"attached_condition":
"t.c2 = t.c1 and t.c3 = t.c1 and (t.c10 < 3333 or t.c4 = t.c1)"
Note that the right side of the (... or ...) only includes "c4=c1" and doesn't include "c1=c2" or "c2=c3"....
Sergei Petrunia
added a comment - - edited eliminate_item_equal() does not remove the "inherited" members of the multiple equality, because the Item_equal object has upper_levels=NULL.
In substitute_for_best_equal_field() right before the call to eliminate_item_equal() we have:
(gdb) printf "%s\n", dbug_print_item(item_equal)
multiple equal(17, customer.c_nationkey, t.a)
(gdb) p item_equal->upper_levels
$70 = (COND_EQUAL *) 0x0
for comparison, in the TRIVIAL-EXAMPLE:
(gdb) p dbug_print_item(item_equal)
$111 = 0x555557c202c0 <dbug_item_print_buf> "multiple equal(t.c1, t.c2, t.c3, t.c4)"
(gdb) p item_equal->upper_levels
$113 = (COND_EQUAL *) 0x7fff980182b0
(gdb) p dbug_print_item(item_equal->upper_levels->current_level.elem(0))
$127 = 0x555557c202c0 <dbug_item_print_buf> "multiple equal(t.c1, t.c2, t.c3)"
One can observe multiple equality to generate "redundant" conditions even without any semi-joins. I will post a query and full explain output below.
The important part:
"table": {
"table_name": "t",
"access_type": "ALL",
"rows": 4,
"filtered": 100,
"attached_condition": "t.a = 17 and (orders.o_totalprice = 151 or customer.c_nationkey = 17 and t.a = 17)"
Note the
t.a=17 AND
( .... OR (... AND t.a=17))
The first conjunct of the top-level AND (line #1) guarantees that t.a=17 is true. Why is t.a=17 also generated inside the OR on line #2?
igor, is this a bug of in multiple equality code or its intended property? If it's a bug, should we fix it in stable releases (or should we make SJ-Materialization code work with this behavior?)
Sergei Petrunia
added a comment - One can observe multiple equality to generate "redundant" conditions even without any semi-joins. I will post a query and full explain output below.
The important part:
"table": {
"table_name": "t",
"access_type": "ALL",
"rows": 4,
"filtered": 100,
"attached_condition": "t.a = 17 and (orders.o_totalprice = 151 or customer.c_nationkey = 17 and t.a = 17)"
Note the
t.a=17 AND
( .... OR (... AND t.a=17))
The first conjunct of the top-level AND (line #1) guarantees that t.a=17 is true. Why is t.a=17 also generated inside the OR on line #2?
igor , is this a bug of in multiple equality code or its intended property? If it's a bug, should we fix it in stable releases (or should we make SJ-Materialization code work with this behavior?)
Test dataset: mdev30361-dataset.sql
The query I've used (without subquery): mdev30361-join-query.sql
Explain output I've got: mdev30361-join-explain-output.txt
I played with the query a bit.
If I change the t1 table to have only 3 values 1,17 and 2 then things works. Adding a row with value '17' changes the plan to
use materialization of <subquery2> and in this case the result is 0 rows.
The same happens if the last entry in t is '18.
Adding a key on 't' changes the plan to not use materialization or join_buffer and then things works.
I also tried with disabling join_cache. In this case materialization still happens and the bug is still there.
This indicates something is wrong in materialization