A DuplicateWeedout strategy is supposed to be implemented using temporary tables for semijoin queries. For some values of "optimizer_search_depth", this duplicate weedout step may not occur.
Attached is the following:
optimizer_search_depth_semijoin_data_setup.sql - A script that sets up a database and a few tables.
optimizer_search_depth_semijoin_query_test.sql - A script that queries the tables to demonstrate the problem.
optimizer_search_depth_semijoin_output.txt - Example execution of the scripts and their output.
T1_0_ is removed by Table Elimination (this seems to be ok)
T1_1_ is pulled out by table pullout (is this ok? an inner table of an outer join is pulled out?)
Sergei Petrunia
added a comment - - edited Query structure
T2_0_
LJ T1_0_
SJ
T3_0_
T2_1_
LJ T1_1_
Query structure according to EXPLAIN EXTENDED:
T1_1_ semi join (T3_0_ join T2_1_) join T2_0_
Debugging query optimization...
T1_0_ is removed by Table Elimination (this seems to be ok)
T1_1_ is pulled out by table pullout (is this ok? an inner table of an outer join is pulled out?)
Removed all outer joins from the example query and it still fails. this returns 9 rows with semijoin=on and 3 rows with semijoin=off:
EXPLAIN
SELECT SQL_NO_CACHE
T2_0_.T1IdRef,
T2_0_.T2Id
FROM
T2 T2_0_
WHERE
T2_0_.T1IdRef IN (
SELECT
T1_1_.T1Id
FROM
T3 T3_0_
INNER JOIN
T2 T2_1_
ON T3_0_.T2IdRef=T2_1_.T2Id
INNER JOIN
T1 T1_1_
ON T2_1_.T1IdRef=T1_1_.T1Id
WHERE
T3_0_.T3IdRef= 1
);
Sergei Petrunia
added a comment - Removed all outer joins from the example query and it still fails. this returns 9 rows with semijoin=on and 3 rows with semijoin=off:
EXPLAIN
SELECT SQL_NO_CACHE
T2_0_.T1IdRef,
T2_0_.T2Id
FROM
T2 T2_0_
WHERE
T2_0_.T1IdRef IN (
SELECT
T1_1_.T1Id
FROM
T3 T3_0_
INNER JOIN
T2 T2_1_
ON T3_0_.T2IdRef=T2_1_.T2Id
INNER JOIN
T1 T1_1_
ON T2_1_.T1IdRef=T1_1_.T1Id
WHERE
T3_0_.T3IdRef= 1
);
Examining things in debugger stopped at fix_semijoin_strategies_for_picked_join_order(), I see that
the join order is
$40 = 0x7fff9c0811a0 "T3_0_" (inner)
$41 = 0x7fff9c081f60 "T2_1_" (inner)
$42 = 0x7fff9c02cd00 "T1_1_" (pulled out)
$43 = 0x7fff9c07e470 "T2_0_" (outer)
but
join_tab->position[i].sj_strategy= SJ_OPT_NONE
for any i. That is, join optimization picked an invalid plan.
Join optimization considers join prefixes as follows:
t1_1_
t2_0_
t3_0_ (best)
t2_1_
. t2_0_
. t1_1_
. t2_1_ (best)
. . t1_1_ (best)
. . t2_0_
. . . t2_0_ (best)
Debugging at table=t2_0_, idx=3 I see that we consider DuplicateElimination but choose not to use it, because its cost is higher than the cost of current plan, AND also join->cur_dups_producing_tables=0 which essentially means that there is another strategy that removed the duplicates. However, looking at join->positions[i].sj_strategy, I see nothing.
Sergei Petrunia
added a comment - - edited Examining things in debugger stopped at fix_semijoin_strategies_for_picked_join_order(), I see that
the join order is
$40 = 0x7fff9c0811a0 "T3_0_" (inner)
$41 = 0x7fff9c081f60 "T2_1_" (inner)
$42 = 0x7fff9c02cd00 "T1_1_" (pulled out)
$43 = 0x7fff9c07e470 "T2_0_" (outer)
but
join_tab->position[i].sj_strategy= SJ_OPT_NONE
for any i. That is, join optimization picked an invalid plan.
Join optimization considers join prefixes as follows:
t1_1_
t2_0_
t3_0_ (best)
t2_1_
. t2_0_
. t1_1_
. t2_1_ (best)
. . t1_1_ (best)
. . t2_0_
. . . t2_0_ (best)
Debugging at table=t2_0_, idx=3 I see that we consider DuplicateElimination but choose not to use it, because its cost is higher than the cost of current plan, AND also join->cur_dups_producing_tables=0 which essentially means that there is another strategy that removed the duplicates. However, looking at join->positions [i] .sj_strategy, I see nothing.
Debugging how join->cur_dups_producing_tables changes during join optimization.
initial value is 0.
It changes to 6 after we've put t3_0_ into join prefix.
and then changes back to 0 when we remove t3_0_ from the join prefix
It changes to 6 again when we put t2_1_ into join prefix
and changes back to 0 when we remove it..
then in greedy_search, we put t3_0_ back into the join prefix:
/* select the first table in the optimal extension as most promising */
best_pos= join->best_positions[idx];
best_table= best_pos.table;
/*
Each subsequent loop of 'best_extension_by_limited_search' uses
'join->positions' for cost estimates, therefore we have to update its
value.
*/
join->positions[idx]= best_pos;
but join->cur_dups_producing_tables remains 0.
Sergei Petrunia
added a comment - Debugging how join->cur_dups_producing_tables changes during join optimization.
initial value is 0.
It changes to 6 after we've put t3_0_ into join prefix.
and then changes back to 0 when we remove t3_0_ from the join prefix
It changes to 6 again when we put t2_1_ into join prefix
and changes back to 0 when we remove it..
then in greedy_search, we put t3_0_ back into the join prefix:
/* select the first table in the optimal extension as most promising */
best_pos= join->best_positions[idx];
best_table= best_pos.table;
/*
Each subsequent loop of 'best_extension_by_limited_search' uses
'join->positions' for cost estimates, therefore we have to update its
value.
*/
join->positions[idx]= best_pos;
but join->cur_dups_producing_tables remains 0.
Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately.
Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches.
On a separate note, I wonder if the commit comment is correct:
JOIN::cur_dups_producing_tables was not maintained correctly in
the cases of greedy optimization (search_depth > n_tables).
Is it not the other way round, search-depth < n_tables?
Elena Stepanova
added a comment - Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately.
Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches.
On a separate note, I wonder if the commit comment is correct:
JOIN::cur_dups_producing_tables was not maintained correctly in
the cases of greedy optimization (search_depth > n_tables).
Is it not the other way round, search-depth < n_tables?
The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).
Elena Stepanova
added a comment - The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).
Thanks for the report and the test case.
Reproducible as described.
test_type
semijoin=ON, optimizer_search_depth=1
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index
1 PRIMARY T2_1_ eq_ref PRIMARY,FK_T2_T1Id PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1
1 PRIMARY T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index
1 PRIMARY T2_0_ ref FK_T2_T1Id FK_T2_T1Id 8 semijoin_test.T2_1_.T1IdRef 1 Using index
T1IdRef T2Id
200001 200011
200001 200012
200001 200013
200001 200011
200001 200012
200001 200013
200001 200011
200001 200012
200001 200013
test_type
semijoin=ON, optimizer_search_depth=3
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index; Start temporary
1 PRIMARY T2_1_ eq_ref PRIMARY,FK_T2_T1Id PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1
1 PRIMARY T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index
1 PRIMARY T2_0_ ref FK_T2_T1Id FK_T2_T1Id 8 semijoin_test.T2_1_.T1IdRef 1 Using index; End temporary
T1IdRef T2Id
200001 200011
200001 200012
200001 200013
test_type
semijoin=OFF, optimizer_search_depth=1
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY T2_0_ index NULL FK_T2_T1Id 8 NULL 27 Using where; Using index
2 MATERIALIZED T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index
2 MATERIALIZED T2_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1
2 MATERIALIZED T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index
T1IdRef T2Id
200001 200011
200001 200012
200001 200013
test_type
semijoin=OFF, optimizer_search_depth=3
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY T2_0_ index NULL FK_T2_T1Id 8 NULL 27 Using where; Using index
2 MATERIALIZED T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index
2 MATERIALIZED T2_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1
2 MATERIALIZED T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index
T1IdRef T2Id
200001 200011
200001 200012
200001 200013