[MDEV-7474] Semi-Join's DuplicateWeedout strategy skipped for some values of optimizer_search_depth Created: 2015-01-16  Updated: 2015-04-10  Resolved: 2015-03-17

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5.41, 10.0.15
Fix Version/s: 5.5.43, 10.0.18

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: optimizer
Environment:

Linux


Attachments: File optimizer_search_depth_semijoin_data_setup.sql     Text File optimizer_search_depth_semijoin_output.txt     File optimizer_search_depth_semijoin_query_test.sql    

 Description   

When "semijoin" is enabled via "optimizer_switch", data can be duplicated with certain values of "optimizer_search_depth".

According to this documentation page:

https://mariadb.com/kb/en/mariadb/documentation/managing-mariadb/optimization-and-tuning/query-optimizations/optimization-strategies/duplicateweedout-strategy/

A DuplicateWeedout strategy is supposed to be implemented using temporary tables for semijoin queries. For some values of "optimizer_search_depth", this duplicate weedout step may not occur.

Attached is the following:

optimizer_search_depth_semijoin_data_setup.sql - A script that sets up a database and a few tables.

optimizer_search_depth_semijoin_query_test.sql - A script that queries the tables to demonstrate the problem.

optimizer_search_depth_semijoin_output.txt - Example execution of the scripts and their output.



 Comments   
Comment by Elena Stepanova [ 2015-01-21 ]

Thanks for the report and the test case.
Reproducible as described.

test_type
semijoin=ON, optimizer_search_depth=1
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index
1	PRIMARY	T2_1_	eq_ref	PRIMARY,FK_T2_T1Id	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
1	PRIMARY	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
1	PRIMARY	T2_0_	ref	FK_T2_T1Id	FK_T2_T1Id	8	semijoin_test.T2_1_.T1IdRef	1	Using index
T1IdRef	T2Id
200001	200011
200001	200012
200001	200013
200001	200011
200001	200012
200001	200013
200001	200011
200001	200012
200001	200013

test_type
semijoin=ON, optimizer_search_depth=3
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index; Start temporary
1	PRIMARY	T2_1_	eq_ref	PRIMARY,FK_T2_T1Id	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
1	PRIMARY	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
1	PRIMARY	T2_0_	ref	FK_T2_T1Id	FK_T2_T1Id	8	semijoin_test.T2_1_.T1IdRef	1	Using index; End temporary
T1IdRef	T2Id
200001	200011
200001	200012
200001	200013

test_type
semijoin=OFF, optimizer_search_depth=1
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	T2_0_	index	NULL	FK_T2_T1Id	8	NULL	27	Using where; Using index
2	MATERIALIZED	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index
2	MATERIALIZED	T2_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
2	MATERIALIZED	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
T1IdRef	T2Id
200001	200011
200001	200012
200001	200013

test_type
semijoin=OFF, optimizer_search_depth=3
id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	PRIMARY	T2_0_	index	NULL	FK_T2_T1Id	8	NULL	27	Using where; Using index
2	MATERIALIZED	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index
2	MATERIALIZED	T2_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
2	MATERIALIZED	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
T1IdRef	T2Id
200001	200011
200001	200012
200001	200013

Comment by Sergei Petrunia [ 2015-03-12 ]

Query structure

T2_0_
LJ T1_0_
SJ
  T3_0_ 
  T2_1_
  LJ T1_1_

Query structure according to EXPLAIN EXTENDED:

T1_1_ semi join (T3_0_ join T2_1_) join T2_0_ 

Debugging query optimization...

  • T1_0_ is removed by Table Elimination (this seems to be ok)
  • T1_1_ is pulled out by table pullout (is this ok? an inner table of an outer join is pulled out?)
Comment by Sergei Petrunia [ 2015-03-12 ]

Removed all outer joins from the example query and it still fails. this returns 9 rows with semijoin=on and 3 rows with semijoin=off:

EXPLAIN 
SELECT SQL_NO_CACHE 
T2_0_.T1IdRef,
T2_0_.T2Id
FROM
        T2 T2_0_ 
WHERE
        T2_0_.T1IdRef IN (
                SELECT
                        T1_1_.T1Id 
                FROM
                        T3 T3_0_ 
                INNER JOIN
                        T2 T2_1_ 
                                ON T3_0_.T2IdRef=T2_1_.T2Id 
                INNER JOIN
                        T1 T1_1_ 
                                ON T2_1_.T1IdRef=T1_1_.T1Id            
                WHERE
                        T3_0_.T3IdRef= 1
);

Comment by Sergei Petrunia [ 2015-03-13 ]

Examining things in debugger stopped at fix_semijoin_strategies_for_picked_join_order(), I see that
the join order is

  $40 = 0x7fff9c0811a0 "T3_0_" (inner)
  $41 = 0x7fff9c081f60 "T2_1_" (inner)
  $42 = 0x7fff9c02cd00 "T1_1_" (pulled out)
  $43 = 0x7fff9c07e470 "T2_0_" (outer)

but

join_tab->position[i].sj_strategy= SJ_OPT_NONE

for any i. That is, join optimization picked an invalid plan.

Join optimization considers join prefixes as follows:

t1_1_
t2_0_
t3_0_  (best)
t2_1_
. t2_0_
. t1_1_
. t2_1_  (best)
.   .  t1_1_  (best)
.   .  t2_0_
.   .    .  t2_0_  (best)

Debugging at table=t2_0_, idx=3 I see that we consider DuplicateElimination but choose not to use it, because its cost is higher than the cost of current plan, AND also join->cur_dups_producing_tables=0 which essentially means that there is another strategy that removed the duplicates. However, looking at join->positions[i].sj_strategy, I see nothing.

Comment by Sergei Petrunia [ 2015-03-13 ]

Debugging how join->cur_dups_producing_tables changes during join optimization.

  • initial value is 0.
  • It changes to 6 after we've put t3_0_ into join prefix.
  • and then changes back to 0 when we remove t3_0_ from the join prefix
  • It changes to 6 again when we put t2_1_ into join prefix
  • and changes back to 0 when we remove it..

then in greedy_search, we put t3_0_ back into the join prefix:

    /* select the first table in the optimal extension as most promising */
    best_pos= join->best_positions[idx];
    best_table= best_pos.table;
    /*
      Each subsequent loop of 'best_extension_by_limited_search' uses
      'join->positions' for cost estimates, therefore we have to update its
      value.
    */
 
    join->positions[idx]= best_pos;

but join->cur_dups_producing_tables remains 0.

Comment by Sergei Petrunia [ 2015-03-13 ]

Pushed a patch into https://github.com/MariaDB/server/tree/bb-10.0-mdev7474 . I'll need testing from elenst.

Comment by Elena Stepanova [ 2015-03-17 ]

Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately.
Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches.

On a separate note, I wonder if the commit comment is correct:

JOIN::cur_dups_producing_tables was not maintained correctly in
the cases of greedy optimization (search_depth > n_tables).

Is it not the other way round, search-depth < n_tables?

Comment by Elena Stepanova [ 2015-03-17 ]

The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).

Comment by Sergei Petrunia [ 2015-03-17 ]

Considering the nature of the fix, it should be 5.5

Comment by Sergei Petrunia [ 2015-03-17 ]

Pushed into 5.5 tree

Generated at Thu Feb 08 07:19:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.