Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-7474

Semi-Join's DuplicateWeedout strategy skipped for some values of optimizer_search_depth

Details

    Description

      When "semijoin" is enabled via "optimizer_switch", data can be duplicated with certain values of "optimizer_search_depth".

      According to this documentation page:

      https://mariadb.com/kb/en/mariadb/documentation/managing-mariadb/optimization-and-tuning/query-optimizations/optimization-strategies/duplicateweedout-strategy/

      A DuplicateWeedout strategy is supposed to be implemented using temporary tables for semijoin queries. For some values of "optimizer_search_depth", this duplicate weedout step may not occur.

      Attached is the following:

      optimizer_search_depth_semijoin_data_setup.sql - A script that sets up a database and a few tables.

      optimizer_search_depth_semijoin_query_test.sql - A script that queries the tables to demonstrate the problem.

      optimizer_search_depth_semijoin_output.txt - Example execution of the scripts and their output.

      Attachments

        Activity

          GeoffMontee Geoff Montee (Inactive) created issue -
          GeoffMontee Geoff Montee (Inactive) made changes -
          Field Original Value New Value
          Attachment optimizer_search_depth_semijoin_data_setup.sql [ 36714 ]
          GeoffMontee Geoff Montee (Inactive) made changes -
          Attachment optimizer_search_depth_semijoin_output.txt [ 36713 ]
          GeoffMontee Geoff Montee (Inactive) made changes -
          Attachment optimizer_search_depth_semijoin_query_test.sql [ 36712 ]

          Thanks for the report and the test case.
          Reproducible as described.

          test_type
          semijoin=ON, optimizer_search_depth=1
          id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
          1	PRIMARY	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index
          1	PRIMARY	T2_1_	eq_ref	PRIMARY,FK_T2_T1Id	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
          1	PRIMARY	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
          1	PRIMARY	T2_0_	ref	FK_T2_T1Id	FK_T2_T1Id	8	semijoin_test.T2_1_.T1IdRef	1	Using index
          T1IdRef	T2Id
          200001	200011
          200001	200012
          200001	200013
          200001	200011
          200001	200012
          200001	200013
          200001	200011
          200001	200012
          200001	200013

          test_type
          semijoin=ON, optimizer_search_depth=3
          id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
          1	PRIMARY	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index; Start temporary
          1	PRIMARY	T2_1_	eq_ref	PRIMARY,FK_T2_T1Id	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
          1	PRIMARY	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
          1	PRIMARY	T2_0_	ref	FK_T2_T1Id	FK_T2_T1Id	8	semijoin_test.T2_1_.T1IdRef	1	Using index; End temporary
          T1IdRef	T2Id
          200001	200011
          200001	200012
          200001	200013

          test_type
          semijoin=OFF, optimizer_search_depth=1
          id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
          1	PRIMARY	T2_0_	index	NULL	FK_T2_T1Id	8	NULL	27	Using where; Using index
          2	MATERIALIZED	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index
          2	MATERIALIZED	T2_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
          2	MATERIALIZED	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
          T1IdRef	T2Id
          200001	200011
          200001	200012
          200001	200013

          test_type
          semijoin=OFF, optimizer_search_depth=3
          id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
          1	PRIMARY	T2_0_	index	NULL	FK_T2_T1Id	8	NULL	27	Using where; Using index
          2	MATERIALIZED	T3_0_	ref	PRIMARY,FK_T3_T2Id	PRIMARY	8	const	3	Using index
          2	MATERIALIZED	T2_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T3_0_.T2IdRef	1	
          2	MATERIALIZED	T1_1_	eq_ref	PRIMARY	PRIMARY	8	semijoin_test.T2_1_.T1IdRef	1	Using index
          T1IdRef	T2Id
          200001	200011
          200001	200012
          200001	200013

          elenst Elena Stepanova added a comment - Thanks for the report and the test case. Reproducible as described. test_type semijoin=ON, optimizer_search_depth=1 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index 1 PRIMARY T2_1_ eq_ref PRIMARY,FK_T2_T1Id PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1 1 PRIMARY T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index 1 PRIMARY T2_0_ ref FK_T2_T1Id FK_T2_T1Id 8 semijoin_test.T2_1_.T1IdRef 1 Using index T1IdRef T2Id 200001 200011 200001 200012 200001 200013 200001 200011 200001 200012 200001 200013 200001 200011 200001 200012 200001 200013 test_type semijoin=ON, optimizer_search_depth=3 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index; Start temporary 1 PRIMARY T2_1_ eq_ref PRIMARY,FK_T2_T1Id PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1 1 PRIMARY T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index 1 PRIMARY T2_0_ ref FK_T2_T1Id FK_T2_T1Id 8 semijoin_test.T2_1_.T1IdRef 1 Using index; End temporary T1IdRef T2Id 200001 200011 200001 200012 200001 200013 test_type semijoin=OFF, optimizer_search_depth=1 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY T2_0_ index NULL FK_T2_T1Id 8 NULL 27 Using where; Using index 2 MATERIALIZED T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index 2 MATERIALIZED T2_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1 2 MATERIALIZED T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index T1IdRef T2Id 200001 200011 200001 200012 200001 200013 test_type semijoin=OFF, optimizer_search_depth=3 id select_type table type possible_keys key key_len ref rows Extra 1 PRIMARY T2_0_ index NULL FK_T2_T1Id 8 NULL 27 Using where; Using index 2 MATERIALIZED T3_0_ ref PRIMARY,FK_T3_T2Id PRIMARY 8 const 3 Using index 2 MATERIALIZED T2_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T3_0_.T2IdRef 1 2 MATERIALIZED T1_1_ eq_ref PRIMARY PRIMARY 8 semijoin_test.T2_1_.T1IdRef 1 Using index T1IdRef T2Id 200001 200011 200001 200012 200001 200013
          elenst Elena Stepanova made changes -
          Fix Version/s 10.0 [ 16000 ]
          Fix Version/s 5.5 [ 15800 ]
          Assignee Sergei Petrunia [ psergey ]
          psergei Sergei Petrunia added a comment - - edited

          Query structure

          T2_0_
          LJ T1_0_
          SJ
            T3_0_ 
            T2_1_
            LJ T1_1_

          Query structure according to EXPLAIN EXTENDED:

          T1_1_ semi join (T3_0_ join T2_1_) join T2_0_ 

          Debugging query optimization...

          • T1_0_ is removed by Table Elimination (this seems to be ok)
          • T1_1_ is pulled out by table pullout (is this ok? an inner table of an outer join is pulled out?)
          psergei Sergei Petrunia added a comment - - edited Query structure T2_0_ LJ T1_0_ SJ T3_0_ T2_1_ LJ T1_1_ Query structure according to EXPLAIN EXTENDED: T1_1_ semi join (T3_0_ join T2_1_) join T2_0_ Debugging query optimization... T1_0_ is removed by Table Elimination (this seems to be ok) T1_1_ is pulled out by table pullout (is this ok? an inner table of an outer join is pulled out?)

          Removed all outer joins from the example query and it still fails. this returns 9 rows with semijoin=on and 3 rows with semijoin=off:

          EXPLAIN 
          SELECT SQL_NO_CACHE 
          T2_0_.T1IdRef,
          T2_0_.T2Id
          FROM
                  T2 T2_0_ 
          WHERE
                  T2_0_.T1IdRef IN (
                          SELECT
                                  T1_1_.T1Id 
                          FROM
                                  T3 T3_0_ 
                          INNER JOIN
                                  T2 T2_1_ 
                                          ON T3_0_.T2IdRef=T2_1_.T2Id 
                          INNER JOIN
                                  T1 T1_1_ 
                                          ON T2_1_.T1IdRef=T1_1_.T1Id            
                          WHERE
                                  T3_0_.T3IdRef= 1
          );

          psergei Sergei Petrunia added a comment - Removed all outer joins from the example query and it still fails. this returns 9 rows with semijoin=on and 3 rows with semijoin=off: EXPLAIN SELECT SQL_NO_CACHE T2_0_.T1IdRef, T2_0_.T2Id FROM T2 T2_0_ WHERE T2_0_.T1IdRef IN ( SELECT T1_1_.T1Id FROM T3 T3_0_ INNER JOIN T2 T2_1_ ON T3_0_.T2IdRef=T2_1_.T2Id INNER JOIN T1 T1_1_ ON T2_1_.T1IdRef=T1_1_.T1Id WHERE T3_0_.T3IdRef= 1 );
          psergei Sergei Petrunia added a comment - - edited

          Examining things in debugger stopped at fix_semijoin_strategies_for_picked_join_order(), I see that
          the join order is

            $40 = 0x7fff9c0811a0 "T3_0_" (inner)
            $41 = 0x7fff9c081f60 "T2_1_" (inner)
            $42 = 0x7fff9c02cd00 "T1_1_" (pulled out)
            $43 = 0x7fff9c07e470 "T2_0_" (outer)

          but

          join_tab->position[i].sj_strategy= SJ_OPT_NONE

          for any i. That is, join optimization picked an invalid plan.

          Join optimization considers join prefixes as follows:

          t1_1_
          t2_0_
          t3_0_  (best)
          t2_1_
          . t2_0_
          . t1_1_
          . t2_1_  (best)
          .   .  t1_1_  (best)
          .   .  t2_0_
          .   .    .  t2_0_  (best)

          Debugging at table=t2_0_, idx=3 I see that we consider DuplicateElimination but choose not to use it, because its cost is higher than the cost of current plan, AND also join->cur_dups_producing_tables=0 which essentially means that there is another strategy that removed the duplicates. However, looking at join->positions[i].sj_strategy, I see nothing.

          psergei Sergei Petrunia added a comment - - edited Examining things in debugger stopped at fix_semijoin_strategies_for_picked_join_order(), I see that the join order is $40 = 0x7fff9c0811a0 "T3_0_" (inner) $41 = 0x7fff9c081f60 "T2_1_" (inner) $42 = 0x7fff9c02cd00 "T1_1_" (pulled out) $43 = 0x7fff9c07e470 "T2_0_" (outer) but join_tab->position[i].sj_strategy= SJ_OPT_NONE for any i. That is, join optimization picked an invalid plan. Join optimization considers join prefixes as follows: t1_1_ t2_0_ t3_0_ (best) t2_1_ . t2_0_ . t1_1_ . t2_1_ (best) . . t1_1_ (best) . . t2_0_ . . . t2_0_ (best) Debugging at table=t2_0_, idx=3 I see that we consider DuplicateElimination but choose not to use it, because its cost is higher than the cost of current plan, AND also join->cur_dups_producing_tables=0 which essentially means that there is another strategy that removed the duplicates. However, looking at join->positions [i] .sj_strategy, I see nothing.

          Debugging how join->cur_dups_producing_tables changes during join optimization.

          • initial value is 0.
          • It changes to 6 after we've put t3_0_ into join prefix.
          • and then changes back to 0 when we remove t3_0_ from the join prefix
          • It changes to 6 again when we put t2_1_ into join prefix
          • and changes back to 0 when we remove it..

          then in greedy_search, we put t3_0_ back into the join prefix:

              /* select the first table in the optimal extension as most promising */
              best_pos= join->best_positions[idx];
              best_table= best_pos.table;
              /*
                Each subsequent loop of 'best_extension_by_limited_search' uses
                'join->positions' for cost estimates, therefore we have to update its
                value.
              */
           
              join->positions[idx]= best_pos;

          but join->cur_dups_producing_tables remains 0.

          psergei Sergei Petrunia added a comment - Debugging how join->cur_dups_producing_tables changes during join optimization. initial value is 0. It changes to 6 after we've put t3_0_ into join prefix. and then changes back to 0 when we remove t3_0_ from the join prefix It changes to 6 again when we put t2_1_ into join prefix and changes back to 0 when we remove it.. then in greedy_search, we put t3_0_ back into the join prefix: /* select the first table in the optimal extension as most promising */ best_pos= join->best_positions[idx]; best_table= best_pos.table; /* Each subsequent loop of 'best_extension_by_limited_search' uses 'join->positions' for cost estimates, therefore we have to update its value. */   join->positions[idx]= best_pos; but join->cur_dups_producing_tables remains 0.

          Pushed a patch into https://github.com/MariaDB/server/tree/bb-10.0-mdev7474 . I'll need testing from elenst.

          psergei Sergei Petrunia added a comment - Pushed a patch into https://github.com/MariaDB/server/tree/bb-10.0-mdev7474 . I'll need testing from elenst .
          elenst Elena Stepanova made changes -
          Assignee Sergei Petrunia [ psergey ] Elena Stepanova [ elenst ]
          elenst Elena Stepanova made changes -
          Status Open [ 1 ] In Progress [ 3 ]

          Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately.
          Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches.

          On a separate note, I wonder if the commit comment is correct:

          JOIN::cur_dups_producing_tables was not maintained correctly in
          the cases of greedy optimization (search_depth > n_tables).

          Is it not the other way round, search-depth < n_tables?

          elenst Elena Stepanova added a comment - Ran a set of tests, got several hundred of mismatches between the patched version and baseline 10.0 (low optimizer_search_depth values). All differences look legit, related to the bugfix, so the good news is that tests hit the right spot; but due to the sheer amount of mismatches it's unrealistic to check each one separately. Next step – I'll run a similar test, but between the patched version with low optimizer_search_depth and baseline 10.0 with normal optimizer_search_depth. The expectation is that there should be no mismatches. On a separate note, I wonder if the commit comment is correct: JOIN::cur_dups_producing_tables was not maintained correctly in the cases of greedy optimization (search_depth > n_tables). Is it not the other way round, search-depth < n_tables?
          elenst Elena Stepanova made changes -
          Status In Progress [ 3 ] Stalled [ 10000 ]
          elenst Elena Stepanova made changes -
          Assignee Elena Stepanova [ elenst ] Sergei Petrunia [ psergey ]

          The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).

          elenst Elena Stepanova added a comment - The second run went okay. I will also try valgrind tests, but please go ahead and push (we need to decide first whether it's for 5.5 or 10.0).

          Considering the nature of the fix, it should be 5.5

          psergei Sergei Petrunia added a comment - Considering the nature of the fix, it should be 5.5

          Pushed into 5.5 tree

          psergei Sergei Petrunia added a comment - Pushed into 5.5 tree
          psergei Sergei Petrunia made changes -
          Fix Version/s 5.5.43 [ 18601 ]
          Fix Version/s 5.5 [ 15800 ]
          Fix Version/s 10.0 [ 16000 ]
          Resolution Fixed [ 1 ]
          Status Stalled [ 10000 ] Closed [ 6 ]
          psergei Sergei Petrunia made changes -
          Fix Version/s 10.0.18 [ 18702 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 59238 ] MariaDB v3 [ 66317 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 66317 ] MariaDB v4 [ 148723 ]

          People

            psergei Sergei Petrunia
            GeoffMontee Geoff Montee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.