Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-287

CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • 5.5.25
    • None
    • None

    Description

      The following query

       
      SELECT ( 
        SELECT MIN(b) FROM t1, t2 
        WHERE b = a AND 
        ( b = alias1.b OR 
           EXISTS ( SELECT * FROM t3 )
        )
      ) 
      FROM t2 alias1, t1 alias2, t1 alias3;

      on my machine takes 80 sec and more on the work tree vs 0.5 sec and less on the main 5.5 tree (tried revno 3413 and revno 3402).

      bzr version-info

      revision-id: timour@askmonty.org-20120518115201-s6byggvesqxcntkd
      date: 2012-05-18 14:52:01 +0300
      revno: 3404

      Reproducible with the default optimizer_switch as well as with all OFF values (except for in_to_exists which is required to execute the query).
      Reproducible with MyISAM, Aria, InnoDB.

      EXPLAIN on the work tree (with the default optimizer_switch):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00	
      1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)
      1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)
      2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	
      2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)
      3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
      Warnings:
      Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1
      Note	1003	select <expr_cache><>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where (`test`.`t2`.`b` = `test`.`t1`.`a`))) AS `( 
      SELECT MIN(b) FROM t1, t2 
      WHERE b = a AND 
      ( b = alias1.b OR 
      EXISTS ( SELECT * FROM t3 ) 
      )
      )` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`

      EXPLAIN on the main tree (with the default optimizer_switch):

      id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
      1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00	
      1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)
      1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)
      2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	Using where
      2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)
      3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
      Warnings:
      Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1
      Note	1003	select <expr_cache><`test`.`alias1`.`b`>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`b` = `test`.`t1`.`a`) and ((`test`.`t1`.`a` = `test`.`alias1`.`b`) or exists(select 1 from `test`.`t3`))))) AS `( 
      SELECT MIN(b) FROM t1, t2 
      WHERE b = a AND 
      ( b = alias1.b OR 
      EXISTS ( SELECT * FROM t3 ) 
      )
      )` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`

      Test case:

       
      CREATE TABLE t1 (a INT);
      INSERT INTO t1 VALUES
      (1),(7),(4),(7),(0),(2),(9),(4),(0),(9),
      (1),(3),(8),(8),(18),(84),(6),(3),(6),(6);
       
      CREATE TABLE t2 (b INT);
      INSERT INTO t2 VALUES
      (4),(5),(2),(5),(1),(1),(2),(2),(2),(197),
      (4),(5),(3),(1),(2),(7),(6),(1),(156),(8),
      (7),(2),(6),(2),(1),(0),(7),(5),(7),(2),(1),
      (80),(3),(8),(5),(0),(9),(9),(7),(0),(5),
      (6),(9),(3),(91),(6),(7),(3),(161),(7),(7),
      (6),(5),(8),(7),(2),(1),(3),(6),(6),(5),(0),
      (7),(7),(6),(0),(0),(8),(0),(4),(0),(213),
      (248),(1),(6),(6),(3),(140),(0),(7),(6),(6),
      (8),(5),(8),(7),(3),(7),(3),(8),(0),(1),(3),
      (6),(8),(1),(1),(9),(0),(6);
       
      CREATE TABLE t3 (c INT);
      INSERT INTO t3 VALUES (8),(3);
       
      SELECT ( 
        SELECT MIN(b) FROM t1, t2 
        WHERE b = a AND 
        ( b = alias1.b OR 
           EXISTS ( SELECT * FROM t3 )
        )
      ) 
      FROM t2 alias1, t1 alias2, t1 alias3;
       

      Attachments

        Issue Links

          Activity

            Analysis:

            The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery.

            • If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache.
            • If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated.
              • Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns.
              • As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache.

            The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.

            timour Timour Katchaounov (Inactive) added a comment - Analysis: The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery. If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache. If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated. Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns. As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache. The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.

            This problem is fixed by the following patch waiting for review:

            revno: 3406
            revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23
            parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e
            committer: timour@askmonty.org
            branch nick: 5.5-lpb944706
            timestamp: Wed 2012-05-30 00:18:53 +0300
            message:
            Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

            Analysis:

            The fix for lp:944706 introduces early subquery optimization.
            While a subquery is being optimized some of its predicates may be
            removed. In the test case, the EXISTS subquery is constant, and is
            evaluated to TRUE. As a result the whole OR is TRUE, and thus the
            correlated condition "b = alias1.b" is optimized away. The subquery
            becomes non-correlated.

            The subquery cache is designed to work only for correlated subqueries.
            If constant subquery optimization is disallowed, then the constant
            subquery is not evaluated, the subquery remains correlated, and its
            execution is cached. As a result execution is fast.

            However, when the constant subquery was optimized away, it was neither
            cached by the subquery cache, nor it was cached by the internal subquery
            caching. The latter was due to the fact that the subquery still appeared
            as correlated to the subselect_XYZ_engine::exec methods, and they
            re-executed the subquery on each call to Item_subselect::exec.

            Solution:

            The solution is to update the correlated status of the subquery after it has
            been optimized. This status consists of:

            • st_select_lex::is_correlated
            • Item_subselect::is_correlated
            • SELECT_LEX::uncacheable
            • SELECT_LEX_UNIT::uncacheable
              The status is updated by st_select_lex::update_correlated_cache(), and its
              caller st_select_lex::optimize_unflattened_subqueries. The solution relies
              on the fact that the optimizer already called
              st_select_lex::update_used_tables() for each subquery. This allows to
              efficiently update the correlated status of each subquery without walking
              the whole subquery tree.

            Notice that his patch is an improvement over MySQL 5.6 and older, where
            subqueries are not pre-optimized, and the above analysis is not possible.

            timour Timour Katchaounov (Inactive) added a comment - This problem is fixed by the following patch waiting for review: revno: 3406 revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23 parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e committer: timour@askmonty.org branch nick: 5.5-lpb944706 timestamp: Wed 2012-05-30 00:18:53 +0300 message: Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer Analysis: The fix for lp:944706 introduces early subquery optimization. While a subquery is being optimized some of its predicates may be removed. In the test case, the EXISTS subquery is constant, and is evaluated to TRUE. As a result the whole OR is TRUE, and thus the correlated condition "b = alias1.b" is optimized away. The subquery becomes non-correlated. The subquery cache is designed to work only for correlated subqueries. If constant subquery optimization is disallowed, then the constant subquery is not evaluated, the subquery remains correlated, and its execution is cached. As a result execution is fast. However, when the constant subquery was optimized away, it was neither cached by the subquery cache, nor it was cached by the internal subquery caching. The latter was due to the fact that the subquery still appeared as correlated to the subselect_XYZ_engine::exec methods, and they re-executed the subquery on each call to Item_subselect::exec. Solution: The solution is to update the correlated status of the subquery after it has been optimized. This status consists of: st_select_lex::is_correlated Item_subselect::is_correlated SELECT_LEX::uncacheable SELECT_LEX_UNIT::uncacheable The status is updated by st_select_lex::update_correlated_cache(), and its caller st_select_lex::optimize_unflattened_subqueries. The solution relies on the fact that the optimizer already called st_select_lex::update_used_tables() for each subquery. This allows to efficiently update the correlated status of each subquery without walking the whole subquery tree. Notice that his patch is an improvement over MySQL 5.6 and older, where subqueries are not pre-optimized, and the above analysis is not possible.

            Merged, tested, pushed.

            timour Timour Katchaounov (Inactive) added a comment - Merged, tested, pushed.

            People

              timour Timour Katchaounov (Inactive)
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.