[MDEV-287] CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer Created: 2012-05-21  Updated: 2014-05-12  Resolved: 2012-06-19

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.25

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Timour Katchaounov (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-6231 Wrong result (extra rows) via cursor ... Closed
relates to MDEV-193 LP:944706 - Query with impossible or ... Closed

 Description   

The following query

 
SELECT ( 
  SELECT MIN(b) FROM t1, t2 
  WHERE b = a AND 
  ( b = alias1.b OR 
     EXISTS ( SELECT * FROM t3 )
  )
) 
FROM t2 alias1, t1 alias2, t1 alias3;

on my machine takes 80 sec and more on the work tree vs 0.5 sec and less on the main 5.5 tree (tried revno 3413 and revno 3402).

bzr version-info

revision-id: timour@askmonty.org-20120518115201-s6byggvesqxcntkd
date: 2012-05-18 14:52:01 +0300
revno: 3404

Reproducible with the default optimizer_switch as well as with all OFF values (except for in_to_exists which is required to execute the query).
Reproducible with MyISAM, Aria, InnoDB.

EXPLAIN on the work tree (with the default optimizer_switch):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00	
1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)
1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)
2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	
2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)
3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
Warnings:
Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1
Note	1003	select <expr_cache><>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where (`test`.`t2`.`b` = `test`.`t1`.`a`))) AS `( 
SELECT MIN(b) FROM t1, t2 
WHERE b = a AND 
( b = alias1.b OR 
EXISTS ( SELECT * FROM t3 ) 
)
)` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`

EXPLAIN on the main tree (with the default optimizer_switch):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00	
1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)
1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)
2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	Using where
2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)
3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00	
Warnings:
Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1
Note	1003	select <expr_cache><`test`.`alias1`.`b`>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`b` = `test`.`t1`.`a`) and ((`test`.`t1`.`a` = `test`.`alias1`.`b`) or exists(select 1 from `test`.`t3`))))) AS `( 
SELECT MIN(b) FROM t1, t2 
WHERE b = a AND 
( b = alias1.b OR 
EXISTS ( SELECT * FROM t3 ) 
)
)` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`

Test case:

 
CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES
(1),(7),(4),(7),(0),(2),(9),(4),(0),(9),
(1),(3),(8),(8),(18),(84),(6),(3),(6),(6);
 
CREATE TABLE t2 (b INT);
INSERT INTO t2 VALUES
(4),(5),(2),(5),(1),(1),(2),(2),(2),(197),
(4),(5),(3),(1),(2),(7),(6),(1),(156),(8),
(7),(2),(6),(2),(1),(0),(7),(5),(7),(2),(1),
(80),(3),(8),(5),(0),(9),(9),(7),(0),(5),
(6),(9),(3),(91),(6),(7),(3),(161),(7),(7),
(6),(5),(8),(7),(2),(1),(3),(6),(6),(5),(0),
(7),(7),(6),(0),(0),(8),(0),(4),(0),(213),
(248),(1),(6),(6),(3),(140),(0),(7),(6),(6),
(8),(5),(8),(7),(3),(7),(3),(8),(0),(1),(3),
(6),(8),(1),(1),(9),(0),(6);
 
CREATE TABLE t3 (c INT);
INSERT INTO t3 VALUES (8),(3);
 
SELECT ( 
  SELECT MIN(b) FROM t1, t2 
  WHERE b = a AND 
  ( b = alias1.b OR 
     EXISTS ( SELECT * FROM t3 )
  )
) 
FROM t2 alias1, t1 alias2, t1 alias3;
 



 Comments   
Comment by Timour Katchaounov (Inactive) [ 2012-05-28 ]

Analysis:

The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery.

  • If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache.
  • If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated.
    • Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns.
    • As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache.

The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.

Comment by Timour Katchaounov (Inactive) [ 2012-06-08 ]

This problem is fixed by the following patch waiting for review:

revno: 3406
revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23
parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e
committer: timour@askmonty.org
branch nick: 5.5-lpb944706
timestamp: Wed 2012-05-30 00:18:53 +0300
message:
Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

Analysis:

The fix for lp:944706 introduces early subquery optimization.
While a subquery is being optimized some of its predicates may be
removed. In the test case, the EXISTS subquery is constant, and is
evaluated to TRUE. As a result the whole OR is TRUE, and thus the
correlated condition "b = alias1.b" is optimized away. The subquery
becomes non-correlated.

The subquery cache is designed to work only for correlated subqueries.
If constant subquery optimization is disallowed, then the constant
subquery is not evaluated, the subquery remains correlated, and its
execution is cached. As a result execution is fast.

However, when the constant subquery was optimized away, it was neither
cached by the subquery cache, nor it was cached by the internal subquery
caching. The latter was due to the fact that the subquery still appeared
as correlated to the subselect_XYZ_engine::exec methods, and they
re-executed the subquery on each call to Item_subselect::exec.

Solution:

The solution is to update the correlated status of the subquery after it has
been optimized. This status consists of:

  • st_select_lex::is_correlated
  • Item_subselect::is_correlated
  • SELECT_LEX::uncacheable
  • SELECT_LEX_UNIT::uncacheable
    The status is updated by st_select_lex::update_correlated_cache(), and its
    caller st_select_lex::optimize_unflattened_subqueries. The solution relies
    on the fact that the optimizer already called
    st_select_lex::update_used_tables() for each subquery. This allows to
    efficiently update the correlated status of each subquery without walking
    the whole subquery tree.

Notice that his patch is an improvement over MySQL 5.6 and older, where
subqueries are not pre-optimized, and the above analysis is not possible.

Comment by Timour Katchaounov (Inactive) [ 2012-06-19 ]

Merged, tested, pushed.

Generated at Thu Feb 08 06:27:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.