[MDEV-287] CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.5.25
Component/s: None
Labels:
None

Description

The following query

SELECT (

  SELECT MIN(b) FROM t1, t2

  WHERE b = a AND

  ( b = alias1.b OR

     EXISTS ( SELECT * FROM t3 )

FROM t2 alias1, t1 alias2, t1 alias3;

on my machine takes 80 sec and more on the work tree vs 0.5 sec and less on the main 5.5 tree (tried revno 3413 and revno 3402).

bzr version-info

revision-id: timour@askmonty.org-20120518115201-s6byggvesqxcntkd

date: 2012-05-18 14:52:01 +0300

revno: 3404

Reproducible with the default optimizer_switch as well as with all OFF values (except for in_to_exists which is required to execute the query).
Reproducible with MyISAM, Aria, InnoDB.

EXPLAIN on the work tree (with the default optimizer_switch):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra

1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00

1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)

1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)

2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00

2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)

3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00

Warnings:

Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1

Note	1003	select <expr_cache><>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where (`test`.`t2`.`b` = `test`.`t1`.`a`))) AS `(

SELECT MIN(b) FROM t1, t2

WHERE b = a AND

( b = alias1.b OR

EXISTS ( SELECT * FROM t3 )

)` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`

EXPLAIN on the main tree (with the default optimizer_switch):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra

1	PRIMARY	alias2	ALL	NULL	NULL	NULL	NULL	20	100.00

1	PRIMARY	alias3	ALL	NULL	NULL	NULL	NULL	20	100.00	Using join buffer (flat, BNL join)

1	PRIMARY	alias1	ALL	NULL	NULL	NULL	NULL	100	100.00	Using join buffer (incremental, BNL join)

2	DEPENDENT SUBQUERY	t1	ALL	NULL	NULL	NULL	NULL	20	100.00	Using where

2	DEPENDENT SUBQUERY	t2	ALL	NULL	NULL	NULL	NULL	100	100.00	Using where; Using join buffer (flat, BNL join)

3	SUBQUERY	t3	ALL	NULL	NULL	NULL	NULL	2	100.00

Warnings:

Note	1276	Field or reference 'test.alias1.b' of SELECT #2 was resolved in SELECT #1

Note	1003	select <expr_cache><`test`.`alias1`.`b`>((select min(`test`.`t2`.`b`) from `test`.`t1` join `test`.`t2` where ((`test`.`t2`.`b` = `test`.`t1`.`a`) and ((`test`.`t1`.`a` = `test`.`alias1`.`b`) or exists(select 1 from `test`.`t3`))))) AS `(

SELECT MIN(b) FROM t1, t2

WHERE b = a AND

( b = alias1.b OR

EXISTS ( SELECT * FROM t3 )

)` from `test`.`t2` `alias1` join `test`.`t1` `alias2` join `test`.`t1` `alias3`

Test case:


CREATE TABLE t1 (a INT);
INSERT INTO t1 VALUES
(1),(7),(4),(7),(0),(2),(9),(4),(0),(9),
(1),(3),(8),(8),(18),(84),(6),(3),(6),(6);

CREATE TABLE t2 (b INT);
INSERT INTO t2 VALUES
(4),(5),(2),(5),(1),(1),(2),(2),(2),(197),
(4),(5),(3),(1),(2),(7),(6),(1),(156),(8),
(7),(2),(6),(2),(1),(0),(7),(5),(7),(2),(1),
(80),(3),(8),(5),(0),(9),(9),(7),(0),(5),
(6),(9),(3),(91),(6),(7),(3),(161),(7),(7),
(6),(5),(8),(7),(2),(1),(3),(6),(6),(5),(0),
(7),(7),(6),(0),(0),(8),(0),(4),(0),(213),
(248),(1),(6),(6),(3),(140),(0),(7),(6),(6),
(8),(5),(8),(7),(3),(7),(3),(8),(0),(1),(3),
(6),(8),(1),(1),(9),(0),(6);

CREATE TABLE t3 (c INT);
INSERT INTO t3 VALUES (8),(3);

SELECT (
SELECT MIN(b) FROM t1, t2
WHERE b = a AND
( b = alias1.b OR
EXISTS ( SELECT * FROM t3 )
)
)
FROM t2 alias1, t1 alias2, t1 alias3;

Attachments

Issue Links

relates to

MDEV-6231 Wrong result (extra rows) via cursor in a stored procedure

Closed

MDEV-193 LP:944706 - Query with impossible or constant subquery in WHERE or HAVING is not precomputed and thus not part of optimization

Closed

Activity

Ascending order - Click to sort in descending order

Timour Katchaounov (Inactive) added a comment - 2012-05-28 16:22

Analysis:

The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery.

If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache.

If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated.

- Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns.

- As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache.

The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.

Timour Katchaounov (Inactive) added a comment - 2012-05-28 16:22 Analysis: The much slower execution in the case when constant subqueries are evaluated is a result of the subquery cache not working in this case. This is what happens in more detail. The subquery in the SELECT list contains a subquery. This subquery in turn contains a constant subquery. If the constant subquery is considered expensive, then it is not evaluated during optimization. The subquery cache mechanism creates a cache for the correlated subquery, with a lookup argument 'alias1.b'. Execution is fast, because for all 40K results rows we execute the subquery only once, all other cases are lookups into the subquery cache. If the constant subquery is consdered non-expensive, then it is evaluated during optimization. The WHERE clause of the outer subquery is reduced to "b = a" because the EXISTS predicate in the OR is TRUE, therefore the whole OR can be optimized away. Notice that because of this optimization, the subquery in the SELECT clause becomes non-correlated. Item_cache_wrapper::init_on_demand() calls orig_item->get_cache_parameters(parameters) and determines that the list of parameters is empty because there are no correlated columns. As a result expr_cache->init() returns immediately, and doesn't do any initialization. In particular it doesn't create the temp table for the cache. The ultimate result is that once the subquery becomes non-correlated, it is not cached any more, and is executed for each result row combination (40K times in the example). I consider the above behavior a deficiency of the subquery cache. This is being investigated ATM.

Timour Katchaounov (Inactive) added a comment - 2012-06-08 18:40

This problem is fixed by the following patch waiting for review:

revno: 3406
revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23
parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e
committer: timour@askmonty.org
branch nick: 5.5-lpb944706
timestamp: Wed 2012-05-30 00:18:53 +0300
message:
Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

Analysis:

The fix for lp:944706 introduces early subquery optimization.
While a subquery is being optimized some of its predicates may be
removed. In the test case, the EXISTS subquery is constant, and is
evaluated to TRUE. As a result the whole OR is TRUE, and thus the
correlated condition "b = alias1.b" is optimized away. The subquery
becomes non-correlated.

The subquery cache is designed to work only for correlated subqueries.
If constant subquery optimization is disallowed, then the constant
subquery is not evaluated, the subquery remains correlated, and its
execution is cached. As a result execution is fast.

However, when the constant subquery was optimized away, it was neither
cached by the subquery cache, nor it was cached by the internal subquery
caching. The latter was due to the fact that the subquery still appeared
as correlated to the subselect_XYZ_engine::exec methods, and they
re-executed the subquery on each call to Item_subselect::exec.

Solution:

The solution is to update the correlated status of the subquery after it has
been optimized. This status consists of:

st_select_lex::is_correlated
Item_subselect::is_correlated
SELECT_LEX::uncacheable
SELECT_LEX_UNIT::uncacheable
The status is updated by st_select_lex::update_correlated_cache(), and its
caller st_select_lex::optimize_unflattened_subqueries. The solution relies
on the fact that the optimizer already called
st_select_lex::update_used_tables() for each subquery. This allows to
efficiently update the correlated status of each subquery without walking
the whole subquery tree.

Notice that his patch is an improvement over MySQL 5.6 and older, where
subqueries are not pre-optimized, and the above analysis is not possible.

Timour Katchaounov (Inactive) added a comment - 2012-06-08 18:40 This problem is fixed by the following patch waiting for review: revno: 3406 revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23 parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e committer: timour@askmonty.org branch nick: 5.5-lpb944706 timestamp: Wed 2012-05-30 00:18:53 +0300 message: Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer Analysis: The fix for lp:944706 introduces early subquery optimization. While a subquery is being optimized some of its predicates may be removed. In the test case, the EXISTS subquery is constant, and is evaluated to TRUE. As a result the whole OR is TRUE, and thus the correlated condition "b = alias1.b" is optimized away. The subquery becomes non-correlated. The subquery cache is designed to work only for correlated subqueries. If constant subquery optimization is disallowed, then the constant subquery is not evaluated, the subquery remains correlated, and its execution is cached. As a result execution is fast. However, when the constant subquery was optimized away, it was neither cached by the subquery cache, nor it was cached by the internal subquery caching. The latter was due to the fact that the subquery still appeared as correlated to the subselect_XYZ_engine::exec methods, and they re-executed the subquery on each call to Item_subselect::exec. Solution: The solution is to update the correlated status of the subquery after it has been optimized. This status consists of: st_select_lex::is_correlated Item_subselect::is_correlated SELECT_LEX::uncacheable SELECT_LEX_UNIT::uncacheable The status is updated by st_select_lex::update_correlated_cache(), and its caller st_select_lex::optimize_unflattened_subqueries. The solution relies on the fact that the optimizer already called st_select_lex::update_used_tables() for each subquery. This allows to efficiently update the correlated status of each subquery without walking the whole subquery tree. Notice that his patch is an improvement over MySQL 5.6 and older, where subqueries are not pre-optimized, and the above analysis is not possible.

Timour Katchaounov (Inactive) added a comment - 2012-06-19 15:31

Merged, tested, pushed.

Timour Katchaounov (Inactive) added a comment - 2012-06-19 15:31 Merged, tested, pushed.

MariaDB Server

CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds times longer

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration