[MDEV-317] CHEAP SQ: A query with EXISTS subquery, aggregate function in subquery, ORDER BY and LIMIT takes much longer than on main tree if optimizer_prune_level=0 Created: 2012-06-06  Updated: 2012-06-13  Resolved: 2012-06-13

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.27

Type: Bug Priority: Minor
Reporter: Elena Stepanova Assignee: Timour Katchaounov (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-193 LP:944706 - Query with impossible or ... Closed

 Description   

Note: with the provided test case, the problem is only reproducible when optimizer_prune_level=0. If you decide that it's acceptable and isn't worth fixing, I won't object, but I'm filing it so you could check if it's expected.

The following query

SELECT a FROM t1, t2, t3 AS t3_alias
WHERE b <= ( SELECT SUM(a) FROM t1 )
  AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
ORDER BY a LIMIT 1;

takes ~18 times longer on MDEV-193 tree comparing to main/5.5 revno 3426 (and to 3402).

Reproducible with MyISAM and Aria; with InnoDB it's sporadic.
Reproducible with the default optimizer_switch.

EXPLAIN on MDEV-193:

 
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	t1	index	NULL	a	5	NULL	40	100.00	Using index; Using temporary; Using filesort
1	PRIMARY	t2	range	b	b	5	NULL	96	100.00	Using where; Using index; Using join buffer (flat, BNL join)
1	PRIMARY	t3_alias	index	NULL	c	5	NULL	40	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
3	DEPENDENT SUBQUERY	t3	index	c	c	5	NULL	40	100.00	Using where; Using index
2	SUBQUERY	t1	index	NULL	a	5	NULL	40	100.00	Using index
Warnings:
Note	1276	Field or reference 'test.t3_alias.c' of SELECT #3 was resolved in SELECT #1
Note	1003	select `test`.`t1`.`a` AS `a` from `test`.`t1` join `test`.`t2` join `test`.`t3` `t3_alias` where ((`test`.`t2`.`b` <= (select sum(`test`.`t1`.`a`) from `test`.`t1`)) and <expr_cache><`test`.`t3_alias`.`c`>(exists(select 1 from `test`.`t3` where (`test`.`t3`.`c` > `test`.`t3_alias`.`c`)))) order by `test`.`t1`.`a` limit 1

EXPLAIN on maria/5.5:

 
id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	t1	index	NULL	a	5	NULL	40	100.00	Using index; Using temporary; Using filesort
1	PRIMARY	t3_alias	index	NULL	c	5	NULL	40	100.00	Using where; Using index; Using join buffer (flat, BNL join)
1	PRIMARY	t2	index	b	b	5	NULL	100	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
3	DEPENDENT SUBQUERY	t3	index	c	c	5	NULL	40	100.00	Using where; Using index
2	SUBQUERY	t1	index	NULL	a	5	NULL	40	100.00	Using index
Warnings:
Note	1276	Field or reference 'test.t3_alias.c' of SELECT #3 was resolved in SELECT #1
Note	1003	select `test`.`t1`.`a` AS `a` from `test`.`t1` join `test`.`t2` join `test`.`t3` `t3_alias` where ((`test`.`t2`.`b` <= (select sum(`test`.`t1`.`a`) from `test`.`t1`)) and <expr_cache><`test`.`t3_alias`.`c`>(exists(select 1 from `test`.`t3` where (`test`.`t3`.`c` > `test`.`t3_alias`.`c`)))) order by `test`.`t1`.`a` limit 1

Test case:

SET optimizer_prune_level = 0;
 
CREATE TABLE t1 (a INT, KEY(a)) ENGINE=MyISAM;
INSERT INTO t1 VALUES
  (0),(8),(1),(8),(9),(24),(6),(1),(6),(2),
  (4),(8),(4),(4),(7),(4),(1),(9),(4),(8);
INSERT INTO t1 SELECT * FROM t1;
 
CREATE TABLE t2 (b INT, KEY(b)) ENGINE=MyISAM;
INSERT INTO t2 VALUES
  (4),(8),(0),(0),(0),(7),(7),(5),(3),(188),
  (4),(9),(6),(1),(5),(6),(2),(4),(231),(4),
  (3),(3),(7),(6),(7),(9),(4),(4),(2),(1),(2),
  (194),(2),(3),(8),(4),(9),(4),(5),(5),(9),
  (3),(8),(0),(98),(3),(1),(0),(189),(8),(3),
  (3),(9),(6),(8),(3),(9),(5),(9),(2),(2),(5),
  (8),(6),(9),(0),(3),(6),(5),(8),(2),(120),
  (25),(1),(3),(1),(3),(153),(5),(9),(1),(8),
  (7),(6),(2),(4),(7),(3),(8),(4),(6),(1),(7),
  (1),(0),(2),(7),(2),(1),(5);
 
CREATE TABLE t3 (c INT, KEY(c)) ENGINE=MyISAM;
INSERT INTO t3 VALUES
  (7),(0),(9),(3),(4),(2),(5),(3),(1),(3),(6),
  (7),(5),(1),(204),(224),(9),(5),(0),(3);
INSERT INTO t3 SELECT * FROM t3;
 
SELECT a FROM t1, t2, t3 AS t3_alias
WHERE b <= ( SELECT SUM(a) FROM t1 )
  AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
ORDER BY a LIMIT 1;



 Comments   
Comment by Timour Katchaounov (Inactive) [ 2012-06-08 ]

This is not a bug, but rather a deficiency of the current optimizer that is
completely unrelated to mdev-193. It is a regression only with respect to
MariaDB 5.5/5.3. Detailed explanation follows.

1. What happens in mdev-193

The patch for mdev-193 makes MariaDB 5.5 behave like any MySQL version, or
MariaDB 5.2, namely - it allows constant subqueries to be evaluated during
optimization, and the resulting value to be used by the optimizer.

Specifically in this test case, the subquery (SELECT SUM(a) FROM t1) is
evaluated by the range optimizer, and the final query plan uses "range"
access to table "t2".

If we set "optimizer_prune_level = 0" (that is, not pruning, consider all
plans), the optimizer determines that the optimal query plan is:

------------------------------------------------------------------------------------------------------------------------------------------------

id select_type table type possible_keys key key_len ref rows filtered Extra

------------------------------------------------------------------------------------------------------------------------------------------------

1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index
1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (flat, BNL join)
1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index
2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index

------------------------------------------------------------------------------------------------------------------------------------------------
join->best_read = 32396

If we set "optimizer_prune_level = 1" (that is prune some plans), the optimizer
prunes the above plan, and finds a worse plan with respect to its cost model:

------------------------------------------------------------------------------------------------------------------------------------------------

id select_type table type possible_keys key key_len ref rows filtered Extra

------------------------------------------------------------------------------------------------------------------------------------------------

1 PRIMARY t1 index NULL a 5 NULL 40 100.00 Using index
1 PRIMARY t3_alias index NULL c 5 NULL 40 100.00 Using where; Using index; Using join buffer (flat, BNL join)
1 PRIMARY t2 range b b 5 NULL 96 100.00 Using where; Using index; Using join buffer (incremental, BNL join)
3 DEPENDENT SUBQUERY t3 index c c 5 NULL 40 100.00 Using where; Using index
2 SUBQUERY t1 index NULL a 5 NULL 40 100.00 Using index

------------------------------------------------------------------------------------------------------------------------------------------------
join->best_read = 66869 (~ two times higher cost)

However, the second plan in reality is much faster. The reason for this is that
table "t3_alias" appears in second place in the plan instead of in third, as a
result the correlated EXISTS subquery is executed fewer times (1531 times). The
first "optimal" plan puts table "t3_alias" in third place in the join, which
results in the EXISTS subquery being executed 152011 times. At the same time
the result of the constant subquery is cached, so it doesn't matter much how
many times it is evaluated.

2. Exposing the same problem in the main 5.5 branch

If we substitute manually the aggregate subquery with its constant value, we can
reproduce the problem in 5.5 via the following test case:

– better plan
SET optimizer_prune_level = 1;

EXPLAIN
SELECT count FROM t1, t2, t3 AS t3_alias
WHERE b <= 236
AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
ORDER BY a LIMIT 1;

– worse plan
SET optimizer_prune_level = 0;

EXPLAIN
SELECT count FROM t1, t2, t3 AS t3_alias
WHERE b <= 236
AND EXISTS ( SELECT 1 FROM t3 WHERE c > t3_alias.c )
ORDER BY a LIMIT 1;

Comment by Timour Katchaounov (Inactive) [ 2012-06-13 ]

As explained in the last comment this is not a bug, but a deficiency of the current optimizer.
This problem was filed as a new task MDEV-338.

Generated at Thu Feb 08 06:27:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.