[MDEV-321] CHEAP SQ: A query with inner joins and EXISTS subquery takes several times longer on MDEV-193 tree than on 5.5 main tree Created: 2012-06-06  Updated: 2012-06-08  Resolved: 2012-06-08

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.27

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Timour Katchaounov (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-193 LP:944706 - Query with impossible or ... Closed

 Description   

The following query

SELECT MAX(al1.a) 
FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4
WHERE al4.a = al3.a 
  AND (
    EXISTS ( SELECT 1 FROM t1 )
    OR al1.a > al2.a
  )

takes several times longer on MDEV-193 tree revno 3407 comparing to maria/5.5 tree 3426.

Reproducible with the default optimizer_switch as well as all OFF values except for in_to_exists required to execute the query.
Reproducible with MyISAM (~3 times longer, 30 vs 10 sec on my machine), Aria and InnoDB (~6 times longer, 60 vs 10 sec).

EXPLAIN on MDEV-193 (with the default optimizer_switch):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	al1	index	NULL	a	5	NULL	90	100.00	Using index
1	PRIMARY	al2	index	NULL	a	5	NULL	90	100.00	Using index; Using join buffer (flat, BNL join)
1	PRIMARY	al3	index	a	a	5	NULL	90	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
1	PRIMARY	al4	ref	a	a	5	test.al3.a	10	100.00	Using index
2	SUBQUERY	t1	index	NULL	a	5	NULL	90	100.00	Using index
Warnings:
Note	1003	select max(`test`.`al1`.`a`) AS `MAX(al1.a)` from `test`.`t1` `al1` join `test`.`t1` `al2` join `test`.`t1` `al3` join `test`.`t1` `al4` where (`test`.`al4`.`a` = `test`.`al3`.`a`)

EXPLAIN on maria/5.5 (with the default optimizer_switch):

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	al3	index	a	a	5	NULL	90	100.00	Using where; Using index
1	PRIMARY	al4	ref	a	a	5	test.al3.a	10	100.00	Using index
1	PRIMARY	al1	index	a	a	5	NULL	90	100.00	Using index; Using join buffer (flat, BNL join)
1	PRIMARY	al2	index	a	a	5	NULL	90	100.00	Using where; Using index; Using join buffer (incremental, BNL join)
2	SUBQUERY	t1	index	NULL	a	5	NULL	90	100.00	Using index
Warnings:
Note	1003	select max(`test`.`al1`.`a`) AS `MAX(al1.a)` from `test`.`t1` `al1` join `test`.`t1` `al2` join `test`.`t1` `al3` join `test`.`t1` `al4` where ((`test`.`al4`.`a` = `test`.`al3`.`a`) and (exists(select 1 from `test`.`t1`) or (`test`.`al1`.`a` > `test`.`al2`.`a`)))

Test case:

SET optimizer_switch = 'in_to_exists=on';
 
CREATE TABLE t1 (a INT, KEY(a));
INSERT INTO t1 VALUES
(7),(5),(1),(204),(224),(9),(5),(0),(3);
 
INSERT INTO t1 SELECT al1.* FROM t1 al1, t1 al2;
 
SELECT MAX(al1.a) 
FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4
WHERE al4.a = al3.a 
  AND (
    EXISTS ( SELECT 1 FROM t1 )
    OR al1.a > al2.a
  );



 Comments   
Comment by Timour Katchaounov (Inactive) [ 2012-06-07 ]

This is a generic join optimizer problem that is completely unrelated to mdev-193.
The relationship with mdev-193 is that it makes it possible to control whether some
constant predicate is expensive or not. MDEV-193 implements this control for
subquery predicates. When a constant subquery predicate is considered expensive,
then it cannot be used in constant optimization during the rewrite optimization phase.

In the example above, if the predicate is considered non-expensive, then it is evaluated
to TRUE, and the whole disjunctive condition is optimized away. The where clause is
thus reduced to "al4.a = al3.a".

If one runs the simplified query directly:

SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;

It executes equally slowly. This can also be verified with mysql-trunk (both with the simplified
query and the original query from the test case).

The reason for the slow execution is the join order chosen by the optimizer:

-------------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

-------------------------------------------------------------------------------------------------------------------------------------+

1 SIMPLE al1 index NULL a 5 NULL 90 Using index
1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using join buffer (incremental, BNL join)
1 SIMPLE al4 ref a a 5 md321.al3.a 5 Using index

-------------------------------------------------------------------------------------------------------------------------------------+

This order coincides with the order of the tables in the FROM clause. The reason why this plan is slow, is because
the only join condition is attached to last two tables in the plan. This means that the condition is executed as many
times as the size of the cross-product of the first three tables (90^3 times).

If we reorder the tables in the FROM clause (or use STRAIGHT_JOIN) as follows:

MariaDB [md321]> explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

------------------------------------------------------------------------------------------------------------------------+

1 SIMPLE al4 index a a 5 NULL 90 Using where; Using index
1 SIMPLE al3 ref a a 5 md321.al4.a 5 Using index
1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
1 SIMPLE al1 index NULL a 5 NULL 90 Using index; Using join buffer (incremental, BNL join)

------------------------------------------------------------------------------------------------------------------------+

This plan takes ~4 sec instead of ~50 sec of the previous plan.

If we run the original test case with the correct table order, the plan is also very fast ~9 sec.

Finally, if the test case query is run with 5.5 main, or with mdev-193 and set expensive_subquery_limit=0,
execution is fast again because the optimizer keeps all join conditions. As a result, the optimizer chooses
the optimal join order 3-4-1-2, and the selective condition "al4.a = al3.a" is evaluated with the first join in the
plan.

In summary, mdev-193 allows the optimizer to correctly simplify the WHERE clause, but it exposes a
weakness in the JOIN optimizer. Most likely the problem appears when all tables are of the same
(or similar) size.

Comment by Igor Babaev [ 2012-06-08 ]

Timour,

If it exposes a weakness of the join optimizer then report a new bug.

From your analysis it's not clear why the cost of the first plan is less than the cost of the second plan.
The first plan performs (90^3) look-ups while the second only 90.
We need the cost estimates calculated by the optimizer (including the estimates for the partial plans)
to see the its reasons.

Comment by Igor Babaev [ 2012-06-08 ]

Here's what I had in the latest MariaDB 5.5:

MariaDB [test]> explain SELECT MAX(al1.a) FROM t1 AS al4, t1 AS al3, t1 AS al2, t1 AS al1 WHERE al4.a = al3.a;
-----------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

-----------------------------------------------------------------------------------------------------------------------+

1 SIMPLE al4 index a a 5 NULL 90 Using where; Using index
1 SIMPLE al3 ref a a 5 test.al4.a 5 Using index
1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
1 SIMPLE al1 index NULL a 5 NULL 90 Using index; Using join buffer (incremental, BNL join)

-----------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [test]> SHOW SESSION STATUS LIKE 'Last_query_cost';
------------------------------+

Variable_name Value

------------------------------+

Last_query_cost 737304.481750

------------------------------+
1 row in set (0.00 sec)

MariaDB [test]> EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
------------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

------------------------------------------------------------------------------------------------------------------------------------+

1 SIMPLE al1 index NULL a 5 NULL 90 Using index
1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (flat, BNL join)
1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using join buffer (incremental, BNL join)
1 SIMPLE al4 ref a a 5 test.al3.a 5 Using index

------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [test]> SHOW SESSION STATUS LIKE 'Last_query_cost';
-------------------------------+

Variable_name Value

-------------------------------+

Last_query_cost 1609351.275728

-------------------------------+
1 row in set (0.01 sec)

MariaDB [test]> set optimizer_prune_level=1;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> EXPLAIN SELECT MAX(al1.a) FROM t1 AS al1, t1 AS al2, t1 AS al3, t1 AS al4 WHERE al4.a = al3.a;
---------------------------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

---------------------------------------------------------------------------------------------------------------------------------+

1 SIMPLE al1 index NULL a 5 NULL 90 Using index
1 SIMPLE al2 index NULL a 5 NULL 90 Using index; Using join buffer (f, BNL join)
1 SIMPLE al3 index a a 5 NULL 90 Using where; Using index; Using j buffer (incremental, BNL join)
1 SIMPLE al4 ref a a 5 test.al3.a 5 Using index

---------------------------------------------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

MariaDB [test]> SHOW SESSION STATUS LIKE 'Last_query_cost';
-------------------------------+

Variable_name Value

-------------------------------+

Last_query_cost 1609351.275728

-------------------------------+
1 row in set (0.00 sec)

Comment by Igor Babaev [ 2012-06-08 ]

The problem is repeatable in MariaDB 5.2.

Comment by Timour Katchaounov (Inactive) [ 2012-06-08 ]

As said above, this problem is unrelated to mdev-193.
I filed the problem as bug lp:1010395 (https://bugs.launchpad.net/maria/+bug/1010395).

Generated at Thu Feb 08 06:27:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.