[MDEV-19532] Query with subqueries hangs for long time with default optimizer_search_depth Created: 2019-05-20  Updated: 2021-02-12  Resolved: 2019-05-23

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Subquery, Storage Engine - MyISAM
Affects Version/s: 10.0, 10.1, 10.3.14, 10.3.15, 10.2, 10.3, 10.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Tom Dolezal Assignee: Igor Babaev
Resolution: Not a Bug Votes: 0
Labels: SELECT, optimizer, subquery
Environment:

Windows 10 x64, MariaDB 10.3.15 with none my.ini file
CentOS 7.6.1810 x64, MariaDB 10.3.14 with default configuration


Attachments: Text File prepare.sql     Text File test.sql    
Issue Links:
Relates
relates to MDEV-11287 Query that should result in only four... Stalled
relates to MDEV-24849 high optimizer_search_depth affects q... Closed

 Description   

We have found a difference in subquery/join optimizer between MySQL 5.x and MariaDB 10.3.x. When we run a simple select query with several subqueries on MariaDB with the default value of configuration "optimizer_search_depth" (62), the query is executing for very long time, hangs in "statistics" state, with quadratic or exponencional growing of the execution time. The same query performs on MySQL 5 during a few milliseconds. We have prepared test suite. The "prepare.sql" prepares test environment a the "test.sql" demonstrates this problem. In test.sql, the first 10 subqueries are uncommented. Query in this form takes arround 6 seconds, but when we add 11. subquery, query takes 112 second already. When we have changed "optimizer_search_depth" to 1, the execution of same query was done immediately. Same result we got on Windows and Linux, in version 10.3.14 and 10.3.15



 Comments   
Comment by Alice Sherepa [ 2019-05-21 ]

Thanks a lot! Reproducible on MariaDB 10.0-10.4, also with optimizer_switch='semijoin=on'.
On 5.5 the query is fast. As a workaround optimizer_switch='semijoin=off' could be used.

create table t1(id int not null auto_increment primary key);
insert into t1 values (1), (2);
 
create table t2( id int not null auto_increment primary key, id2 int not null, b int not null, 
	index (id2), index (b));
 
delimiter $$;
 
CREATE PROCEDURE fill_tables()
BEGIN
   DECLARE i int DEFAULT 0;
   WHILE i <= 200 DO
      INSERT INTO t2 (id, id2, b) VALUES (0, 1, i), (0,2,i);
      SET i = i + 1;
   END WHILE;
END$$
 
delimiter ;$$
CALL fill_tables();
 
set optimizer_switch='semijoin=off';
 
SELECT  COUNT(*) FROM t1  WHERE 1
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 1 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 2 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 3 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 4 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 5 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 6 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 7 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 8 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 9 LIMIT 1) 
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 10 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 11 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 12 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 13 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 14 LIMIT 1) 
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 15 LIMIT 1);
 
set optimizer_switch='semijoin=on';
 
SELECT  COUNT(*) FROM t1  WHERE 1
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 1 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 2 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 3 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 4 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 5 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 6 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 7 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 8 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 9 LIMIT 1) 
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 10 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 11 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 12 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 13 LIMIT 1)
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 14 LIMIT 1) 
	AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 15 LIMIT 1);
 
drop table t1,t2;
drop procedure fill_tables;

 
MariaDB [test]> SELECT  COUNT(*) FROM t1  WHERE 1
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 1 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 2 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 3 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 4 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 5 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 6 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 7 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 8 LIMIT 1)
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 9 LIMIT 1) 
    -> AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 10 LIMIT 1);
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (5.240 sec)
 
MariaDB [test]> explain extended SELECT  COUNT(*) FROM t1  WHERE 1 AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 1 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 2 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 3 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 4 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 5 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 6 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 7 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 8 LIMIT 1) AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 9 LIMIT 1)  AND EXISTS (SELECT 1 FROM t2 WHERE t2.id2 = t1.id AND t2.b = 10 LIMIT 1);
+------+--------------+--------------+--------+---------------+--------------+---------+-------------+------+----------+-------------+
| id   | select_type  | table        | type   | possible_keys | key          | key_len | ref         | rows | filtered | Extra       |
+------+--------------+--------------+--------+---------------+--------------+---------+-------------+------+----------+-------------+
|    1 | PRIMARY      | <subquery11> | ALL    | distinct_key  | NULL         | NULL    | NULL        |    2 |   100.00 |             |
|    1 | PRIMARY      | t1           | eq_ref | PRIMARY       | PRIMARY      | 4       | test.t2.id2 |    1 |   100.00 | Using index |
|    1 | PRIMARY      | <subquery10> | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery9>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery8>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery7>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery6>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery5>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery4>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery3>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|    1 | PRIMARY      | <subquery2>  | eq_ref | distinct_key  | distinct_key | 4       | func        |    1 |   100.00 |             |
|   11 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|   10 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    9 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    8 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    7 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    6 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    5 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    4 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    3 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
|    2 | MATERIALIZED | t2           | ref    | id2,b         | b            | 2       | const       |    2 |   100.00 |             |
+------+--------------+--------------+--------+---------------+--------------+---------+-------------+------+----------+-------------+
21 rows in set, 11 warnings (5.256 sec)
 
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #2 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #3 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #4 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #5 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #6 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #7 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #8 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #9 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #10 was resolved in SELECT #1
Note (Code 1276): Field or reference 'test.t1.id' of SELECT #11 was resolved in SELECT #1
Note (Code 1003): select count(0) AS `COUNT(*)` from `test`.`t1` semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) semi join (`test`.`t2`) where `test`.`t2`.`b` = 1 and `test`.`t1`.`id` = `test`.`t2`.`id2` and `test`.`t2`.`b` = 2 and `test`.`t2`.`b` = 3 and `test`.`t2`.`b` = 4 and `test`.`t2`.`b` = 5 and `test`.`t2`.`b` = 6 and `test`.`t2`.`b` = 7 and `test`.`t2`.`b` = 8 and `test`.`t2`.`b` = 9 and `test`.`t2`.`b` = 10

Comment by Igor Babaev [ 2019-05-21 ]

Hi Tom,

MariaDB 10.0 introduced a new subquery optimization called 'EXISTS-to-IN Optimization' that all MySQL versions still does not have.
You query is subject to this optimization. As a result of it the server transforms the query into the following one with semi-join IN subqueries:

SELECT COUNT(*) FROM table_parent AS p WHERE 1
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 1)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 2)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 3)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 4)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 5)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 6)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 7)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 8)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 9)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 10)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 11)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 12)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 13)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 14)
	AND p.id IN (SELECT c.parent_id FROM table_child AS c WHERE c.value = 15)

The latter is subject to semi-join conversion that transforms it into the sequence of 14 semi-join operations. As the semi-join operation is commutative the optimizer has to evaluate 15! permutations. No wonder it takes a lot of time if you use the default setting for optimizer_search_depth[=62]. If you change the setting to 0 the optimizer will choose a proper number for search depth (now it's 7).
Alternatively you can switch the EXISTS-to-IN Optimization off by setting:

set optimizer_switch='exists_to_in=off';

Unfortunately now neither MariaDB nor MySQL cost-base optimizer can take into account the time spent by the optimizer to choose an optimal plan. If you submit the converted query to MySQL optimizer it will 'hang' as well as in MariaDB optimizer.

If you are satisfied with my comment please let me know and I'll close the tickets as 'Not a Bug'.
(just in case I've checked that with exists_to_in=off MariaDB 10.3 returns the answer in no time).

Comment by Tom Dolezal [ 2019-05-22 ]

Hi Igor,

thanks for detailed explanaition, but I don't understand why MariaDB 10 does this "optimization" and converts EXISTS to IN if the result is much more time-cost than older version of optimizer in MariaDB 5 without this optimization. And second, shouldn't be the default value of optimizer_search_depth 0 if the current default value 62 is not optimal now?

You right. The settings optimizer_search_depth to 0 si much more better but still little bit slower than older version of MySQL (0,216 sec. vs. 0,000 sec.)

Comment by Igor Babaev [ 2019-05-22 ]

Tom,

1. Do you understand why MySQL/MariaDB optimizer does semi-join optimizations?
2. Do you realize that the query in my comment Q2 is equivalent to your query Q1?
3. Do you see that the optimizer of MySQL is slow with Q2 for the same reason as the optimizer of MariaDB is slow with Q1?
4. Did you try set @@optimizer_switch='exists_to_in=off' in MariaDB?
5. Can I close the ticket?

Comment by Tom Dolezal [ 2019-05-22 ]

Igor,

1. Yes, you describe it at https://mariadb.com/kb/en/library/semi-join-subquery-optimizations/
2. No, it's not about this query. Yes, I know that we can make the same result by many other and more effective ways. We have been solving a difference between MySQL and MariaDB.
3. Yes
4. No, we have set optimizer_search_depth to 0.
5. Yes, thanks

Comment by Igor Babaev [ 2019-05-22 ]

Tom,
With @@optimizer_switch='exists_to_in=off' you will have absolutely the same execution plan
as MySQL and as a consequence the same performance. Here you practically does not spend any time for optimization.
With @@optimizer_search_depth=0 you'll probably have not so good plan and besides you'll still spend some significant amount of time to search for the best plan.

Comment by Igor Babaev [ 2019-05-23 ]

See my explanations in the comments

Generated at Thu Feb 08 08:52:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.