[MDEV-4622] Union of uncorrelated subqueries becomes dependant Created: 2013-06-06  Updated: 2017-09-25  Resolved: 2017-09-25

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 5.5.31
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Piotr Blasiak Assignee: Sergei Petrunia
Resolution: Incomplete Votes: 0
Labels: need_feedback, optimizer
Environment:

Probably all



 Description   

This query:

select * from a where id in (select id from a)

correctly materializes the subquery before running the outer query. However, this query:

select * from a where id in (select id from a union all select id from a)

Does not. EXPLAIN shows the UNION and the subqueries as DEPENDANT, which is wrong as they are obviously uncorrelated.

The fix for this is to use derived tables:

select * from a where id in (select * from (select id from a union all select id from a) dq)

We have 2 problems here:
1) Obviously the union is causing the subqueries to become dependant and it in itself is dependant
2) Another problem is that without the union the select is using the index, with the union it is using where instead - a serious performance killer as well.

Question is, is the derived tables fix the only workaround? And is this a bug or a limitation in the optimizer?

Note: the a table is a simple table with only id primary key column. I have a real-life case where this problem is causing some really bad performance, and I created this case just to illustrate the problem.



 Comments   
Comment by Elena Stepanova [ 2013-06-06 ]

Sergei,

Could you please consult on this or reassign to somebody who can?

-----------------

Quick test (on ~50K rows in the table, the initial query ~6 times slower than the modified one):

MariaDB [test]> explain extended select * from a where id in (select id from a union all select id from a);
+------+--------------------+------------+--------+---------------+---------+---------+------+-------+----------+--------------------------+
| id   | select_type        | table      | type   | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                    |
+------+--------------------+------------+--------+---------------+---------+---------+------+-------+----------+--------------------------+
|    1 | PRIMARY            | a          | index  | NULL          | PRIMARY | 4       | NULL | 49152 |   100.00 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | a          | eq_ref | PRIMARY       | PRIMARY | 4       | func |     1 |   100.00 | Using index              |
|    3 | DEPENDENT UNION    | a          | eq_ref | PRIMARY       | PRIMARY | 4       | func |     1 |   100.00 | Using index              |
| NULL | UNION RESULT       | <union2,3> | ALL    | NULL          | NULL    | NULL    | NULL |  NULL |     NULL |                          |
+------+--------------------+------------+--------+---------------+---------+---------+------+-------+----------+--------------------------+
4 rows in set, 1 warning (0.00 sec)
 
MariaDB [test]> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                                                                                                     |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`a`.`id` AS `id` from `test`.`a` where <expr_cache><`test`.`a`.`id`>(<in_optimizer>(`test`.`a`.`id`,<exists>(select `test`.`a`.`id` from `test`.`a` where (<cache>(`test`.`a`.`id`) = `test`.`a`.`id`) union all select `test`.`a`.`id` from `test`.`a` where (<cache>(`test`.`a`.`id`) = `test`.`a`.`id`)))) |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> explain extended select * from a where id in (select * from (select id from a union all select id from a) dq);
+------+--------------+-------------+--------+---------------+--------------+---------+------+-------+----------+-------------+
| id   | select_type  | table       | type   | possible_keys | key          | key_len | ref  | rows  | filtered | Extra       |
+------+--------------+-------------+--------+---------------+--------------+---------+------+-------+----------+-------------+
|    1 | PRIMARY      | a           | index  | PRIMARY       | PRIMARY      | 4       | NULL | 49152 |   100.00 | Using index |
|    1 | PRIMARY      | <subquery2> | eq_ref | distinct_key  | distinct_key | 4       | func |     1 |   100.00 |             |
|    2 | MATERIALIZED | <derived3>  | ALL    | NULL          | NULL         | NULL    | NULL | 98304 |   100.00 |             |
|    3 | DERIVED      | a           | index  | NULL          | PRIMARY      | 4       | NULL | 49152 |   100.00 | Using index |
|    4 | UNION        | a           | index  | NULL          | PRIMARY      | 4       | NULL | 49152 |   100.00 | Using index |
| NULL | UNION RESULT | <union3,4>  | ALL    | NULL          | NULL         | NULL    | NULL |  NULL |     NULL |             |
+------+--------------+-------------+--------+---------------+--------------+---------+------+-------+----------+-------------+
6 rows in set, 1 warning (0.01 sec)
 
MariaDB [test]> show warnings;
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                           |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | select `test`.`a`.`id` AS `id` from `test`.`a` semi join ((select `test`.`a`.`id` AS `id` from `test`.`a` union all select `test`.`a`.`id` AS `id` from `test`.`a`) `dq`) where 1 |
+-------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> pager cat > /dev/null;
PAGER set to 'cat > /dev/null'
MariaDB [test]> select * from a where id in (select id from a union all select id from a);
49152 rows in set (3.20 sec)
 
MariaDB [test]> select * from a where id in (select * from (select id from a union all select id from a) dq);
49152 rows in set (0.64 sec)
 
MariaDB [test]> select * from a where id in (select id from a union all select id from a);
49152 rows in set (3.23 sec)
 
MariaDB [test]> select * from a where id in (select * from (select id from a union all select id from a) dq);
49152 rows in set (0.65 sec)
 
MariaDB [test]> pager;
Default pager wasn't set, using stdout.
MariaDB [test]> show create table a;
+-------+---------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------+
| a     | CREATE TABLE `a` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=49153 DEFAULT CHARSET=latin1 |
+-------+---------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
 
MariaDB [test]> select count(*) from a;
+----------+
| count(*) |
+----------+
|    49152 |
+----------+
1 row in set (0.00 sec)
 
MariaDB [test]> select @@version;
+----------------------+
| @@version            |
+----------------------+
| 5.5.31-MariaDB-debug |
+----------------------+
1 row in set (0.00 sec)

Comment by Piotr Blasiak [ 2013-06-06 ]

After further testing I realized I could make the outer query use indexes by wrapping the subqueries themselves in the "select * from (...) alias" derived table fix:

explain select * from a where id in (select * from (select * from (select id from a) dq1 union all select * from (select id from a) dq2) dq3)

Comment by Sergei Petrunia [ 2013-07-16 ]

Looked at the code.

About Materialization:
The code considers Materialization (and/or SJ-Materialization) strategy only
when the subquery is a single SELECT (i.e. not a UNION). This is a limitation
in the optimizer.

When Materialialization is not applicable, the only remaining strategy for an
IN subquery is the IN->EXISTS transformation. IN->EXISTS transformation pushes
IN-equality down into the subquery which makes the subquery "DEPENDENT".

Pushing down IN-equality allows parts of subquery to use index

MariaDB [j6]> explain select * from a where id in (select id from a union  select id from a);
+------+--------------------+------------+--------+---------------+---------+---------+------+-------+--------------------------+
| id   | select_type        | table      | type   | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+------+--------------------+------------+--------+---------------+---------+---------+------+-------+--------------------------+
|    1 | PRIMARY            | a          | index  | NULL          | PRIMARY | 4       | NULL | 51200 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | a          | eq_ref | PRIMARY       | PRIMARY | 4       | func |     1 | Using index              |
|    3 | DEPENDENT UNION    | a          | eq_ref | PRIMARY       | PRIMARY | 4       | func |     1 | Using index              |
| NULL | UNION RESULT       | <union2,3> | ALL    | NULL          | NULL    | NULL    | NULL |  NULL |                          |
+------+--------------------+------------+--------+---------------+---------+---------+------+-------+--------------------------+
4 rows in set (0.00 sec)

Note the "eq_ref" in line #2 and line #3. It means index is used inside the UNION.

Comment by Sergei Petrunia [ 2013-07-16 ]

" 2) Another problem is that without the union the select is using the index, with the union it is using where instead - a serious performance killer as well. "

I'm unable to observe this. Piotr, can you post a query where that happens (both the variant with UNION and without), and EXPLAIN outputs?

Generated at Thu Feb 08 06:57:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.