[MDEV-14855] UNION subquery optimization Created: 2018-01-02  Updated: 2018-01-30  Resolved: 2018-01-30

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.2.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: yoch Assignee: Alice Sherepa
Resolution: Not a Bug Votes: 0
Labels: optimizer, subquery
Environment:

Debian 9


Attachments: File my.cnf    

 Description   

The optimizer is not able to handle UNION subqueries.

For example :

SELECT * FROM user
WHERE user.id IN (
	SELECT idcontact FROM contact WHERE iduser = %s
	UNION
	SELECT iduser FROM contact WHERE idcontact = %s);

is extremly slow, because the main query is of type ALL instead of using index (if I run two queries separately, its extremly fast).

Here is the result of EXPLAIN :

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY user ALL NULL NULL NULL NULL 570642 Using where
2 DEPENDENT SUBQUERY contact eq_ref PRIMARY,idcontact PRIMARY 8 const,func 1 Using index
3 DEPENDENT UNION contact eq_ref PRIMARY,idcontact PRIMARY 8 func,const 1 Using index
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL  

(NOTE: the second line give a wrong numbers of rows, it's greater for me)

And this is the EXPLAIN for a simpler query without union :

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY contact ref PRIMARY,idcontact PRIMARY 4 const 35 Using index
1 PRIMARY user eq_ref PRIMARY PRIMARY 4 contact.idcontact 1  

Using two clauses joined with OR doesn't work either.



 Comments   
Comment by Alice Sherepa [ 2018-01-03 ]

please add if it is possible exact entry from slow log, what is %s - what function is there;
how long does this query takes and how long it takes without UNION;
output of SHOW CREATE TABLE for involved tables (user, contact);
please explain what exactly does not work with OR- long time of execution, wrong result, smth else?
cnf file(s);

Comment by yoch [ 2018-01-03 ]

The OR query is also too slow for me. The results are consistent in all the cases.

Here a minimal working example (we have 500000 users in our base, and a few hundred of contacts) :

CREATE TABLE `user` (
	`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
	PRIMARY KEY (`id`) USING HASH
) ENGINE=InnoDB;
 
CREATE TABLE `contact` (
	`iduser` INT(10) UNSIGNED NOT NULL,
	`idcontact` INT(10) UNSIGNED NOT NULL,
	PRIMARY KEY (`iduser`, `idcontact`),
	INDEX `idcontact` (`idcontact`),
	CONSTRAINT `contact_ibfk_1` FOREIGN KEY (`iduser`) REFERENCES `user` (`id`) ON DELETE CASCADE,
	CONSTRAINT `contact_ibfk_2` FOREIGN KEY (`idcontact`) REFERENCES `user` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB;

And here the results of my queries (curiously, the third with OR do better than in my real case, but it's still slow) :

SELECT * FROM user
WHERE user.id IN (
	SELECT idcontact FROM contact WHERE iduser = 44
	UNION
	SELECT iduser FROM contact WHERE idcontact = 44);
/* Lignes affectées : 0  Lignes trouvées : 35  Avertissements : 0  Durée pour 1 requête : 18,875 sec. */
 
SELECT * FROM user
WHERE user.id IN (
	SELECT idcontact FROM contact WHERE iduser = 44)
OR user.id IN (
	SELECT iduser FROM contact WHERE idcontact = 44);
/* Lignes affectées : 0  Lignes trouvées : 35  Avertissements : 0  Durée pour 1 requête : 1,516 sec. */
	
SELECT * FROM user
WHERE user.id IN (
	SELECT idcontact FROM contact WHERE iduser = 44);
SELECT * FROM user
WHERE user.id IN (
	SELECT iduser FROM contact WHERE idcontact = 44);
/* Lignes affectées : 0  Lignes trouvées : 35  Avertissements : 0  Durée pour 2 requêtes : 0,281 sec. */
 
SELECT * FROM user
JOIN (
	SELECT idcontact AS id FROM contact WHERE iduser = 44
	UNION
	SELECT iduser AS id FROM contact WHERE idcontact = 44
) ct
ON user.id = ct.id;
/* Lignes affectées : 0  Lignes trouvées : 35  Avertissements : 0  Durée pour 1 requête : 0,093 sec. */

Also note that I currently use a JOIN query (the last one) which performs well in my case.

Comment by Alice Sherepa [ 2018-01-08 ]

Please attach your .cnf file(s)
When I am trying to reproduce, I get query plan, that use index for table t1(user), execution time is ca. 3s for 1000000rows.

+------+--------------------+------------+--------+---------------+---------+---------+------------+---------+--------------------------+
| id   | select_type        | table      | type   | possible_keys | key     | key_len | ref        | rows    | Extra                    |
+------+--------------------+------------+--------+---------------+---------+---------+------------+---------+--------------------------+
|    1 | PRIMARY            | t1         | index  | NULL          | PRIMARY | 4       | NULL       | 1000000 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | t2         | eq_ref | PRIMARY,idt2  | PRIMARY | 8       | const,func |       1 | Using index              |
|    3 | DEPENDENT UNION    | t2         | eq_ref | PRIMARY,idt2  | PRIMARY | 8       | func,const |       1 | Using index              |
| NULL | UNION RESULT       | <union2,3> | ALL    | NULL          | NULL    | NULL    | NULL       |    NULL |                          |
+------+--------------------+------------+--------+---------------+---------+---------+------------+---------+--------------------------+
4 rows in set (0.00 sec)

Comment by yoch [ 2018-01-08 ]

Here is my cnf.

But in your output, I can see in the first line that 1000000 rows were scanned, so it's clear that index doesn't in use here (also, the 3s execution time is still slow).

my.cnf

Comment by Alice Sherepa [ 2018-01-23 ]

Hi yoch,
I can see that using union in subqueries leads to a low performance. Documentation of MariaDB and Mysql suggests
only in_to_exists optimization (that is used) or query tuning, rewriting it to joins. I hope that developers will take a look at it
and decide whether it is appropriate to optimize this cases.

CREATE TABLE `t1` (`id` int NOT NULL PRIMARY KEY);
insert into t1 (id) SELECT seq FROM seq_1_to_1000000;
SELECT * FROM t1  WHERE t1.id IN (SELECT 1 as id UNION SELECT 2 UNION SELECT 3);	
explain extended SELECT * FROM t1  WHERE t1.id IN (SELECT 1 UNION SELECT 2 UNION SELECT 3);

MariaDB [test]> SELECT * FROM t1  WHERE t1.id IN (SELECT 1 as id UNION SELECT 2 UNION SELECT 3);
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
+----+
3 rows in set (1.45 sec)
 
MariaDB [test]> explain extended SELECT * FROM t1  WHERE t1.id IN (SELECT 1 UNION SELECT 2 UNION SELECT 3);
+------+--------------------+--------------+-------+---------------+---------+---------+------+---------+----------+--------------------------+
| id   | select_type        | table        | type  | possible_keys | key     | key_len | ref  | rows    | filtered | Extra                    |
+------+--------------------+--------------+-------+---------------+---------+---------+------+---------+----------+--------------------------+
|    1 | PRIMARY            | t1           | index | NULL          | PRIMARY | 4       | NULL | 1000000 |   100.00 | Using where; Using index |
|    2 | DEPENDENT SUBQUERY | NULL         | NULL  | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | No tables used           |
|    3 | DEPENDENT UNION    | NULL         | NULL  | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | No tables used           |
|    4 | DEPENDENT UNION    | NULL         | NULL  | NULL          | NULL    | NULL    | NULL |    NULL |     NULL | No tables used           |
| NULL | UNION RESULT       | <union2,3,4> | ALL   | NULL          | NULL    | NULL    | NULL |    NULL |     NULL |                          |
+------+--------------------+--------------+-------+---------------+---------+---------+------+---------+----------+--------------------------+
5 rows in set, 1 warning (0.00 sec)
 
Note (Code 1003): select `test`.`t1`.`id` AS `id` from `test`.`t1` where <expr_cache><`test`.`t1`.`id`>(<in_optimizer>(`test`.`t1`.`id`,<exists>(select 1 having <cache>(`test`.`t1`.`id`) = <ref_null_helper>(1) union select 2 having <cache>(`test`.`t1`.`id`) = <ref_null_helper>(2) union select 3 having <cache>(`test`.`t1`.`id`) = <ref_null_helper>(3))))

Generated at Thu Feb 08 08:16:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.