[MDEV-9964] Optimize DISTINCT away when it is not needed Created: 2016-04-21  Updated: 2021-12-08

Status: Open
Project: MariaDB Server
Component/s: Optimizer
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Erik Cederstrand Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 3
Labels: optimizer


 Description   

Consider two tables like this:

CREATE TABLE `t1` (
 `i` int(11) NOT NULL,
 `j` int(11) NOT NULL,
 `l` int(11) NOT NULL,
 `m` int(11) DEFAULT NULL,
 PRIMARY KEY (`i`,`j`,`l`)
);
 
CREATE TABLE `t2` (
 `i` int(11) NOT NULL,
 `k` int(11) DEFAULT NULL,
 PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now, I issue a query is this:

SELECT DISTINCT
    t2.i, t2.k, t1.j, t1.l, t1.m
FROM t2 LEFT JOIN t1 ON t2.i=t1.i
WHERE
    t2.k < 123;

Assuming that the table has a sizable amount of rows, then the query is slow because the DISTINCT clause will likely force the query to use a temporary table.

Now, it is technically impossible for the query to return non-unique rows even if the DISTINCT clause is missing. If the left join matched nothing, then t2.i is unique because it is a PK. If the left join does match, then the selected combination (t2.i, t1.j, t1.l) is unique because we defined t2.i=t1.i and (t1.i, t1.j, t1.l) is a PK.

In fact, removing DISTINCT makes the query 100x faster in my real-world example. Normally, I would simply remove the DISTINCT, but this is a query issued by software over which I have no control. It would be really cool if the optimizer could detect cases where DISTINCT is unnecessary.



 Comments   
Comment by Elena Stepanova [ 2016-04-22 ]

MySQL (as of 5.7.12) does not have this optimization either, so it's more of a feature request than of a bug.

Comment by Erik Cederstrand [ 2016-04-22 ]

I agree, I just didn't see "Feature request" in the "Type" dropdown

Generated at Thu Feb 08 07:38:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.