Details
-
Task
-
Status: Open (View Workflow)
-
Minor
-
Resolution: Unresolved
-
None
Description
Consider two tables like this:
CREATE TABLE `t1` (
|
`i` int(11) NOT NULL,
|
`j` int(11) NOT NULL,
|
`l` int(11) NOT NULL,
|
`m` int(11) DEFAULT NULL,
|
PRIMARY KEY (`i`,`j`,`l`)
|
);
|
|
CREATE TABLE `t2` (
|
`i` int(11) NOT NULL,
|
`k` int(11) DEFAULT NULL,
|
PRIMARY KEY (`i`)
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
|
Now, I issue a query is this:
SELECT DISTINCT
|
t2.i, t2.k, t1.j, t1.l, t1.m
|
FROM t2 LEFT JOIN t1 ON t2.i=t1.i
|
WHERE
|
t2.k < 123;
|
Assuming that the table has a sizable amount of rows, then the query is slow because the DISTINCT clause will likely force the query to use a temporary table.
Now, it is technically impossible for the query to return non-unique rows even if the DISTINCT clause is missing. If the left join matched nothing, then t2.i is unique because it is a PK. If the left join does match, then the selected combination (t2.i, t1.j, t1.l) is unique because we defined t2.i=t1.i and (t1.i, t1.j, t1.l) is a PK.
In fact, removing DISTINCT makes the query 100x faster in my real-world example. Normally, I would simply remove the DISTINCT, but this is a query issued by software over which I have no control. It would be really cool if the optimizer could detect cases where DISTINCT is unnecessary.
MySQL (as of 5.7.12) does not have this optimization either, so it's more of a feature request than of a bug.