Details
-
Task
-
Status: Open (View Workflow)
-
Minor
-
Resolution: Unresolved
-
None
Description
Consider two tables like this:
CREATE TABLE `t1` (
|
`i` int(11) NOT NULL,
|
`j` int(11) NOT NULL,
|
`l` int(11) NOT NULL,
|
`m` int(11) DEFAULT NULL,
|
PRIMARY KEY (`i`,`j`,`l`)
|
);
|
 |
CREATE TABLE `t2` (
|
`i` int(11) NOT NULL,
|
`k` int(11) DEFAULT NULL,
|
PRIMARY KEY (`i`)
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
|
Now, I issue a query is this:
SELECT DISTINCT
|
t2.i, t2.k, t1.j, t1.l, t1.m
|
FROM t2 LEFT JOIN t1 ON t2.i=t1.i
|
WHERE
|
t2.k < 123;
|
Assuming that the table has a sizable amount of rows, then the query is slow because the DISTINCT clause will likely force the query to use a temporary table.
Now, it is technically impossible for the query to return non-unique rows even if the DISTINCT clause is missing. If the left join matched nothing, then t2.i is unique because it is a PK. If the left join does match, then the selected combination (t2.i, t1.j, t1.l) is unique because we defined t2.i=t1.i and (t1.i, t1.j, t1.l) is a PK.
In fact, removing DISTINCT makes the query 100x faster in my real-world example. Normally, I would simply remove the DISTINCT, but this is a query issued by software over which I have no control. It would be really cool if the optimizer could detect cases where DISTINCT is unnecessary.