[MDEV-9964] Optimize DISTINCT away when it is not needed Created: 2016-04-21 Updated: 2021-12-08 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Optimizer |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor |
| Reporter: | Erik Cederstrand | Assignee: | Sergei Petrunia |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | optimizer | ||
| Description |
|
Consider two tables like this:
Now, I issue a query is this:
Assuming that the table has a sizable amount of rows, then the query is slow because the DISTINCT clause will likely force the query to use a temporary table. Now, it is technically impossible for the query to return non-unique rows even if the DISTINCT clause is missing. If the left join matched nothing, then t2.i is unique because it is a PK. If the left join does match, then the selected combination (t2.i, t1.j, t1.l) is unique because we defined t2.i=t1.i and (t1.i, t1.j, t1.l) is a PK. In fact, removing DISTINCT makes the query 100x faster in my real-world example. Normally, I would simply remove the DISTINCT, but this is a query issued by software over which I have no control. It would be really cool if the optimizer could detect cases where DISTINCT is unnecessary. |
| Comments |
| Comment by Elena Stepanova [ 2016-04-22 ] |
|
MySQL (as of 5.7.12) does not have this optimization either, so it's more of a feature request than of a bug. |
| Comment by Erik Cederstrand [ 2016-04-22 ] |
|
I agree, I just didn't see "Feature request" in the "Type" dropdown |