Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
Description
Putting conditions on the result of window functions requires nested queries. This unfortunately always causes a full table scan. But when the window function is applied to a properly indexed or ordered data set, it would be theoretically possible abort the scan under some conditions. The same applies to GROUP BY in subqueries.
CREATE TABLE `test` (
|
`uid` int(11) NOT NULL AUTO_INCREMENT, |
`wert` int(11) NOT NULL, |
PRIMARY KEY (`uid`)
|
);
|
INSERT INTO test (wert) VALUES (5),(4),(3),(2),(1) |
ANALYZE SELECT uid FROM test GROUP BY uid LIMIT 3 |
Here we obtain r_rows=3 because the GROUP is applied to an indexed column. The table scan can be aborted as soon as the LIMIT of 3 is reached.
But such optimizations do not occur on subqueries.
ANALYZE SELECT * FROM
|
(
|
SELECT uid FROM test GROUP BY uid
|
) tabelle LIMIT 3 |
Result: r_rows=5.
The same hold true for window functions. In case of window functions, such nested queries are highly relevant, because conditions on the result of window functions can only be applied in an outer query.
ANALYZE SELECT * FROM
|
(
|
SELECT uid,DENSE_RANK() OVER (ORDER BY uid) AS rank FROM test
|
) tabelle WHERE rank<=3 |
Result: r_rows=5. This result is the same when we replace "rank<=3" by "LIMIT 3".
The lack of optimization can cause low performance in applications such as pagination where the ranked/paginated field is not unique. E.g. I have a list of values 1,1,2,2,...,2,3,3,...,3,3,3,4,5,... and want to fetch the top rows until having n unique numbers and then abort the table scan.