Details
-
New Feature
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
None
-
None
-
None
Description
As for now, DISTINCT queries are rewritten using rule-based optimizer in the following way:
SELECT DISTINCT col1, col2, ..., colN |
FROM <from> |
WHERE <where clause> |
HAVING <having clause> |
ORDER BY ordCol1, ordCol2, ..., ordColN |
becomes
SELECT col1, col2, ..., colN |
FROM
|
(
|
SELECT DISTINCT col1, col2, ..., colN, ordCol1, ordCol2, ..., ordColN |
FROM <from> |
WHERE <where clause> |
HAVING <having clause> |
) $added_sub_0
|
GROUP BY 1, 2, ..., N |
ORDER BY `$added_sub_o`.ordCol1, ..., `$added_sub_0`.ordColN |
All ORDER BY columns of the original query that are not in the result set are added to resultset, so we can to use it in the ORDER BY of the outer aggregate query.
Unfortunately, this means that every ORDER BY column will be calculated for every row in the inner query, which is suboptimal.
Looks like a proper way to handle ORDER BY columns is to put all simple columns involved in the ORDER BY expressions into the result set of the inner query, and to use a full functional/arithmetic expression in the ORDER BY of the outer query. In this way, expressions will be calculated only once per aggregated row. This also applies to the HAVING clause that should be moved to the outer query.
Attachments
Issue Links
- relates to
-
MCOL-5250 Disk-based DISTINCT
-
- In Progress
-