[MCOL-6123] Improve performance of rewritten DISTINCT queries - Jira

XML

Word

Printable

Details

Type: New Feature
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: MDB Plugin
Labels:
None

Description

As for now, DISTINCT queries are rewritten using rule-based optimizer in the following way:

SELECT DISTINCT col1, col2, ..., colN

FROM <from>

WHERE <where clause>

HAVING <having clause>

ORDER BY ordCol1, ordCol2, ..., ordColN

becomes

SELECT col1, col2, ..., colN

FROM

    SELECT DISTINCT col1, col2, ..., colN, ordCol1, ordCol2, ..., ordColN

    FROM <from>

    WHERE <where clause>

    HAVING <having clause>

  ) $added_sub_0

GROUP BY 1, 2, ..., N

ORDER BY `$added_sub_o`.ordCol1, ..., `$added_sub_0`.ordColN

All ORDER BY columns of the original query that are not in the result set are added to resultset, so we can to use it in the ORDER BY of the outer aggregate query.
Unfortunately, this means that every ORDER BY column will be calculated for every row in the inner query, which is suboptimal.

Looks like a proper way to handle ORDER BY columns is to put all simple columns involved in the ORDER BY expressions into the result set of the inner query, and to use a full functional/arithmetic expression in the ORDER BY of the outer query. In this way, expressions will be calculated only once per aggregated row. This also applies to the HAVING clause that should be moved to the outer query.

Attachments

Issue Links

relates to

MCOL-5250 Disk-based DISTINCT

Closed

Activity

People

Assignee:: Unassigned

Reporter:: Aleksei Antipovskii

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 2025-08-01 10:38

Updated:: 2025-08-01 11:31

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.