[MDEV-4119] Remove duplicate ORDER BY/GROUP BY column references Created: 2013-01-31  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: None
Fix Version/s: 10.1

Type: Task Priority: Minor
Reporter: Timour Katchaounov (Inactive) Assignee: Michael Widenius
Resolution: Unresolved Votes: 2
Labels: optimizer

Attachments: File md641.diff    
Sprint: 10.1.6-2, 10.1.8-1

 Description   

Currently the optimizer doesn't cannot take into account duplicate column references in the GROUP/ORDER BY clauses. As a result, queries like:

select a col1, a col2 from t1 group by col1, col2;

will not use an index on t1(a), even though the duplicate columns
do not change the meaning of the query.

This is a simple task to remove duplicates in the order/group lists.
Care should be taken of queries with the ROLLUP modifier.

There is already an implementation attached to this task. The implementation has to be verified/changed so that:

  • it takes care of queries with ROLLUP,
  • it is verified that the place where duplicate removal is done covers all cases when it can and should be done,
  • there is sufficient coverage with test cases.


 Comments   
Comment by Michael Widenius [ 2015-06-29 ]

I added deleting of duplicate fields in GROUP BY and ORDER BY in 2014, so this part of the patch is irrelevant.

I did however notice that removing duplicates and constants was done too late for some optimizations to have effects, which resulted in not optimal query plans.

Those that was affected was:

SELECT SQL_BIG_RESULT col1 AS field1, col1 AS field2 FROM t1 GROUP BY field1, field2

  • This did an extra filesort

select col1 f1, col1 f2 from t1 group by f1, f2
select col1 f1, col1 f2 from t1 group by f1, 1+1;

  • These did not do 'Using index for group by'

Will now work on what can be easily fixed

Comment by Michael Widenius [ 2015-07-02 ]

Have now fixed the following problems:

  • SELECT on one table with group by will now remove duplicated fields and constants from GROUP BY before doing other optimizations.
  • Added better test if we are using filesort or not for EXPLAIN

At the same time I also fixed a bug when we where referring to a calculated field in HAVING:

Example:
select a, round(rand(100)*10) r2, sum(1) r1 from t1 group by a having r1>1 and r2<=2;

In some case, in the old code, r2 in having was calculated independent of r2 in select, which cause the select to display rows with r2 > 2.

Comment by Julien Fritsch [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Generated at Thu Feb 08 06:53:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.