Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4119

Remove duplicate ORDER BY/GROUP BY column references

Details

    • 10.1.6-2, 10.1.8-1

    Description

      Currently the optimizer doesn't cannot take into account duplicate column references in the GROUP/ORDER BY clauses. As a result, queries like:

      select a col1, a col2 from t1 group by col1, col2;

      will not use an index on t1(a), even though the duplicate columns
      do not change the meaning of the query.

      This is a simple task to remove duplicates in the order/group lists.
      Care should be taken of queries with the ROLLUP modifier.

      There is already an implementation attached to this task. The implementation has to be verified/changed so that:

      • it takes care of queries with ROLLUP,
      • it is verified that the place where duplicate removal is done covers all cases when it can and should be done,
      • there is sufficient coverage with test cases.

      Attachments

        Activity

          monty Michael Widenius added a comment - - edited

          I added deleting of duplicate fields in GROUP BY and ORDER BY in 2014, so this part of the patch is irrelevant.

          I did however notice that removing duplicates and constants was done too late for some optimizations to have effects, which resulted in not optimal query plans.

          Those that was affected was:

          SELECT SQL_BIG_RESULT col1 AS field1, col1 AS field2 FROM t1 GROUP BY field1, field2

          • This did an extra filesort

          select col1 f1, col1 f2 from t1 group by f1, f2
          select col1 f1, col1 f2 from t1 group by f1, 1+1;

          • These did not do 'Using index for group by'

          Will now work on what can be easily fixed

          monty Michael Widenius added a comment - - edited I added deleting of duplicate fields in GROUP BY and ORDER BY in 2014, so this part of the patch is irrelevant. I did however notice that removing duplicates and constants was done too late for some optimizations to have effects, which resulted in not optimal query plans. Those that was affected was: SELECT SQL_BIG_RESULT col1 AS field1, col1 AS field2 FROM t1 GROUP BY field1, field2 This did an extra filesort select col1 f1, col1 f2 from t1 group by f1, f2 select col1 f1, col1 f2 from t1 group by f1, 1+1; These did not do 'Using index for group by' Will now work on what can be easily fixed

          Have now fixed the following problems:

          • SELECT on one table with group by will now remove duplicated fields and constants from GROUP BY before doing other optimizations.
          • Added better test if we are using filesort or not for EXPLAIN

          At the same time I also fixed a bug when we where referring to a calculated field in HAVING:

          Example:
          select a, round(rand(100)*10) r2, sum(1) r1 from t1 group by a having r1>1 and r2<=2;

          In some case, in the old code, r2 in having was calculated independent of r2 in select, which cause the select to display rows with r2 > 2.

          monty Michael Widenius added a comment - Have now fixed the following problems: SELECT on one table with group by will now remove duplicated fields and constants from GROUP BY before doing other optimizations. Added better test if we are using filesort or not for EXPLAIN At the same time I also fixed a bug when we where referring to a calculated field in HAVING: Example: select a, round(rand(100)*10) r2, sum(1) r1 from t1 group by a having r1>1 and r2<=2; In some case, in the old code, r2 in having was calculated independent of r2 in select, which cause the select to display rows with r2 > 2.

          Automated message:
          ----------------------------
          Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

          julien.fritsch Julien Fritsch added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

          People

            monty Michael Widenius
            timour Timour Katchaounov (Inactive)
            Votes:
            2 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.