Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10859

Wrong result of aggregate window function in query with HAVING and no ORDER BY

Details

    Description

      It appears that when both ORDER BY and HAVING are present, HAVING is treated as WHERE clause.

      Note: I don't see anything about it in the standard, maybe it's not even a part of the standard at all, the behavior just does not look logical.

      create table empsalary (depname varchar(32), empno smallint primary key, salary int);
       insert into empsalary values  ('develop',1,5000),('develop',2,4000),('sales',3,'6000'),('sales',4,5000);
      

      No ORDER BY, no HAVING -- OK

      MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
      +---------+-------+--------+-----------------------------------------+
      | depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
      +---------+-------+--------+-----------------------------------------+
      | develop |     1 |   5000 |                               4500.0000 |
      | develop |     2 |   4000 |                               4500.0000 |
      | sales   |     3 |   6000 |                               5500.0000 |
      | sales   |     4 |   5000 |                               5500.0000 |
      +---------+-------+--------+-----------------------------------------+
      

      ORDER BY, no HAVING -- OK

      MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary order by depname;
      +---------+-------+--------+-----------------------------------------+
      | depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
      +---------+-------+--------+-----------------------------------------+
      | develop |     1 |   5000 |                               4500.0000 |
      | develop |     2 |   4000 |                               4500.0000 |
      | sales   |     4 |   5000 |                               5500.0000 |
      | sales   |     3 |   6000 |                               5500.0000 |
      +---------+-------+--------+-----------------------------------------+
      

      HAVING, no ORDER BY -- WRONG

      MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary having empno > 1;
      +---------+-------+--------+-----------------------------------------+
      | depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
      +---------+-------+--------+-----------------------------------------+
      | develop |     2 |   4000 |                               4500.0000 |
      | sales   |     3 |   6000 |                               5500.0000 |
      | sales   |     4 |   5000 |                               5500.0000 |
      +---------+-------+--------+-----------------------------------------+
      3 rows in set (0.00 sec)
      

      ORDER BY and HAVING -- OK

      MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary having empno > 1 order by depname;
      +---------+-------+--------+-----------------------------------------+
      | depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
      +---------+-------+--------+-----------------------------------------+
      | develop |     2 |   4000 |                               4000.0000 |
      | sales   |     4 |   5000 |                               5500.0000 |
      | sales   |     3 |   6000 |                               5500.0000 |
      +---------+-------+--------+-----------------------------------------+
      3 rows in set (0.00 sec)
      

      Attachments

        Issue Links

          Activity

            The problem is with the 3rd select.

            The correct result is always of the following form:
            1st compute table without window functions column.
            2nd compute the remaining window functions columns, using the results from the 1st computation.

            cvicentiu Vicențiu Ciorbaru added a comment - The problem is with the 3rd select. The correct result is always of the following form: 1st compute table without window functions column. 2nd compute the remaining window functions columns, using the results from the 1st computation.

            I agree, if we want to stick to a part of the standard (the one which says that having must be applied before window functions), then the 3rd query is incorrect.

            The problem, from my point of view, is that such query is already an extension of the standard, just like the whole "non-full-group-by" in MySQL is; and everyone knows how many problems partial group by caused and still causes, and how little actual benefit it brings.

            So, I would suggest that at least for new features, where we still have a choice, we stick to the standard and didn't allow potentially non-deterministic syntax unless it is specified by the standard or it is so extremely important for majority of users that it's worth the trouble.

            So, my general request, for this and probably many other winfunc-related issues, is to double-check whether the syntax is allowed by the standard, and if it's not, and we don't have a critical real-life use case for it, to forbid it altogether.

            elenst Elena Stepanova added a comment - I agree, if we want to stick to a part of the standard (the one which says that having must be applied before window functions), then the 3rd query is incorrect. The problem, from my point of view, is that such query is already an extension of the standard, just like the whole "non-full-group-by" in MySQL is; and everyone knows how many problems partial group by caused and still causes, and how little actual benefit it brings. So, I would suggest that at least for new features, where we still have a choice, we stick to the standard and didn't allow potentially non-deterministic syntax unless it is specified by the standard or it is so extremely important for majority of users that it's worth the trouble. So, my general request, for this and probably many other winfunc-related issues, is to double-check whether the syntax is allowed by the standard, and if it's not, and we don't have a critical real-life use case for it, to forbid it altogether.

            Hi Sergey!

            Can you please review a patch for this issue?
            http://lists.askmonty.org/pipermail/commits/2017-February/010688.html
            Thanks,
            Vicențiu

            cvicentiu Vicențiu Ciorbaru added a comment - Hi Sergey! Can you please review a patch for this issue? http://lists.askmonty.org/pipermail/commits/2017-February/010688.html Thanks, Vicențiu

            I have pushed the proposed patch into 10.2 as it seemed to cause no regressions and fixes the problems directly. If there are better approaches we can reconsider in a separate mdev.

            CC psergey

            cvicentiu Vicențiu Ciorbaru added a comment - I have pushed the proposed patch into 10.2 as it seemed to cause no regressions and fixes the problems directly. If there are better approaches we can reconsider in a separate mdev. CC psergey

            People

              cvicentiu Vicențiu Ciorbaru
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.