[MDEV-10859] Wrong result of aggregate window function in query with HAVING and no ORDER BY Created: 2016-09-21  Updated: 2017-02-15  Resolved: 2017-02-15

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Affects Version/s: 10.2
Fix Version/s: 10.2.4

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Vicențiu Ciorbaru
Resolution: Fixed Votes: 0
Labels: 10.2-rc

Issue Links:
Relates
relates to MDEV-9896 Testing for window functions Open

 Description   

It appears that when both ORDER BY and HAVING are present, HAVING is treated as WHERE clause.

Note: I don't see anything about it in the standard, maybe it's not even a part of the standard at all, the behavior just does not look logical.

create table empsalary (depname varchar(32), empno smallint primary key, salary int);
 insert into empsalary values  ('develop',1,5000),('develop',2,4000),('sales',3,'6000'),('sales',4,5000);

No ORDER BY, no HAVING -- OK

MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
+---------+-------+--------+-----------------------------------------+
| depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
+---------+-------+--------+-----------------------------------------+
| develop |     1 |   5000 |                               4500.0000 |
| develop |     2 |   4000 |                               4500.0000 |
| sales   |     3 |   6000 |                               5500.0000 |
| sales   |     4 |   5000 |                               5500.0000 |
+---------+-------+--------+-----------------------------------------+

ORDER BY, no HAVING -- OK

MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary order by depname;
+---------+-------+--------+-----------------------------------------+
| depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
+---------+-------+--------+-----------------------------------------+
| develop |     1 |   5000 |                               4500.0000 |
| develop |     2 |   4000 |                               4500.0000 |
| sales   |     4 |   5000 |                               5500.0000 |
| sales   |     3 |   6000 |                               5500.0000 |
+---------+-------+--------+-----------------------------------------+

HAVING, no ORDER BY -- WRONG

MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary having empno > 1;
+---------+-------+--------+-----------------------------------------+
| depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
+---------+-------+--------+-----------------------------------------+
| develop |     2 |   4000 |                               4500.0000 |
| sales   |     3 |   6000 |                               5500.0000 |
| sales   |     4 |   5000 |                               5500.0000 |
+---------+-------+--------+-----------------------------------------+
3 rows in set (0.00 sec)

ORDER BY and HAVING -- OK

MariaDB [test]> SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary having empno > 1 order by depname;
+---------+-------+--------+-----------------------------------------+
| depname | empno | salary | avg(salary) OVER (PARTITION BY depname) |
+---------+-------+--------+-----------------------------------------+
| develop |     2 |   4000 |                               4000.0000 |
| sales   |     4 |   5000 |                               5500.0000 |
| sales   |     3 |   6000 |                               5500.0000 |
+---------+-------+--------+-----------------------------------------+
3 rows in set (0.00 sec)



 Comments   
Comment by Vicențiu Ciorbaru [ 2016-09-21 ]

The problem is with the 3rd select.

The correct result is always of the following form:
1st compute table without window functions column.
2nd compute the remaining window functions columns, using the results from the 1st computation.

Comment by Elena Stepanova [ 2016-09-21 ]

I agree, if we want to stick to a part of the standard (the one which says that having must be applied before window functions), then the 3rd query is incorrect.

The problem, from my point of view, is that such query is already an extension of the standard, just like the whole "non-full-group-by" in MySQL is; and everyone knows how many problems partial group by caused and still causes, and how little actual benefit it brings.

So, I would suggest that at least for new features, where we still have a choice, we stick to the standard and didn't allow potentially non-deterministic syntax unless it is specified by the standard or it is so extremely important for majority of users that it's worth the trouble.

So, my general request, for this and probably many other winfunc-related issues, is to double-check whether the syntax is allowed by the standard, and if it's not, and we don't have a critical real-life use case for it, to forbid it altogether.

Comment by Vicențiu Ciorbaru [ 2017-02-14 ]

Hi Sergey!

Can you please review a patch for this issue?
http://lists.askmonty.org/pipermail/commits/2017-February/010688.html
Thanks,
Vicențiu

Comment by Vicențiu Ciorbaru [ 2017-02-15 ]

I have pushed the proposed patch into 10.2 as it seemed to cause no regressions and fixes the problems directly. If there are better approaches we can reconsider in a separate mdev.

CC psergey

Generated at Thu Feb 08 07:45:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.