[MDEV-9951] Wrong windowing functions results with duplicate values Created: 2016-04-19  Updated: 2016-04-20  Resolved: 2016-04-20

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Affects Version/s: 10.2.0
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Federico Razzoli Assignee: Sergei Petrunia
Resolution: Not a Bug Votes: 0
Labels: None


 Description   

When using an aggregate function as a running function (SUM(...) OVER (...)), identical values are processed together. Which means, if you sum 1 three times, you will get 3 in all records.

MariaDB [test]> CREATE OR REPLACE TABLE employee_salary
    -> (
    ->         name VARCHAR(100) NOT NULL,
    ->         salary INT UNSIGNED NOT NULL,
    ->         
    ->         INDEX idx_salary (salary)
    -> )
    ->         ENGINE InnoDB
    -> ;
Query OK, 0 rows affected (0.56 sec)
 
MariaDB [test]> 
MariaDB [test]> INSERT INTO employee_salary
    ->                 (name, salary)
    ->         VALUES
    ->                 ('Fank Zappa',         10000), -- unique values...
    ->                 ('Jan Anderson',       20000),
    ->                 ('Ian Gillan',         30000),
    ->                 ('Ozzy Osbourne',      40000), -- 4000 * 2
    ->                 ('Jimmy Page',         40000),
    ->                 ('Vinnie Colaiuta',    50000), -- 50000 * 3
    ->                 ('Frank Zappa',        50000),
    ->                 ('Joe Satriani',       50000)
    -> ;
Query OK, 8 rows affected (0.06 sec)
Records: 8  Duplicates: 0  Warnings: 0
 
MariaDB [test]> 
MariaDB [test]> SELECT
    ->                 salary, SUM(salary) OVER (ORDER BY salary) AS sum_salary
    ->         FROM employee_salary;
+--------+------------+
| salary | sum_salary |
+--------+------------+
|  10000 |      10000 |
|  20000 |      30000 |
|  30000 |      60000 |
|  40000 |     140000 |
|  40000 |     140000 |
|  50000 |     290000 |
|  50000 |     290000 |
|  50000 |     290000 |
+--------+------------+
8 rows in set (0.00 sec)



 Comments   
Comment by Elena Stepanova [ 2016-04-20 ]

I think that's how it's supposed to work.
I can't yet browse the standard freely enough to find the proof, but here is one from PostreSQL documentation:

By default, if ORDER BY is supplied then the frame consists of all rows from the start of the partition up through the current row, plus any following rows that are equal to the current row according to the ORDER BY clause.

f_razzoli, does it explain the results you are getting, or did you mean something else that I've missed?

Comment by Federico Razzoli [ 2016-04-20 ]

You are not missing anything. I didn't know that this is standard behaviour. Honestly I can't imagine the reason.
That documentation says "by default". In MariaDB, is there a way to get a different behaviour?

Comment by Elena Stepanova [ 2016-04-20 ]

I think "by default" here relates to the first part, "the frame consists of all rows from the start of the partition up through the current row", in other words, omitting the explicit range means that RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW will be used.

I doubt that the behavior with this particular query can be changed, although you can get more sensible results playing with the query itself; and the query where you only select a non-unique column, order by it and aggregate it at the same time is probably not very realistic anyway. But I will re-address your question to psergey, the expert in the area.

Comment by Sergei Petrunia [ 2016-04-20 ]

>in other words, omitting the explicit range means that RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW will be used.

Confirm, that's how it works.

f_razzoli ,
> In MariaDB, is there a way to get a different behaviour?

As far as I understand, ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is what you're looking for.

Comment by Sergei Petrunia [ 2016-04-20 ]

Closing as the behaviour is as expected.

Generated at Thu Feb 08 07:38:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.