[MDEV-9746] Window functions: CUME_DIST function Created: 2016-03-16  Updated: 2016-04-14  Resolved: 2016-03-29

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Fix Version/s: 10.2.0

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Vicențiu Ciorbaru
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-6115 window functions as in the SQL standard Closed
Sprint: 10.2.0-9

 Description   

CUME_DIST is a bit special. It needs to either

1. remember a value for each peer group somewhere,
2. three cursors.

Details:

create table t1 (
  pk int, 
  a int
);
 
insert into t1 values
(1, 20),
(2, 20),
(3, 20),
(4, 20),
(5, 30),
(6, 30),
(7, 30),
(8, 40);

select pk,a,cume_dist() over (order by a) from t1;
+------+------+
| pk   | a    |
+------+------+
|    1 |   20 | 0.5
|    2 |   20 | 0.5
|    3 |   20 | 0.5
|    4 |   20 | 0.5
|    5 |   30 | 0.875
|    6 |   30 | 0.875
|    7 |   30 | 0.875
|    8 |   40 | 1
+------+------+

How to compute these?
1. The first pass needs to find how many rows are in the partition.
(Q: BTW, is it true that "any two-pass window function needs the first pass
only to find #rows in the partition"?)

2. On the second pass, we need
2.1 find how many peers are in the peer group (5 rows have a=20)
2.2 compute the value and set it for every peer member.

Possible ways to do step#2:
A: make it in two passes:

  • first, let the first row of the group store #values in the group
  • second, propagate this value to all group members.

B: Use two cursors.

  • The front cursor reaches the end of the peer group, remembers how many values
    it saw and then stays there.
  • The back cursor walks through the peer group and updates the values.


 Comments   
Comment by Sergei Petrunia [ 2016-03-16 ]

I like the two-cursor approach better.

  • It does fewer update operations. I can't prove it but looks like it's faster
  • We will need window functions to use additional cursors anyway for LAG() and LEAD().
  • I don't think the approach of "use the temp.table for temporary data during computations" will be useful for any other window function
Generated at Thu Feb 08 07:37:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.