[MDEV-11263] Aggregate Window Functions Created: 2016-11-09  Updated: 2021-03-19

Status: Stalled
Project: MariaDB Server
Component/s: Optimizer - Window functions, Stored routines
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Varun Gupta (Inactive) Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: gsoc18, gsoc19, gsoc20

Issue Links:
Relates
relates to MDEV-7773 Aggregate stored functions Closed
Sprint: 10.2.4-4

 Description   

With a few exceptions, most native aggregate functions are supported as window functions.
https://mariadb.com/kb/en/library/aggregate-functions-as-window-functions/

In MDEV-7773, support for creating of custom aggregate functions was added.
This task proposes to extend that feature and allow custom aggregate functions to be used as window functions

An example of a creating a custom aggregate function is given below:

create aggregate function agg_sum(x INT) returns double
begin
  declare z double default 0;
  declare continue handler for not found return z;
  loop
    fetch group next row;
    set z = z + x;
  end loop;
end|

This functions can be used in the following query:

create table balances (id int, amount int);
insert into balances values (1, 10), (2, 20), (3, 30);
 
select agg_sum(amount) from balances;

After this task is complete the following must also work:

select agg_sum(amount) over (order by id);



 Comments   
Comment by Harman Wadhwa [ 2019-03-04 ]

Hi Varun,
I'm a 3rd-year Computer Science student at the University of Toronto (Canada) and am considering taking up this task as a GSoC 2019 Project.
Could you provide a list of potential aggregate functions that need to be implemented?

Comment by Varun Gupta (Inactive) [ 2019-03-05 ]

Hi Harman,
We had introduced custom aggregate function in MariaDB https://mariadb.com/kb/en/library/stored-aggregate-functions/
So the basic idea is to have a custom aggregate function(aka stored aggregate function) to work as a window function. This interface would help us to use any aggregate function as a window function.

An example would be

1) create an aggregate function (this is already done)

create aggregate function agg_sum(x INT) returns double
begin
  declare z double default 0;
  declare continue handler for not found return z;
  loop
    fetch group next row;
    set z = z + x;
  end loop;
end|

2) Execute the aggregate function as a window function [This needs to be done]

select agg_sum(t1.col) over (partition_by_clause order by clause) from t1;

Comment by Harman Wadhwa [ 2019-03-05 ]

Hi Varun,
I'm still unclear on what needs to be done.
Do aggregates (like sum, count etc..) need to be implemented using `CREATE AGGREGATE FUNCTION ...` or is that already done ?

Comment by Varun Gupta (Inactive) [ 2019-03-05 ]

Here is the list of aggregate functions that work as window function
https://mariadb.com/kb/en/library/aggregate-functions-as-window-functions/

One can implement any aggregate function using STORED AGGREGATE FUNTIONS, for example the above example agg_sum is created to behave like the SUM aggregate function.

This motivation for this project is to execute any STORED AGGREGATE FUNCTION like agg_sum as a WINDOW FUNCTION.

Comment by KHAMAR ALI SHAIKH [ 2019-03-24 ]

Hi Varun Gupta ,i am a student from RV college of Engineering Banglore. I m interested to take this user story as my task for GSOC 2019. Please guide me with further proceedings.

Comment by Vicențiu Ciorbaru [ 2019-03-25 ]

Hi khamar

It's good that you are interested in GSoC projects! varun has provided some hints above. I suggest you get comfortable with our regular stored user defined aggregate functions, like the one Varun posted in a previous comment.

There are plenty of places in the code that this project will touch upon. I can recommend you start looking into the Item_sum class and particularly subclasses of it like Item_sum_sum. These classes represent aggregate functions stored internally within the server. For example:
When you query the server with:

create table balances (id int, amount int);
insert into balances values (1, 10), (2, 20), (3, 30);
SELECT sum(amount) from balances;

for the sum() function, an Item_sum_sum will be created.

These classes support the "add" method, which is used to pass values to an accumulator. I recommend you set up a debugger for MariaDB, follow the steps here to get started:

https://mariadb.org/get-involved/getting-started-for-developers/get-code-build-test/
https://mariadb.com/kb/en/library/debugging-mariadb-with-a-debugger/

Then run the queries above and put a breakpoint into Item_sum_sum::add.

This will get you started with stepping into our regular aggregate functions.

After you see the flow for this bit of code, you need to also test with Item_sum_sp::add method. You can do this by:

Create a custom aggregate function like the one Varun showed above.
Then try executing it just like you did with regular Item_sum_sum and put a breakpoint in Item_sum_sp::add method.

There's quite a bit of code there, but if you manage to get to this point, then we can discuss more. Feel free to ask any questions if you get stuck in the meantime.

Comment by Arhant Jain [ 2019-03-26 ]

Thanks Vicentiu

Please clarify me.
As you said " start looking into the Item_sum class and particularly subclasses of it like Item_sum_sum".
Can you please detail about it and where should I look for these subclasses.

Comment by Vicențiu Ciorbaru [ 2019-03-29 ]

Hi Arhant

You can find the classes using your IDE's search feature or a regular expression search tool like grep, ack, etc.

I suggested in my previous post to investigate the add() method and try to figure out how it is used inside the server. Did you manage to set up a breakpoint and have the debugger stop at it when a query to the server is issued? What did you try to do so far? Providing more details as to what actually got you stuck will help in getting a more coherent reply

Comment by Tavneet Singh [ 2020-03-22 ]

Hi Varun and Vincent, was any progress made on this issue during the last iteration ? I have built the project and will start looking at the window and aggregate functions touch points in the code. I will also pick up a recent beginner friendly task to get familiar with the codebase.

Comment by Vicențiu Ciorbaru [ 2020-03-23 ]

tavneet No progress has been done since last GSoC. Feel free to have a look and ask any questions you might have. I recommend sending an email to maria-developers mailing list. You stand a much higher chance of getting a reply there, rather than though JIRA only.

Comment by Tavneet Singh [ 2020-03-28 ]

Hi @Vicențiu Ciorbaru @Varun,

I have set up the debugger and have some idea about the code flow for the regular aggregate function (sum) and the custom aggregate function posted by Varun.

I had a couple of queries which I have asked on the email and I am also posting here.

  1. In do_add from decimal.c, there are three parts with comments - /* part 1 - MY_MAX(frac) ... min (frac) /, / part 2 - MY_MIN(frac) ... MY_MIN(intg) */. Can someone please elaborate on what do the comments mean ?
  2. In Item_sum_sum::add_helper, there is an unlikely branch for variable direct_added. Can someone please give an idea about when will direct added be true ? In fact in all the uses for direct_added, it is always in an unlikely branch in Item_sum.cc.
  3. What is the design choice behind the struct decimal_t in decimal.h ? Why are we representing the numbers in this format (Is it helpful for handling floating point inaccuracies and overflows )?
Generated at Thu Feb 08 07:48:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.