[MDEV-7773] Aggregate stored functions - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.3.3
Component/s: Data Definition - Procedure
Labels:
- 10.3-beta
- gsoc15
- gsoc16

Sprint:
10.3.1-1, 10.3.1-2, 10.3.3-1

Description

With CREATE FUNCTION one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like SUM, AVG, etc). This task is to add support for aggregate stored functions.

Syntax

is not decided yet. SQL Standard 2003 doesn't support aggregate stored functions (may be the newer standard does). Oracle, PostgreSQL, HSQL (may be more) all implement aggregate stored functions using their own incompatible syntax extensions. SQL Server and DB2 do not support creating new aggregate stored functions in SQL.

The syntax should at least allow for

the code to return the function value
the code to be called per row in a group, it accumulates data, does not return a value
storage that is preserved between calls within a group

and possibly

code to invoke between groups to reset the storage
code to remove data from the group (useful for window functions)

Attachments

Issue Links

relates to

MDEV-14520 Custom aggregate functions work incorrectly with WITH ROLLUP clause

Closed

MDEV-24580 Aggregate stored function return always NULL in case of empty input set

Open

MCOL-523 support user defined aggregate functions

Closed

MDEV-11263 Aggregate Window Functions

Stalled

MDEV-11301 SUGGESTION: Increase the number of Aggregate Functions

Open

MDEV-14564 support FOR loop in stored aggregate functions

Stalled

MDEV-28849 I_S tables don't tell that the stored function is AGGREGATE

Open

(2 relates to)

Activity

Ascending order - Click to sort in descending order

Sergei Golubchik added a comment - 2016-02-19 17:40 - edited

A possible approach is to have only one function that does everything — accumulates and returns the result. This is what HSQL has:

 CREATE AGGREGATE FUNCTION udavg(IN x INTEGER, IN flag BOOLEAN, INOUT acc BIGINT, INOUT counter INT)

...

here the first argument — x — is the actual argument of the aggregate function (called as SELECT udagv(x) ...), the second — flag — tells where to accumulate or return the result, the last two — acc and counter — are the storage, their value is preserved between calls and automatically reset to NULL when a new group starts.

The function then does something like

  IF flag THEN

    -- return the value

  ELSE

    -- update count and acc with the new value of x

  END IF;

Sergei Golubchik added a comment - 2016-02-19 17:40 - edited A possible approach is to have only one function that does everything — accumulates and returns the result. This is what HSQL has: CREATE AGGREGATE FUNCTION udavg( IN x INTEGER , IN flag BOOLEAN, INOUT acc BIGINT , INOUT counter INT ) ... here the first argument — x — is the actual argument of the aggregate function (called as SELECT udagv(x) ... ), the second — flag — tells where to accumulate or return the result, the last two — acc and counter — are the storage, their value is preserved between calls and automatically reset to NULL when a new group starts. The function then does something like IF flag THEN -- return the value ELSE -- update count and acc with the new value of x END IF ;

Sergei Golubchik added a comment - 2016-02-19 17:50

PostgreSQL assembles an aggregate function out of two regular functions. The state, like in the HSQL case, is an INOUT argument for these functions. The syntax looks like

CREATE FUNCTION norm1 (... ) ... ;

CREATE FUNCTION norm2 (... ) ... ;

CREATE AGGREGATE (...) ( SFUNC=norm1, FINALFUNC=norm2, ... )

it also allows to specify the initial value for the state and if the function is MIN or MAX (that is, can be answered by taking the first or last value from the index).

Sergei Golubchik added a comment - 2016-02-19 17:50 PostgreSQL assembles an aggregate function out of two regular functions. The state, like in the HSQL case, is an INOUT argument for these functions. The syntax looks like CREATE FUNCTION norm1 (... ) ... ; CREATE FUNCTION norm2 (... ) ... ; CREATE AGGREGATE (...) ( SFUNC=norm1, FINALFUNC=norm2, ... ) it also allows to specify the initial value for the state and if the function is MIN or MAX (that is, can be answered by taking the first or last value from the index).

Sergei Golubchik added a comment - 2016-02-19 17:58 - edited

Oracle uses an object-oriented approach. An aggregate function is an object, the state is stored in the object, initialize/iterate/terminate/merge are the methods.

create type SecondMaxImpl as object

  max NUMBER, -- highest value seen so far

  secmax NUMBER, -- second highest value seen so far

  static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number,

  member function ODCIAggregateIterate(self IN OUT SecondMaxImpl, value IN number) return number,

  member function ODCIAggregateTerminate(self IN SecondMaxImpl, returnValue OUT number, flags IN number) return number,

  member function ODCIAggregateMerge(self IN OUT SecondMaxImpl, ctx2 IN SecondMaxImpl) return number

);

create type body SecondMaxImpl is

static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number is

begin

  sctx := SecondMaxImpl(0, 0);

  return ODCIConst.Success;

end;

-- define other methods accordingly...

CREATE FUNCTION SecondMax (input NUMBER) RETURN NUMBER

PARALLEL_ENABLE AGGREGATE USING SecondMaxImpl;

Sergei Golubchik added a comment - 2016-02-19 17:58 - edited Oracle uses an object-oriented approach. An aggregate function is an object, the state is stored in the object, initialize/iterate/terminate/merge are the methods. create type SecondMaxImpl as object ( max NUMBER, -- highest value seen so far secmax NUMBER, -- second highest value seen so far static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number, member function ODCIAggregateIterate(self IN OUT SecondMaxImpl, value IN number) return number, member function ODCIAggregateTerminate(self IN SecondMaxImpl, returnValue OUT number, flags IN number) return number, member function ODCIAggregateMerge(self IN OUT SecondMaxImpl, ctx2 IN SecondMaxImpl) return number ); create type body SecondMaxImpl is static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number is begin sctx := SecondMaxImpl(0, 0); return ODCIConst.Success; end ; -- define other methods accordingly... CREATE FUNCTION SecondMax (input NUMBER) RETURN NUMBER PARALLEL_ENABLE AGGREGATE USING SecondMaxImpl;

Sergei Golubchik added a comment - 2016-02-19 18:24 - edited

Another approach is to use a pre-defined cursor that iterates the group. Like this:

CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE

BEGIN

  DECLARE count INT DEFAULT 0;

  DECLARE sum DOUBLE DEFAULT 0;

  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count;

  LOOP

    FETCH GROUP NEXT ROW;

    SET count:=count+1;

    SET sum:=sum+x;

  END LOOP;

END

Note that GROUP keyword is reserved, so this syntax can be unambiguously distinguished from FETCH cursor_name.

Sergei Golubchik added a comment - 2016-02-19 18:24 - edited Another approach is to use a pre-defined cursor that iterates the group. Like this: CREATE AGGREGATE FUNCTION avg (x DOUBLE ) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum / count ; LOOP FETCH GROUP NEXT ROW; SET count := count +1; SET sum := sum +x; END LOOP; END Note that GROUP keyword is reserved, so this syntax can be unambiguously distinguished from FETCH cursor_name .

Sergei Golubchik added a comment - 2016-02-19 18:28 - edited

Another possibility is to use a complex CREATE FUNCTION with many blocks:

CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE

  DECLARE count INT DEFAULT 0;  -- note, state in declarations before the first BEGIN

  DECLARE sum DOUBLE DEFAULT 0;

BEGIN                           -- this is called for every row in a group

  SET count:=count+1;

  SET sum:=sum+x;

END;

BEGIN                           -- this returns the result

  RETURN sum/count;

END;

but this is too confusing. BEGIN/END blocks need keywords or labels to specify what they do.

Sergei Golubchik added a comment - 2016-02-19 18:28 - edited Another possibility is to use a complex CREATE FUNCTION with many blocks: CREATE AGGREGATE FUNCTION avg (x DOUBLE ) RETURNS DOUBLE DECLARE count INT DEFAULT 0; -- note, state in declarations before the first BEGIN DECLARE sum DOUBLE DEFAULT 0; BEGIN -- this is called for every row in a group SET count := count +1; SET sum := sum +x; END ; BEGIN -- this returns the result RETURN sum / count ; END ; but this is too confusing. BEGIN / END blocks need keywords or labels to specify what they do.

varun gupta added a comment - 2016-02-29 03:58

Can you give more details on the argument x, what i understand that it is one of the attributes of the table, and we need to have an aggregate function for the attribute x

varun gupta added a comment - 2016-02-29 03:58 Can you give more details on the argument x, what i understand that it is one of the attributes of the table, and we need to have an aggregate function for the attribute x

Sergei Golubchik added a comment - 2016-02-29 15:25

x is the function argument. Like in

SELECT avg(val) FROM table1 GROUP BY grn;

Sergei Golubchik added a comment - 2016-02-29 15:25 x is the function argument. Like in SELECT avg (val) FROM table1 GROUP BY grn;

varun gupta added a comment - 2016-02-29 17:48

Yeah understand it is the argument to the function.
When we write
SELECT avg(val) FROM table1 GROUP BY grn;
val is one of the columns of our table(attribute). Am I correct?

varun gupta added a comment - 2016-02-29 17:48 Yeah understand it is the argument to the function. When we write SELECT avg(val) FROM table1 GROUP BY grn; val is one of the columns of our table(attribute). Am I correct?

Oleksandr Byelkin added a comment - 2016-03-30 14:46

val should be an arbitrary expression.

Oleksandr Byelkin added a comment - 2016-03-30 14:46 val should be an arbitrary expression.

Sergei Golubchik added a comment - 2016-04-08 21:19 - edited

Window function support

To work efficiently over windows, an aggregate function should be able to remove a row from a group. This can be implemented in all syntax variants discussed above. In all cases row-removal functionality is optional, it is an optimization, but a aggregate function can work over windows without it too.

Adding support for row removal is not part of this ~~MDEV-7773~~, but it should be possible to add that functionality later, ~~MDEV-7773~~ should be compatible with this future extension.

HSQL

The second argument IN becomes an ENUM(ADD,REMOVE,RESULT)

PostgreSQL

The definition supports a new clause REMOVEFUNC=

Oracle

The object gets a new method remove

Cursor approach

A new syntax FETCH GROUP REMOVED ROW. A value is generated when both cursors ("next" and "removed") have returned SQLSTATE 02000. Note: This is not enough, in addition return needs not to return, but that would be too confusing, wouldn't it?

Many blocks approach

One more block, that is invoked to remove a row

Sergei Golubchik added a comment - 2016-04-08 21:19 - edited Window function support To work efficiently over windows, an aggregate function should be able to remove a row from a group. This can be implemented in all syntax variants discussed above. In all cases row-removal functionality is optional, it is an optimization, but a aggregate function can work over windows without it too. Adding support for row removal is not part of this MDEV-7773 , but it should be possible to add that functionality later, MDEV-7773 should be compatible with this future extension. HSQL The second argument IN becomes an ENUM(ADD,REMOVE,RESULT) PostgreSQL The definition supports a new clause REMOVEFUNC= Oracle The object gets a new method remove Cursor approach A new syntax FETCH GROUP REMOVED ROW . A value is generated when both cursors ("next" and "removed") have returned SQLSTATE 02000. Note: This is not enough, in addition return needs not to return, but that would be too confusing, wouldn't it? Many blocks approach One more block, that is invoked to remove a row

Sergei Golubchik added a comment - 2016-04-09 06:11 - edited

The complete example of the AVGOW (average over window) function using the cursor approach:

CREATE AGGREGATE FUNCTION avgow(x DOUBLE) RETURNS DOUBLE

BEGIN

  DECLARE count INT DEFAULT 0;

  DECLARE sum DOUBLE DEFAULT 0;

  DECLARE loop_done INT;

  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET loop_done=1;

  LOOP

    SET loop_done=0;

    WHILE loop_done = 0

      FETCH GROUP REMOVED ROW;

      SET count:=count-1;

      SET sum:=sum-x;

    END WHILE;

    SET loop_done=0;

    WHILE loop_done = 0

      FETCH GROUP NEXT ROW;

      SET count:=count+1;

      SET sum:=sum+x;

    END WHILE;

    RETURN IF(count,sum/count,NULL);

  END LOOP;

END

It is more complex than necessary, but shows all the features. Note, that

there are two cursors, not one (the presence of a second cursor makes this function “window-aware” and enables two-cursor optimization).
after return execution continues! The function is restarted for every window partition, but it is not restarted for every frame inside a window.
The function exhausts the REMOVED cursor first, then the NEXT cursor. This order is required. There is no logical reason for that, it is how our window functions work at the moment. This limitation can (and should) be removed, but let's do it as the next step.
This function is universal, it works both for GROUP BY and for OVER WINDOW. In the former case the “frame” is equal to the whole group, so the function is restarted after every return (in other words, it has the usual semantics).

Sergei Golubchik added a comment - 2016-04-09 06:11 - edited The complete example of the AVGOW (average over window) function using the cursor approach: CREATE AGGREGATE FUNCTION avgow(x DOUBLE ) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE loop_done INT ; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET loop_done=1; LOOP SET loop_done=0; WHILE loop_done = 0 FETCH GROUP REMOVED ROW; SET count := count -1; SET sum := sum -x; END WHILE; SET loop_done=0; WHILE loop_done = 0 FETCH GROUP NEXT ROW; SET count := count +1; SET sum := sum +x; END WHILE; RETURN IF ( count , sum / count , NULL ); END LOOP; END It is more complex than necessary, but shows all the features. Note, that there are two cursors, not one (the presence of a second cursor makes this function “window-aware” and enables two-cursor optimization). after return execution continues! The function is restarted for every window partition, but it is not restarted for every frame inside a window. The function exhausts the REMOVED cursor first, then the NEXT cursor. This order is required . There is no logical reason for that, it is how our window functions work at the moment. This limitation can (and should) be removed, but let's do it as the next step. This function is universal, it works both for GROUP BY and for OVER WINDOW . In the former case the “frame” is equal to the whole group, so the function is restarted after every return (in other words, it has the usual semantics).

Sergei Golubchik added a comment - 2016-04-29 06:48

Another problem of the cursor approach — if we ever will support non-SQL stored procedures (like Antony Curtis work), this will unlikely to be able to use cursors. At least, not generally for all languages.

Sergei Golubchik added a comment - 2016-04-29 06:48 Another problem of the cursor approach — if we ever will support non-SQL stored procedures (like Antony Curtis work ), this will unlikely to be able to use cursors. At least, not generally for all languages.

Oleksandr Byelkin added a comment - 2016-04-29 07:36

There should be ways to read cursors from any stored procedures (special functions or something like this) in any case we get control at that moment and do what we need. It is as difficult as for our procedures IMHO.

Oleksandr Byelkin added a comment - 2016-04-29 07:36 There should be ways to read cursors from any stored procedures (special functions or something like this) in any case we get control at that moment and do what we need. It is as difficult as for our procedures IMHO.

Alexander Barkov added a comment - 2016-05-01 05:28 - edited

The proposed syntax looks fine for simple cases.

I'd propose to think towards another direction though:
implement aggregate functions as SQL classes!

Internally all aggregate functions are implemented as classes in the Item_sum hierarchy and they have various methods and members. To be able to support all features that a built-in aggregate function supports (like WINDOW functionality, DISTINCT, etc) we could implement stored functions as classes as well, using the SQL standard definitions for user-defined types.

See the "<user-defined type definition>" in the SQL standard.

A skeleton for a class definition would look like this:

-- Define an aggregate function handler

-- Derive it from some top-level built-in abstract class MariaDBAggregateFunctionHandler

CREATE TYPE handler_avgow UNDER MariaDBAggregateFunctionHandler

AS

 count INT DEFAULT 0,  # member 1

 sum DOUBLE DEFAULT 0  # member 2

OVERRIDING METHOD val_real() RETURNS DOUBLE

OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN

OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN

...

After a class is defined, we can create methods using the standard syntax:
(in SQL method cannot be defined "inline", so there will be separate statements for methods:

-- Define aggregate handler methods

CREATE METHOD val_real() FOR handler_avgov

 BEGIN

   RETURN IF(count,sum/count,NULL);

 END;

 CREATE METHOD fix_length_and_dec() ...;

...

Now, when we have a class with its methods created with the standard syntax, we can create an aggregate function using non-standard syntax, for example like this:

-- Define the function itself

CREATE AGGREGATE FUNCTION avgow(x DOUBLE) RETURNS DOUBLE

USING handler_avgrow;

For non-SQL definitions we could use some syntax (perhaps it already exists in the SQL standard, I saw something like EXTERNAL NAME '...', which looks very related) to load implementations from external shared libraries. This syntax could include <language clause>.
How SQL and non-SQL definitions would relate:

The CREATE TYPE statement will be the same for an SQL and non-SQL implementation.
The CREATE METHOD statement will be different for SQL and non-SQL!!! (explicit SQL routine definition vs external module reference)
The CREATE AGGREGATE FUNCTION statement will be the same for SQL and non-SQL

Advantages of the class-based conception:

A great step towards the SQL standard. This CREATE TYPE useful per se, to create user-defined data types.
Splits the code logically. You don't have to care about the order of different SQL code pieces, like cursors in the above example. This code will reside in different methods, so the server will make sure to execute them in proper order. Also, you don't have to care about signals. All together this should give better performance, and better user-defined SQL code readability.
SQL and Non-SQL have very close interface (only CREATE METHOD differs)
Classes can be used to add SQL code for other user-defined purposes (not only for aggregate functions), e.g.:
- Hybrid type functions, which need not only val_xxx() (RETURN in terms of SQL), but also fix_length_and_dec() to decide their return type depending on the parameters passed.
- PROCEDURE in SQL (rewrite the SELECT data set into a different data set)
- SQL code wrappers around pluggable modules for various server components (for example parser or optimizer), engines, and all other plugin types. A user defined parser could derive from the built-in parser to call the derived parse method then additionally store some statistics somewhere using SQL. An SQL wrapper for an engine could override some engine methods (using SQL) to disable certain components (e.g. there is a request to disable everything in ConnectSE, except ODBC). So there could be two instances of ConnectSE: the standard full-featured one with full functionality available for authorized users, and a derived ODBC-only one for non-authorized users.

Alexander Barkov added a comment - 2016-05-01 05:28 - edited The proposed syntax looks fine for simple cases. I'd propose to think towards another direction though: implement aggregate functions as SQL classes! Internally all aggregate functions are implemented as classes in the Item_sum hierarchy and they have various methods and members. To be able to support all features that a built-in aggregate function supports (like WINDOW functionality, DISTINCT, etc) we could implement stored functions as classes as well, using the SQL standard definitions for user-defined types. See the "<user-defined type definition>" in the SQL standard. A skeleton for a class definition would look like this: -- Define an aggregate function handler -- Derive it from some top-level built-in abstract class MariaDBAggregateFunctionHandler CREATE TYPE handler_avgow UNDER MariaDBAggregateFunctionHandler AS ( count INT DEFAULT 0, # member 1 sum DOUBLE DEFAULT 0 # member 2 ) OVERRIDING METHOD val_real() RETURNS DOUBLE OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN ... ; After a class is defined, we can create methods using the standard syntax: (in SQL method cannot be defined "inline", so there will be separate statements for methods: -- Define aggregate handler methods CREATE METHOD val_real() FOR handler_avgov BEGIN RETURN IF ( count , sum / count , NULL ); END ; CREATE METHOD fix_length_and_dec() ...; ... Now, when we have a class with its methods created with the standard syntax, we can create an aggregate function using non-standard syntax, for example like this: -- Define the function itself CREATE AGGREGATE FUNCTION avgow(x DOUBLE ) RETURNS DOUBLE USING handler_avgrow; For non-SQL definitions we could use some syntax (perhaps it already exists in the SQL standard, I saw something like EXTERNAL NAME '...', which looks very related) to load implementations from external shared libraries. This syntax could include <language clause>. How SQL and non-SQL definitions would relate: The CREATE TYPE statement will be the same for an SQL and non-SQL implementation. The CREATE METHOD statement will be different for SQL and non-SQL!!! (explicit SQL routine definition vs external module reference) The CREATE AGGREGATE FUNCTION statement will be the same for SQL and non-SQL Advantages of the class-based conception: A great step towards the SQL standard. This CREATE TYPE useful per se, to create user-defined data types. Splits the code logically. You don't have to care about the order of different SQL code pieces, like cursors in the above example. This code will reside in different methods, so the server will make sure to execute them in proper order. Also, you don't have to care about signals. All together this should give better performance, and better user-defined SQL code readability. SQL and Non-SQL have very close interface (only CREATE METHOD differs) Classes can be used to add SQL code for other user-defined purposes (not only for aggregate functions), e.g.: Hybrid type functions, which need not only val_xxx() (RETURN in terms of SQL), but also fix_length_and_dec() to decide their return type depending on the parameters passed. PROCEDURE in SQL (rewrite the SELECT data set into a different data set) SQL code wrappers around pluggable modules for various server components (for example parser or optimizer), engines, and all other plugin types. A user defined parser could derive from the built-in parser to call the derived parse method then additionally store some statistics somewhere using SQL. An SQL wrapper for an engine could override some engine methods (using SQL) to disable certain components (e.g. there is a request to disable everything in ConnectSE, except ODBC). So there could be two instances of ConnectSE: the standard full-featured one with full functionality available for authorized users, and a derived ODBC-only one for non-authorized users.

Sergei Golubchik added a comment - 2016-05-06 05:39

This is very similar to Oracle approach (see above).

But this is way too verbose and complex in my opinion. It could be simplified
as:

CREATE AGGREGATE FUNCTION avgow

AS

 count INT DEFAULT 0,  -- member 1

 sum DOUBLE DEFAULT 0  -- member 2

OVERRIDING METHOD val_real() RETURNS DOUBLE

OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN

OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN

...

CREATE METHOD val_real() FOR avgow

 BEGIN

   RETURN IF(count,sum/count,NULL);

 END;

CREATE METHOD fix_length_and_dec() ...;

...

or even as

CREATE AGGREGATE FUNCTION avgow

AS

 count INT DEFAULT 0,  -- member 1

 sum DOUBLE DEFAULT 0  -- member 2

);

CREATE METHOD val_real() FOR avgow

 BEGIN

   RETURN IF(count,sum/count,NULL);

 END;

CREATE METHOD fix_length_and_dec() ...;

...

But in certain cases it is more limited than cursor-based approach, it doesn't
allow one to implement a median or a percentile function, because you cannot
have temporary tables or arrays in the AS ( ... ) clause. Still, both
approaches can be combined:

CREATE AGGREGATE FUNCTION avgow(x DOUBLE) RETURNS DOUBLE

BEGIN

  -- ... aggreagate function body, as above

END;

CREATE METHOD fix_length_and_dec() ... FOR avgow...

...

And the beauty of this is that it perfectly applies to non-aggregate functions
too:

CREATE FUNCTION foo(x DOUBLE) RETURNS DECIMAL(3,1)

BEGIN

  -- ...

END;

CREATE METHOD fix_length_and_dec() ... FOR foo...

...

Sergei Golubchik added a comment - 2016-05-06 05:39 This is very similar to Oracle approach (see above). But this is way too verbose and complex in my opinion. It could be simplified as: CREATE AGGREGATE FUNCTION avgow AS ( count INT DEFAULT 0, -- member 1 sum DOUBLE DEFAULT 0 -- member 2 ) OVERRIDING METHOD val_real() RETURNS DOUBLE OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN ... ; CREATE METHOD val_real() FOR avgow BEGIN RETURN IF ( count , sum / count , NULL ); END ; CREATE METHOD fix_length_and_dec() ...; ... or even as CREATE AGGREGATE FUNCTION avgow AS ( count INT DEFAULT 0, -- member 1 sum DOUBLE DEFAULT 0 -- member 2 ); CREATE METHOD val_real() FOR avgow BEGIN RETURN IF ( count , sum / count , NULL ); END ; CREATE METHOD fix_length_and_dec() ...; ... But in certain cases it is more limited than cursor-based approach, it doesn't allow one to implement a median or a percentile function, because you cannot have temporary tables or arrays in the AS ( ... ) clause. Still, both approaches can be combined: CREATE AGGREGATE FUNCTION avgow(x DOUBLE ) RETURNS DOUBLE BEGIN -- ... aggreagate function body, as above END ; CREATE METHOD fix_length_and_dec() ... FOR avgow... ... And the beauty of this is that it perfectly applies to non-aggregate functions too: CREATE FUNCTION foo(x DOUBLE ) RETURNS DECIMAL (3,1) BEGIN -- ... END ; CREATE METHOD fix_length_and_dec() ... FOR foo... ...

Sergei Golubchik added a comment - 2016-05-06 05:43

Hm, may be omitting the OVERRIDING clause is not a good idea — it might allow someone from a concurrent connection to invoke the function before it's fully defined. And that might be dangerous, if, for example, one of the overriding methods does some kind of the access checks.

But OVERRIDING can be made optional. In the above case one will specify OVERRIDING, but often one will be able to omit it.

Or, may be, it'll be simpler to stick to the standard here and require the OVERRIDING clause?

Sergei Golubchik added a comment - 2016-05-06 05:43 Hm, may be omitting the OVERRIDING clause is not a good idea — it might allow someone from a concurrent connection to invoke the function before it's fully defined. And that might be dangerous, if, for example, one of the overriding methods does some kind of the access checks. But OVERRIDING can be made optional. In the above case one will specify OVERRIDING , but often one will be able to omit it. Or, may be, it'll be simpler to stick to the standard here and require the OVERRIDING clause?

Alexander Barkov added a comment - 2016-05-06 06:57 - edited

Good idea to build CREATE TYPE inside CREATE [AGGREGATE] FUNCTION.
We should probably try to define method bodies inline (without having separate CREATE METHOD statements).
This would solve the problem with concurrent connections.

Alexander Barkov added a comment - 2016-05-06 06:57 - edited Good idea to build CREATE TYPE inside CREATE [AGGREGATE] FUNCTION. We should probably try to define method bodies inline (without having separate CREATE METHOD statements). This would solve the problem with concurrent connections.

Vicențiu Ciorbaru added a comment - 2017-07-10 13:30

The current test case seems to corrupt the procs table:

delimiter |;

create aggregate function agg_sum(x INT) returns double

begin

declare z double default 0;

declare continue handler for not found return z;

loop

fetch group next row;

set z = z + x;

end loop;

end|

delimiter ;|

create table t1 (id int, salary int);

INSERT INTO t1 VALUES (1, 100), (2, 40), (3, 6);

show create function agg_sum;

alter function agg_sum aggregate none;

show create function agg_sum;

Final query fails with:

# This fails with 'show create function agg_sum' failed: 1457: Failed to load routine test.agg_sum. The table mysql.proc is missing, corrupt, or contains bad data (internal code -6)

Vicențiu Ciorbaru added a comment - 2017-07-10 13:30 The current test case seems to corrupt the procs table: delimiter |; create aggregate function agg_sum(x INT ) returns double begin declare z double default 0; declare continue handler for not found return z; loop fetch group next row; set z = z + x; end loop; end | delimiter ;| create table t1 (id int , salary int ); INSERT INTO t1 VALUES (1, 100), (2, 40), (3, 6); show create function agg_sum; alter function agg_sum aggregate none; show create function agg_sum; Final query fails with: # This fails with 'show create function agg_sum' failed: 1457: Failed to load routine test.agg_sum. The table mysql.proc is missing, corrupt, or contains bad data (internal code -6)

Vicențiu Ciorbaru added a comment - 2017-10-28 14:41

Just a few minor dead-code removal and history cleanup, then we can push.

Vicențiu Ciorbaru added a comment - 2017-10-28 14:41 Just a few minor dead-code removal and history cleanup, then we can push.

Varun Gupta (Inactive) added a comment - 2017-11-26 18:29

A final rebase was required, resolved the conflicts and have sent it to cvicentiu for review.
Have pushed it to bb-10.3-varun for final testing

Varun Gupta (Inactive) added a comment - 2017-11-26 18:29 A final rebase was required, resolved the conflicts and have sent it to cvicentiu for review. Have pushed it to bb-10.3-varun for final testing

Elena Stepanova added a comment - 2017-11-29 13:16

I haven't got anything of significance in a round of tests on bb-10.3-varun 7d8c5ee4039.
Testing of correctness of results is limited due to numerous existing problems in this area, unrelated to the new functionality. I haven't got any functionality-specific crashes.
After fixing ~~MDEV-14520~~ and addressing all reviews, please go ahead and push to 10.3. If reviews require significant changes in the code, please notify so that I re-run the tests.

Please make sure the functionality is documented in the KB, if not done yet.

Among other things, I think it should be specifically mentioned in the documentation that unlike built-in functions, the custom ones don't provide automatic type recognition and act according to the types specified in the function itself. For a simple example, consider the following

create or replace table t1 (i int, c char(8), ci char(8));

insert into t1 values (10,'foo','10'),(2,'bar','2');

delimiter $

create or replace aggregate function agg_max1(x INT) returns INT

begin

  declare m INT default NULL;

  declare continue handler for not found return m;

  loop

   fetch group next row;

   if (x is not null) and (m is null or m < x) then

      set m= x;

   end if;

  end loop;

end $

create or replace aggregate function agg_max2(x BLOB) returns BLOB

begin

  declare m BLOB default NULL;

  declare continue handler for not found return m;

  loop

   fetch group next row;

   if (x is not null) and (m is null or m < x) then

      set m= x;

   end if;

  end loop;

end $

delimiter ;

Here, column `i` is integer, column `ci` is text which contains integer values, and column `c` is text.
The in-built function MAX will treat each of them accordingly – the result will be based on integer comparison for `i`, on text comparison for `ci` and `c`.

MariaDB [test]> select max(i), max(ci), max(c) from t1;

+--------+---------+--------+

| max(i) | max(ci) | max(c) |

+--------+---------+--------+

|     10 | 2       | foo    |

+--------+---------+--------+

1 row in set (0.00 sec)

Definitions of functions agg_max1 and agg_max2 are identical, except that agg_max1 works with integers and agg_max2 with blobs.

On the same data, agg_max2 will do text comparison for all columns, regardless their initial type

MariaDB [test]> select agg_max2(i), agg_max2(ci), agg_max2(c) from t1;

+-------------+--------------+-------------+

| agg_max2(i) | agg_max2(ci) | agg_max2(c) |

+-------------+--------------+-------------+

| 2           | 2            | foo         |

+-------------+--------------+-------------+

1 row in set (0.00 sec)

Function agg_max1 will do integer comparison for anything that looks like integer – in this case, for `i` and for `ci`:

MariaDB [test]> select agg_max1(i), agg_max1(ci) from t1;

+-------------+--------------+

| agg_max1(i) | agg_max1(ci) |

+-------------+--------------+

|          10 |           10 |

+-------------+--------------+

1 row in set (0.00 sec)

And for something that doesn't look like integer, it doesn't work at all:

MariaDB [test]> set sql_mode= 'STRICT_ALL_TABLES';

Query OK, 0 rows affected (0.02 sec)

MariaDB [test]> select agg_max1(c) from t1;

ERROR 1366 (22007): Incorrect integer value: 'foo' for column 'x' at row 1

MariaDB [test]> set sql_mode= '';

Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> select agg_max1(c) from t1;

+-------------+

| agg_max1(c) |

+-------------+

|           0 |

+-------------+

1 row in set, 3 warnings (0.00 sec)

Given that functions must define explicit types, I don't see a bug here, but it might be something users don't expect, so better to document it.

Elena Stepanova added a comment - 2017-11-29 13:16 I haven't got anything of significance in a round of tests on bb-10.3-varun 7d8c5ee4039. Testing of correctness of results is limited due to numerous existing problems in this area, unrelated to the new functionality. I haven't got any functionality-specific crashes. After fixing MDEV-14520 and addressing all reviews, please go ahead and push to 10.3. If reviews require significant changes in the code, please notify so that I re-run the tests. Please make sure the functionality is documented in the KB, if not done yet. Among other things, I think it should be specifically mentioned in the documentation that unlike built-in functions, the custom ones don't provide automatic type recognition and act according to the types specified in the function itself. For a simple example, consider the following create or replace table t1 (i int , c char (8), ci char (8)); insert into t1 values (10, 'foo' , '10' ),(2, 'bar' , '2' ); delimiter $ create or replace aggregate function agg_max1(x INT ) returns INT begin declare m INT default NULL ; declare continue handler for not found return m; loop fetch group next row; if (x is not null ) and (m is null or m < x) then set m= x; end if ; end loop; end $ create or replace aggregate function agg_max2(x BLOB) returns BLOB begin declare m BLOB default NULL ; declare continue handler for not found return m; loop fetch group next row; if (x is not null ) and (m is null or m < x) then set m= x; end if ; end loop; end $ delimiter ; Here, column `i` is integer, column `ci` is text which contains integer values, and column `c` is text. The in-built function MAX will treat each of them accordingly – the result will be based on integer comparison for `i` , on text comparison for `ci` and `c` . MariaDB [test]> select max (i), max (ci), max (c) from t1; + --------+---------+--------+ | max (i) | max (ci) | max (c) | + --------+---------+--------+ | 10 | 2 | foo | + --------+---------+--------+ 1 row in set (0.00 sec) Definitions of functions agg_max1 and agg_max2 are identical, except that agg_max1 works with integers and agg_max2 with blobs. On the same data, agg_max2 will do text comparison for all columns, regardless their initial type MariaDB [test]> select agg_max2(i), agg_max2(ci), agg_max2(c) from t1; + -------------+--------------+-------------+ | agg_max2(i) | agg_max2(ci) | agg_max2(c) | + -------------+--------------+-------------+ | 2 | 2 | foo | + -------------+--------------+-------------+ 1 row in set (0.00 sec) Function agg_max1 will do integer comparison for anything that looks like integer – in this case, for `i` and for `ci` : MariaDB [test]> select agg_max1(i), agg_max1(ci) from t1; + -------------+--------------+ | agg_max1(i) | agg_max1(ci) | + -------------+--------------+ | 10 | 10 | + -------------+--------------+ 1 row in set (0.00 sec) And for something that doesn't look like integer, it doesn't work at all: MariaDB [test]> set sql_mode= 'STRICT_ALL_TABLES' ; Query OK, 0 rows affected (0.02 sec) MariaDB [test]> select agg_max1(c) from t1; ERROR 1366 (22007): Incorrect integer value: 'foo' for column 'x' at row 1 MariaDB [test]> set sql_mode= '' ; Query OK, 0 rows affected (0.00 sec) MariaDB [test]> select agg_max1(c) from t1; + -------------+ | agg_max1(c) | + -------------+ | 0 | + -------------+ 1 row in set , 3 warnings (0.00 sec) Given that functions must define explicit types, I don't see a bug here, but it might be something users don't expect, so better to document it.

People

Assignee:: Varun Gupta (Inactive)

Reporter:: Sergei Golubchik

Votes:: 3 Vote for this issue

Watchers:: 14 Start watching this issue

Dates

Created:: 2015-03-13 13:40

Updated:: 2022-06-15 12:16

Resolved:: 2017-12-04 11:40

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server