Details

    • 10.3.1-1, 10.3.1-2, 10.3.3-1

    Description

      With CREATE FUNCTION one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like SUM, AVG, etc). This task is to add support for aggregate stored functions.

      Syntax

      is not decided yet. SQL Standard 2003 doesn't support aggregate stored functions (may be the newer standard does). Oracle, PostgreSQL, HSQL (may be more) all implement aggregate stored functions using their own incompatible syntax extensions. SQL Server and DB2 do not support creating new aggregate stored functions in SQL.

      The syntax should at least allow for

      • the code to return the function value
      • the code to be called per row in a group, it accumulates data, does not return a value
      • storage that is preserved between calls within a group

      and possibly

      • code to invoke between groups to reset the storage
      • code to remove data from the group (useful for window functions)

      Attachments

        Issue Links

          Activity

            serg Sergei Golubchik created issue -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Description With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            .h2 Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). Oracle, PostgreSQL, HSQL (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). Oracle, PostgreSQL, HSQL (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            serg Sergei Golubchik made changes -
            Description With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). Oracle, PostgreSQL, HSQL (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 60114 ] MariaDB v3 [ 64604 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            colin Colin Charles made changes -
            Labels gsoc15 gsoc16
            serg Sergei Golubchik made changes -
            Labels gsoc16 gsoc15 gsoc16
            serg Sergei Golubchik made changes -
            Description With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            * and possibly
            ** code to invoke between groups to reset the storage
            ** code to *remove* data from the group (useful for window functions)
            serg Sergei Golubchik made changes -
            Description With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            * and possibly
            ** code to invoke between groups to reset the storage
            ** code to *remove* data from the group (useful for window functions)
            With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            and possibly
            * code to invoke between groups to reset the storage
            * code to *remove* data from the group (useful for window functions)
            serg Sergei Golubchik made changes -
            Description With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group
            and possibly
            * code to invoke between groups to reset the storage
            * code to *remove* data from the group (useful for window functions)
            With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group

            and possibly
            * code to invoke between groups to reset the storage
            * code to *remove* data from the group (useful for window functions)
            serg Sergei Golubchik added a comment - - edited

            A possible approach is to have only one function that does everything — accumulates and returns the result. This is what HSQL has:

             CREATE AGGREGATE FUNCTION udavg(IN x INTEGER, IN flag BOOLEAN, INOUT acc BIGINT, INOUT counter INT)
            ...

            here the first argument — x — is the actual argument of the aggregate function (called as SELECT udagv(x) ...), the second — flag — tells where to accumulate or return the result, the last two — acc and counter — are the storage, their value is preserved between calls and automatically reset to NULL when a new group starts.

            The function then does something like

              IF flag THEN
                -- return the value
              ELSE
                -- update count and acc with the new value of x
              END IF;

            serg Sergei Golubchik added a comment - - edited A possible approach is to have only one function that does everything — accumulates and returns the result. This is what HSQL has: CREATE AGGREGATE FUNCTION udavg( IN x INTEGER , IN flag BOOLEAN, INOUT acc BIGINT , INOUT counter INT ) ... here the first argument — x — is the actual argument of the aggregate function (called as SELECT udagv(x) ... ), the second — flag — tells where to accumulate or return the result, the last two — acc and counter — are the storage, their value is preserved between calls and automatically reset to NULL when a new group starts. The function then does something like IF flag THEN -- return the value ELSE -- update count and acc with the new value of x END IF ;

            PostgreSQL assembles an aggregate function out of two regular functions. The state, like in the HSQL case, is an INOUT argument for these functions. The syntax looks like

            CREATE FUNCTION norm1 (... ) ... ;
            CREATE FUNCTION norm2 (... ) ... ;
            CREATE AGGREGATE (...) ( SFUNC=norm1, FINALFUNC=norm2, ... )

            it also allows to specify the initial value for the state and if the function is MIN or MAX (that is, can be answered by taking the first or last value from the index).

            serg Sergei Golubchik added a comment - PostgreSQL assembles an aggregate function out of two regular functions. The state, like in the HSQL case, is an INOUT argument for these functions. The syntax looks like CREATE FUNCTION norm1 (... ) ... ; CREATE FUNCTION norm2 (... ) ... ; CREATE AGGREGATE (...) ( SFUNC=norm1, FINALFUNC=norm2, ... ) it also allows to specify the initial value for the state and if the function is MIN or MAX (that is, can be answered by taking the first or last value from the index).
            serg Sergei Golubchik added a comment - - edited

            Oracle uses an object-oriented approach. An aggregate function is an object, the state is stored in the object, initialize/iterate/terminate/merge are the methods.

            create type SecondMaxImpl as object
            (
              max NUMBER, -- highest value seen so far 
              secmax NUMBER, -- second highest value seen so far
              static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number,
              member function ODCIAggregateIterate(self IN OUT SecondMaxImpl, value IN number) return number,
              member function ODCIAggregateTerminate(self IN SecondMaxImpl, returnValue OUT number, flags IN number) return number,
              member function ODCIAggregateMerge(self IN OUT SecondMaxImpl, ctx2 IN SecondMaxImpl) return number
            );
            create type body SecondMaxImpl is 
            static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number is 
            begin
              sctx := SecondMaxImpl(0, 0);
              return ODCIConst.Success;
            end;
            -- define other methods accordingly...
            CREATE FUNCTION SecondMax (input NUMBER) RETURN NUMBER 
            PARALLEL_ENABLE AGGREGATE USING SecondMaxImpl;
            

            serg Sergei Golubchik added a comment - - edited Oracle uses an object-oriented approach. An aggregate function is an object, the state is stored in the object, initialize/iterate/terminate/merge are the methods. create type SecondMaxImpl as object ( max NUMBER, -- highest value seen so far secmax NUMBER, -- second highest value seen so far static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number, member function ODCIAggregateIterate(self IN OUT SecondMaxImpl, value IN number) return number, member function ODCIAggregateTerminate(self IN SecondMaxImpl, returnValue OUT number, flags IN number) return number, member function ODCIAggregateMerge(self IN OUT SecondMaxImpl, ctx2 IN SecondMaxImpl) return number ); create type body SecondMaxImpl is static function ODCIAggregateInitialize(sctx IN OUT SecondMaxImpl) return number is begin sctx := SecondMaxImpl(0, 0); return ODCIConst.Success; end ; -- define other methods accordingly... CREATE FUNCTION SecondMax (input NUMBER) RETURN NUMBER PARALLEL_ENABLE AGGREGATE USING SecondMaxImpl;
            serg Sergei Golubchik added a comment - - edited

            Another approach is to use a pre-defined cursor that iterates the group. Like this:

            CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE
            BEGIN
              DECLARE count INT DEFAULT 0;
              DECLARE sum DOUBLE DEFAULT 0;
              DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum/count;
              LOOP
                FETCH GROUP NEXT ROW;
                SET count:=count+1;
                SET sum:=sum+x;
              END LOOP;
            END
            

            Note that GROUP keyword is reserved, so this syntax can be unambiguously distinguished from FETCH cursor_name.

            serg Sergei Golubchik added a comment - - edited Another approach is to use a pre-defined cursor that iterates the group. Like this: CREATE AGGREGATE FUNCTION avg (x DOUBLE ) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' RETURN sum / count ; LOOP FETCH GROUP NEXT ROW; SET count := count +1; SET sum := sum +x; END LOOP; END Note that GROUP keyword is reserved, so this syntax can be unambiguously distinguished from FETCH cursor_name .
            serg Sergei Golubchik added a comment - - edited

            Another possibility is to use a complex CREATE FUNCTION with many blocks:

            CREATE AGGREGATE FUNCTION avg(x DOUBLE) RETURNS DOUBLE
              DECLARE count INT DEFAULT 0;  -- note, state in declarations before the first BEGIN
              DECLARE sum DOUBLE DEFAULT 0;
            BEGIN                           -- this is called for every row in a group
              SET count:=count+1;
              SET sum:=sum+x;
            END;
            BEGIN                           -- this returns the result
              RETURN sum/count;
            END;
            

            but this is too confusing. BEGIN/END blocks need keywords or labels to specify what they do.

            serg Sergei Golubchik added a comment - - edited Another possibility is to use a complex CREATE FUNCTION with many blocks: CREATE AGGREGATE FUNCTION avg (x DOUBLE ) RETURNS DOUBLE DECLARE count INT DEFAULT 0; -- note, state in declarations before the first BEGIN DECLARE sum DOUBLE DEFAULT 0; BEGIN -- this is called for every row in a group SET count := count +1; SET sum := sum +x; END ; BEGIN -- this returns the result RETURN sum / count ; END ; but this is too confusing. BEGIN / END blocks need keywords or labels to specify what they do.
            serg Sergei Golubchik made changes -
            Description With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate storage functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group

            and possibly
            * code to invoke between groups to reset the storage
            * code to *remove* data from the group (useful for window functions)
            With {{CREATE FUNCTION}} one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like {{SUM}}, {{AVG}}, etc). This task is to add support for aggregate stored functions.

            h2. Syntax

            is not decided yet. SQL Standard 2003 doesn't support aggregate stored functions (may be the newer standard does). [Oracle|http://docs.oracle.com/cd/B19306_01/appdev.102/b14289/dciaggfns.htm], [PostgreSQL|http://www.postgresql.org/docs/8.3/static/sql-createaggregate.html], [HSQL|http://hsqldb.org/doc/guide/sqlroutines-chapt.html#src_aggregate_functions] (may be more) all implement aggregate stored functions using their own incompatible syntax extensions. SQL Server and DB2 do not support creating new aggregate stored functions in SQL.

            The syntax should at least allow for
            * the code to return the function value
            * the code to be called per row in a group, it accumulates data, does not return a value
            * storage that is preserved between calls within a group

            and possibly
            * code to invoke between groups to reset the storage
            * code to *remove* data from the group (useful for window functions)
            varun_raiko varun gupta added a comment -

            Can you give more details on the argument x, what i understand that it is one of the attributes of the table, and we need to have an aggregate function for the attribute x

            varun_raiko varun gupta added a comment - Can you give more details on the argument x, what i understand that it is one of the attributes of the table, and we need to have an aggregate function for the attribute x

            x is the function argument. Like in

            SELECT avg(val) FROM table1 GROUP BY grn;
            

            serg Sergei Golubchik added a comment - x is the function argument. Like in SELECT avg (val) FROM table1 GROUP BY grn;
            varun_raiko varun gupta added a comment -

            Yeah understand it is the argument to the function.
            When we write
            SELECT avg(val) FROM table1 GROUP BY grn;
            val is one of the columns of our table(attribute). Am I correct?

            varun_raiko varun gupta added a comment - Yeah understand it is the argument to the function. When we write SELECT avg(val) FROM table1 GROUP BY grn; val is one of the columns of our table(attribute). Am I correct?

            val should be an arbitrary expression.

            sanja Oleksandr Byelkin added a comment - val should be an arbitrary expression.
            serg Sergei Golubchik added a comment - - edited

            Window function support

            To work efficiently over windows, an aggregate function should be able to remove a row from a group. This can be implemented in all syntax variants discussed above. In all cases row-removal functionality is optional, it is an optimization, but a aggregate function can work over windows without it too.

            Adding support for row removal is not part of this MDEV-7773, but it should be possible to add that functionality later, MDEV-7773 should be compatible with this future extension.

            HSQL

            The second argument IN becomes an ENUM(ADD,REMOVE,RESULT)

            PostgreSQL

            The definition supports a new clause REMOVEFUNC=

            Oracle

            The object gets a new method remove

            Cursor approach

            A new syntax FETCH GROUP REMOVED ROW. A value is generated when both cursors ("next" and "removed") have returned SQLSTATE 02000. Note: This is not enough, in addition return needs not to return, but that would be too confusing, wouldn't it?

            Many blocks approach

            One more block, that is invoked to remove a row

            serg Sergei Golubchik added a comment - - edited Window function support To work efficiently over windows, an aggregate function should be able to remove a row from a group. This can be implemented in all syntax variants discussed above. In all cases row-removal functionality is optional, it is an optimization, but a aggregate function can work over windows without it too. Adding support for row removal is not part of this MDEV-7773 , but it should be possible to add that functionality later, MDEV-7773 should be compatible with this future extension. HSQL The second argument IN becomes an ENUM(ADD,REMOVE,RESULT) PostgreSQL The definition supports a new clause REMOVEFUNC= Oracle The object gets a new method remove Cursor approach A new syntax FETCH GROUP REMOVED ROW . A value is generated when both cursors ("next" and "removed") have returned SQLSTATE 02000. Note: This is not enough, in addition return needs not to return, but that would be too confusing, wouldn't it? Many blocks approach One more block, that is invoked to remove a row
            serg Sergei Golubchik added a comment - - edited

            The complete example of the AVGOW (average over window) function using the cursor approach:

            CREATE AGGREGATE FUNCTION avgow(x DOUBLE) RETURNS DOUBLE
            BEGIN
              DECLARE count INT DEFAULT 0;
              DECLARE sum DOUBLE DEFAULT 0;
              DECLARE loop_done INT;
              DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET loop_done=1;
              LOOP
                SET loop_done=0;
                WHILE loop_done = 0
                  FETCH GROUP REMOVED ROW;
                  SET count:=count-1;
                  SET sum:=sum-x;
                END WHILE;
                SET loop_done=0;
                WHILE loop_done = 0
                  FETCH GROUP NEXT ROW;
                  SET count:=count+1;
                  SET sum:=sum+x;
                END WHILE;
                RETURN IF(count,sum/count,NULL);
              END LOOP;
            END
            

            It is more complex than necessary, but shows all the features. Note, that

            • there are two cursors, not one (the presence of a second cursor makes this function “window-aware” and enables two-cursor optimization).
            • after return execution continues! The function is restarted for every window partition, but it is not restarted for every frame inside a window.
            • The function exhausts the REMOVED cursor first, then the NEXT cursor. This order is required. There is no logical reason for that, it is how our window functions work at the moment. This limitation can (and should) be removed, but let's do it as the next step.
            • This function is universal, it works both for GROUP BY and for OVER WINDOW. In the former case the “frame” is equal to the whole group, so the function is restarted after every return (in other words, it has the usual semantics).
            serg Sergei Golubchik added a comment - - edited The complete example of the AVGOW (average over window) function using the cursor approach: CREATE AGGREGATE FUNCTION avgow(x DOUBLE ) RETURNS DOUBLE BEGIN DECLARE count INT DEFAULT 0; DECLARE sum DOUBLE DEFAULT 0; DECLARE loop_done INT ; DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET loop_done=1; LOOP SET loop_done=0; WHILE loop_done = 0 FETCH GROUP REMOVED ROW; SET count := count -1; SET sum := sum -x; END WHILE; SET loop_done=0; WHILE loop_done = 0 FETCH GROUP NEXT ROW; SET count := count +1; SET sum := sum +x; END WHILE; RETURN IF ( count , sum / count , NULL ); END LOOP; END It is more complex than necessary, but shows all the features. Note, that there are two cursors, not one (the presence of a second cursor makes this function “window-aware” and enables two-cursor optimization). after return execution continues! The function is restarted for every window partition, but it is not restarted for every frame inside a window. The function exhausts the REMOVED cursor first, then the NEXT cursor. This order is required . There is no logical reason for that, it is how our window functions work at the moment. This limitation can (and should) be removed, but let's do it as the next step. This function is universal, it works both for GROUP BY and for OVER WINDOW . In the former case the “frame” is equal to the whole group, so the function is restarted after every return (in other words, it has the usual semantics).

            Another problem of the cursor approach — if we ever will support non-SQL stored procedures (like Antony Curtis work), this will unlikely to be able to use cursors. At least, not generally for all languages.

            serg Sergei Golubchik added a comment - Another problem of the cursor approach — if we ever will support non-SQL stored procedures (like Antony Curtis work ), this will unlikely to be able to use cursors. At least, not generally for all languages.

            There should be ways to read cursors from any stored procedures (special functions or something like this) in any case we get control at that moment and do what we need. It is as difficult as for our procedures IMHO.

            sanja Oleksandr Byelkin added a comment - There should be ways to read cursors from any stored procedures (special functions or something like this) in any case we get control at that moment and do what we need. It is as difficult as for our procedures IMHO.
            bar Alexander Barkov added a comment - - edited

            The proposed syntax looks fine for simple cases.

            I'd propose to think towards another direction though:
            implement aggregate functions as SQL classes!

            Internally all aggregate functions are implemented as classes in the Item_sum hierarchy and they have various methods and members. To be able to support all features that a built-in aggregate function supports (like WINDOW functionality, DISTINCT, etc) we could implement stored functions as classes as well, using the SQL standard definitions for user-defined types.

            See the "<user-defined type definition>" in the SQL standard.

            A skeleton for a class definition would look like this:

            -- Define an aggregate function handler
            -- Derive it from some top-level built-in abstract class MariaDBAggregateFunctionHandler
            CREATE TYPE handler_avgow UNDER MariaDBAggregateFunctionHandler
            AS
            (
             count INT DEFAULT 0,  # member 1
             sum DOUBLE DEFAULT 0  # member 2
            )
            OVERRIDING METHOD val_real() RETURNS DOUBLE
            OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN
            OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN
            ...
            ;
            

            After a class is defined, we can create methods using the standard syntax:
            (in SQL method cannot be defined "inline", so there will be separate statements for methods:

            -- Define aggregate handler methods
            CREATE METHOD val_real() FOR handler_avgov
             BEGIN
               RETURN IF(count,sum/count,NULL);
             END;
             CREATE METHOD fix_length_and_dec() ...;
             ...
            

            Now, when we have a class with its methods created with the standard syntax, we can create an aggregate function using non-standard syntax, for example like this:

            -- Define the function itself
            CREATE AGGREGATE FUNCTION avgow(x DOUBLE) RETURNS DOUBLE
            USING handler_avgrow;
            

            For non-SQL definitions we could use some syntax (perhaps it already exists in the SQL standard, I saw something like EXTERNAL NAME '...', which looks very related) to load implementations from external shared libraries. This syntax could include <language clause>.
            How SQL and non-SQL definitions would relate:

            • The CREATE TYPE statement will be the same for an SQL and non-SQL implementation.
            • The CREATE METHOD statement will be different for SQL and non-SQL!!! (explicit SQL routine definition vs external module reference)
            • The CREATE AGGREGATE FUNCTION statement will be the same for SQL and non-SQL

            Advantages of the class-based conception:

            • A great step towards the SQL standard. This CREATE TYPE useful per se, to create user-defined data types.
            • Splits the code logically. You don't have to care about the order of different SQL code pieces, like cursors in the above example. This code will reside in different methods, so the server will make sure to execute them in proper order. Also, you don't have to care about signals. All together this should give better performance, and better user-defined SQL code readability.
            • SQL and Non-SQL have very close interface (only CREATE METHOD differs)
            • Classes can be used to add SQL code for other user-defined purposes (not only for aggregate functions), e.g.:
              • Hybrid type functions, which need not only val_xxx() (RETURN in terms of SQL), but also fix_length_and_dec() to decide their return type depending on the parameters passed.
              • PROCEDURE in SQL (rewrite the SELECT data set into a different data set)
              • SQL code wrappers around pluggable modules for various server components (for example parser or optimizer), engines, and all other plugin types. A user defined parser could derive from the built-in parser to call the derived parse method then additionally store some statistics somewhere using SQL. An SQL wrapper for an engine could override some engine methods (using SQL) to disable certain components (e.g. there is a request to disable everything in ConnectSE, except ODBC). So there could be two instances of ConnectSE: the standard full-featured one with full functionality available for authorized users, and a derived ODBC-only one for non-authorized users.
            bar Alexander Barkov added a comment - - edited The proposed syntax looks fine for simple cases. I'd propose to think towards another direction though: implement aggregate functions as SQL classes! Internally all aggregate functions are implemented as classes in the Item_sum hierarchy and they have various methods and members. To be able to support all features that a built-in aggregate function supports (like WINDOW functionality, DISTINCT, etc) we could implement stored functions as classes as well, using the SQL standard definitions for user-defined types. See the "<user-defined type definition>" in the SQL standard. A skeleton for a class definition would look like this: -- Define an aggregate function handler -- Derive it from some top-level built-in abstract class MariaDBAggregateFunctionHandler CREATE TYPE handler_avgow UNDER MariaDBAggregateFunctionHandler AS ( count INT DEFAULT 0, # member 1 sum DOUBLE DEFAULT 0 # member 2 ) OVERRIDING METHOD val_real() RETURNS DOUBLE OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN ... ; After a class is defined, we can create methods using the standard syntax: (in SQL method cannot be defined "inline", so there will be separate statements for methods: -- Define aggregate handler methods CREATE METHOD val_real() FOR handler_avgov BEGIN RETURN IF ( count , sum / count , NULL ); END ; CREATE METHOD fix_length_and_dec() ...; ... Now, when we have a class with its methods created with the standard syntax, we can create an aggregate function using non-standard syntax, for example like this: -- Define the function itself CREATE AGGREGATE FUNCTION avgow(x DOUBLE ) RETURNS DOUBLE USING handler_avgrow; For non-SQL definitions we could use some syntax (perhaps it already exists in the SQL standard, I saw something like EXTERNAL NAME '...', which looks very related) to load implementations from external shared libraries. This syntax could include <language clause>. How SQL and non-SQL definitions would relate: The CREATE TYPE statement will be the same for an SQL and non-SQL implementation. The CREATE METHOD statement will be different for SQL and non-SQL!!! (explicit SQL routine definition vs external module reference) The CREATE AGGREGATE FUNCTION statement will be the same for SQL and non-SQL Advantages of the class-based conception: A great step towards the SQL standard. This CREATE TYPE useful per se, to create user-defined data types. Splits the code logically. You don't have to care about the order of different SQL code pieces, like cursors in the above example. This code will reside in different methods, so the server will make sure to execute them in proper order. Also, you don't have to care about signals. All together this should give better performance, and better user-defined SQL code readability. SQL and Non-SQL have very close interface (only CREATE METHOD differs) Classes can be used to add SQL code for other user-defined purposes (not only for aggregate functions), e.g.: Hybrid type functions, which need not only val_xxx() (RETURN in terms of SQL), but also fix_length_and_dec() to decide their return type depending on the parameters passed. PROCEDURE in SQL (rewrite the SELECT data set into a different data set) SQL code wrappers around pluggable modules for various server components (for example parser or optimizer), engines, and all other plugin types. A user defined parser could derive from the built-in parser to call the derived parse method then additionally store some statistics somewhere using SQL. An SQL wrapper for an engine could override some engine methods (using SQL) to disable certain components (e.g. there is a request to disable everything in ConnectSE, except ODBC). So there could be two instances of ConnectSE: the standard full-featured one with full functionality available for authorized users, and a derived ODBC-only one for non-authorized users.

            This is very similar to Oracle approach (see above).

            But this is way too verbose and complex in my opinion. It could be simplified
            as:

            CREATE AGGREGATE FUNCTION avgow
            AS
            (
             count INT DEFAULT 0,  -- member 1
             sum DOUBLE DEFAULT 0  -- member 2
            )
            OVERRIDING METHOD val_real() RETURNS DOUBLE
            OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN
            OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN
            ...
            ;
            CREATE METHOD val_real() FOR avgow
             BEGIN
               RETURN IF(count,sum/count,NULL);
             END;
            CREATE METHOD fix_length_and_dec() ...;
             ...
            

            or even as

            CREATE AGGREGATE FUNCTION avgow
            AS
            (
             count INT DEFAULT 0,  -- member 1
             sum DOUBLE DEFAULT 0  -- member 2
            );
            CREATE METHOD val_real() FOR avgow
             BEGIN
               RETURN IF(count,sum/count,NULL);
             END;
            CREATE METHOD fix_length_and_dec() ...;
             ...
            

            But in certain cases it is more limited than cursor-based approach, it doesn't
            allow one to implement a median or a percentile function, because you cannot
            have temporary tables or arrays in the AS ( ... ) clause. Still, both
            approaches can be combined:

            CREATE AGGREGATE FUNCTION avgow(x DOUBLE) RETURNS DOUBLE
            BEGIN
              -- ... aggreagate function body, as above
            END;
            CREATE METHOD fix_length_and_dec() ... FOR avgow...
             ...
            

            And the beauty of this is that it perfectly applies to non-aggregate functions
            too:

            CREATE FUNCTION foo(x DOUBLE) RETURNS DECIMAL(3,1)
            BEGIN
              -- ...
            END;
            CREATE METHOD fix_length_and_dec() ... FOR foo...
             ...
            

            serg Sergei Golubchik added a comment - This is very similar to Oracle approach (see above). But this is way too verbose and complex in my opinion. It could be simplified as: CREATE AGGREGATE FUNCTION avgow AS ( count INT DEFAULT 0, -- member 1 sum DOUBLE DEFAULT 0 -- member 2 ) OVERRIDING METHOD val_real() RETURNS DOUBLE OVERRIDING METHOD fix_length_and_dec() RETURNS BOOLEAN OVERRIDING METHOD reset_and_add() RETURNS BOOLEAN ... ; CREATE METHOD val_real() FOR avgow BEGIN RETURN IF ( count , sum / count , NULL ); END ; CREATE METHOD fix_length_and_dec() ...; ... or even as CREATE AGGREGATE FUNCTION avgow AS ( count INT DEFAULT 0, -- member 1 sum DOUBLE DEFAULT 0 -- member 2 ); CREATE METHOD val_real() FOR avgow BEGIN RETURN IF ( count , sum / count , NULL ); END ; CREATE METHOD fix_length_and_dec() ...; ... But in certain cases it is more limited than cursor-based approach, it doesn't allow one to implement a median or a percentile function, because you cannot have temporary tables or arrays in the AS ( ... ) clause. Still, both approaches can be combined: CREATE AGGREGATE FUNCTION avgow(x DOUBLE ) RETURNS DOUBLE BEGIN -- ... aggreagate function body, as above END ; CREATE METHOD fix_length_and_dec() ... FOR avgow... ... And the beauty of this is that it perfectly applies to non-aggregate functions too: CREATE FUNCTION foo(x DOUBLE ) RETURNS DECIMAL (3,1) BEGIN -- ... END ; CREATE METHOD fix_length_and_dec() ... FOR foo... ...

            Hm, may be omitting the OVERRIDING clause is not a good idea — it might allow someone from a concurrent connection to invoke the function before it's fully defined. And that might be dangerous, if, for example, one of the overriding methods does some kind of the access checks.

            But OVERRIDING can be made optional. In the above case one will specify OVERRIDING, but often one will be able to omit it.

            Or, may be, it'll be simpler to stick to the standard here and require the OVERRIDING clause?

            serg Sergei Golubchik added a comment - Hm, may be omitting the OVERRIDING clause is not a good idea — it might allow someone from a concurrent connection to invoke the function before it's fully defined. And that might be dangerous, if, for example, one of the overriding methods does some kind of the access checks. But OVERRIDING can be made optional. In the above case one will specify OVERRIDING , but often one will be able to omit it. Or, may be, it'll be simpler to stick to the standard here and require the OVERRIDING clause?
            bar Alexander Barkov added a comment - - edited

            Good idea to build CREATE TYPE inside CREATE [AGGREGATE] FUNCTION.
            We should probably try to define method bodies inline (without having separate CREATE METHOD statements).
            This would solve the problem with concurrent connections.

            bar Alexander Barkov added a comment - - edited Good idea to build CREATE TYPE inside CREATE [AGGREGATE] FUNCTION. We should probably try to define method bodies inline (without having separate CREATE METHOD statements). This would solve the problem with concurrent connections.
            svoj Sergey Vojtovich made changes -
            Assignee Vicentiu Ciorbaru [ cvicentiu ]
            svoj Sergey Vojtovich made changes -
            Priority Minor [ 4 ] Major [ 3 ]
            svoj Sergey Vojtovich made changes -
            Fix Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.2 [ 14601 ]
            varun Varun Gupta (Inactive) made changes -
            Assignee Vicentiu Ciorbaru [ cvicentiu ] Varun Gupta [ varun ]
            varun Varun Gupta (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            dthompson David Thompson (Inactive) made changes -
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.3.1-1 [ 164 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            serg Sergei Golubchik made changes -
            varun Varun Gupta (Inactive) made changes -
            Assignee Varun Gupta [ varun ] Vicentiu Ciorbaru [ cvicentiu ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Labels gsoc15 gsoc16 10.3-beta gsoc15 gsoc16
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Epic Link PT-57 [ 62149 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.3.1-1 [ 164 ] 10.3.1-1, 10.3.1-2 [ 164, 174 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked higher

            The current test case seems to corrupt the procs table:

            delimiter |;
             
            create aggregate function agg_sum(x INT) returns double
            begin
            declare z double default 0;
            declare continue handler for not found return z;
             
            loop
            fetch group next row;
            set z = z + x;
            end loop;
            end|
             
            delimiter ;|
             
            create table t1 (id int, salary int);
             
            INSERT INTO t1 VALUES (1, 100), (2, 40), (3, 6);
             
            show create function agg_sum;
            alter function agg_sum aggregate none;
            show create function agg_sum; 
            

            Final query fails with:

            # This fails with 'show create function agg_sum' failed: 1457: Failed to load routine test.agg_sum. The table mysql.proc is missing, corrupt, or contains bad data (internal code -6)
            

            cvicentiu Vicențiu Ciorbaru added a comment - The current test case seems to corrupt the procs table: delimiter |;   create aggregate function agg_sum(x INT ) returns double begin declare z double default 0; declare continue handler for not found return z;   loop fetch group next row; set z = z + x; end loop; end |   delimiter ;|   create table t1 (id int , salary int );   INSERT INTO t1 VALUES (1, 100), (2, 40), (3, 6);   show create function agg_sum; alter function agg_sum aggregate none; show create function agg_sum; Final query fails with: # This fails with 'show create function agg_sum' failed: 1457: Failed to load routine test.agg_sum. The table mysql.proc is missing, corrupt, or contains bad data (internal code -6)
            serg Sergei Golubchik made changes -
            Sprint 10.3.1-1, 10.3.1-2 [ 164, 174 ] 10.3.1-1, 10.3.1-2, 10.3.3-1 [ 164, 174, 200 ]

            Just a few minor dead-code removal and history cleanup, then we can push.

            cvicentiu Vicențiu Ciorbaru added a comment - Just a few minor dead-code removal and history cleanup, then we can push.
            cvicentiu Vicențiu Ciorbaru made changes -
            Assignee Vicentiu Ciorbaru [ cvicentiu ] Varun Gupta [ varun ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            A final rebase was required, resolved the conflicts and have sent it to cvicentiu for review.
            Have pushed it to bb-10.3-varun for final testing

            varun Varun Gupta (Inactive) added a comment - A final rebase was required, resolved the conflicts and have sent it to cvicentiu for review. Have pushed it to bb-10.3-varun for final testing
            elenst Elena Stepanova made changes -

            I haven't got anything of significance in a round of tests on bb-10.3-varun 7d8c5ee4039.
            Testing of correctness of results is limited due to numerous existing problems in this area, unrelated to the new functionality. I haven't got any functionality-specific crashes.
            After fixing MDEV-14520 and addressing all reviews, please go ahead and push to 10.3. If reviews require significant changes in the code, please notify so that I re-run the tests.

            Please make sure the functionality is documented in the KB, if not done yet.

            Among other things, I think it should be specifically mentioned in the documentation that unlike built-in functions, the custom ones don't provide automatic type recognition and act according to the types specified in the function itself. For a simple example, consider the following

            create or replace table t1 (i int, c char(8), ci char(8));
            insert into t1 values (10,'foo','10'),(2,'bar','2');
             
            delimiter $
            create or replace aggregate function agg_max1(x INT) returns INT
            begin
              declare m INT default NULL;
              declare continue handler for not found return m;
              loop
               fetch group next row;
               if (x is not null) and (m is null or m < x) then
                  set m= x;
               end if;
              end loop;
            end $
             
            create or replace aggregate function agg_max2(x BLOB) returns BLOB
            begin
              declare m BLOB default NULL;
              declare continue handler for not found return m;
              loop
               fetch group next row;
               if (x is not null) and (m is null or m < x) then
                  set m= x;
               end if;
              end loop;
            end $
            delimiter ;
            

            Here, column `i` is integer, column `ci` is text which contains integer values, and column `c` is text.
            The in-built function MAX will treat each of them accordingly – the result will be based on integer comparison for `i`, on text comparison for `ci` and `c`.

            MariaDB [test]> select max(i), max(ci), max(c) from t1;
            +--------+---------+--------+
            | max(i) | max(ci) | max(c) |
            +--------+---------+--------+
            |     10 | 2       | foo    |
            +--------+---------+--------+
            1 row in set (0.00 sec)
            

            Definitions of functions agg_max1 and agg_max2 are identical, except that agg_max1 works with integers and agg_max2 with blobs.

            On the same data, agg_max2 will do text comparison for all columns, regardless their initial type

            MariaDB [test]> select agg_max2(i), agg_max2(ci), agg_max2(c) from t1;
            +-------------+--------------+-------------+
            | agg_max2(i) | agg_max2(ci) | agg_max2(c) |
            +-------------+--------------+-------------+
            | 2           | 2            | foo         |
            +-------------+--------------+-------------+
            1 row in set (0.00 sec)
            

            Function agg_max1 will do integer comparison for anything that looks like integer – in this case, for `i` and for `ci`:

            MariaDB [test]> select agg_max1(i), agg_max1(ci) from t1;
            +-------------+--------------+
            | agg_max1(i) | agg_max1(ci) |
            +-------------+--------------+
            |          10 |           10 |
            +-------------+--------------+
            1 row in set (0.00 sec)
            

            And for something that doesn't look like integer, it doesn't work at all:

            MariaDB [test]> set sql_mode= 'STRICT_ALL_TABLES';
            Query OK, 0 rows affected (0.02 sec)
             
            MariaDB [test]> select agg_max1(c) from t1;
            ERROR 1366 (22007): Incorrect integer value: 'foo' for column 'x' at row 1
            MariaDB [test]> set sql_mode= '';
            Query OK, 0 rows affected (0.00 sec)
             
            MariaDB [test]> select agg_max1(c) from t1;
            +-------------+
            | agg_max1(c) |
            +-------------+
            |           0 |
            +-------------+
            1 row in set, 3 warnings (0.00 sec)
            

            Given that functions must define explicit types, I don't see a bug here, but it might be something users don't expect, so better to document it.

            elenst Elena Stepanova added a comment - I haven't got anything of significance in a round of tests on bb-10.3-varun 7d8c5ee4039. Testing of correctness of results is limited due to numerous existing problems in this area, unrelated to the new functionality. I haven't got any functionality-specific crashes. After fixing MDEV-14520 and addressing all reviews, please go ahead and push to 10.3. If reviews require significant changes in the code, please notify so that I re-run the tests. Please make sure the functionality is documented in the KB, if not done yet. Among other things, I think it should be specifically mentioned in the documentation that unlike built-in functions, the custom ones don't provide automatic type recognition and act according to the types specified in the function itself. For a simple example, consider the following create or replace table t1 (i int , c char (8), ci char (8)); insert into t1 values (10, 'foo' , '10' ),(2, 'bar' , '2' );   delimiter $ create or replace aggregate function agg_max1(x INT ) returns INT begin declare m INT default NULL ; declare continue handler for not found return m; loop fetch group next row; if (x is not null ) and (m is null or m < x) then set m= x; end if ; end loop; end $   create or replace aggregate function agg_max2(x BLOB) returns BLOB begin declare m BLOB default NULL ; declare continue handler for not found return m; loop fetch group next row; if (x is not null ) and (m is null or m < x) then set m= x; end if ; end loop; end $ delimiter ; Here, column `i` is integer, column `ci` is text which contains integer values, and column `c` is text. The in-built function MAX will treat each of them accordingly – the result will be based on integer comparison for `i` , on text comparison for `ci` and `c` . MariaDB [test]> select max (i), max (ci), max (c) from t1; + --------+---------+--------+ | max (i) | max (ci) | max (c) | + --------+---------+--------+ | 10 | 2 | foo | + --------+---------+--------+ 1 row in set (0.00 sec) Definitions of functions agg_max1 and agg_max2 are identical, except that agg_max1 works with integers and agg_max2 with blobs. On the same data, agg_max2 will do text comparison for all columns, regardless their initial type MariaDB [test]> select agg_max2(i), agg_max2(ci), agg_max2(c) from t1; + -------------+--------------+-------------+ | agg_max2(i) | agg_max2(ci) | agg_max2(c) | + -------------+--------------+-------------+ | 2 | 2 | foo | + -------------+--------------+-------------+ 1 row in set (0.00 sec) Function agg_max1 will do integer comparison for anything that looks like integer – in this case, for `i` and for `ci` : MariaDB [test]> select agg_max1(i), agg_max1(ci) from t1; + -------------+--------------+ | agg_max1(i) | agg_max1(ci) | + -------------+--------------+ | 10 | 10 | + -------------+--------------+ 1 row in set (0.00 sec) And for something that doesn't look like integer, it doesn't work at all: MariaDB [test]> set sql_mode= 'STRICT_ALL_TABLES' ; Query OK, 0 rows affected (0.02 sec)   MariaDB [test]> select agg_max1(c) from t1; ERROR 1366 (22007): Incorrect integer value: 'foo' for column 'x' at row 1 MariaDB [test]> set sql_mode= '' ; Query OK, 0 rows affected (0.00 sec)   MariaDB [test]> select agg_max1(c) from t1; + -------------+ | agg_max1(c) | + -------------+ | 0 | + -------------+ 1 row in set , 3 warnings (0.00 sec) Given that functions must define explicit types, I don't see a bug here, but it might be something users don't expect, so better to document it.
            elenst Elena Stepanova made changes -
            varun Varun Gupta (Inactive) made changes -
            Fix Version/s 10.3.3 [ 22644 ]
            Fix Version/s 10.3 [ 22126 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            proust Christian Proust made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 64604 ] MariaDB v4 [ 132581 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -

            People

              varun Varun Gupta (Inactive)
              serg Sergei Golubchik
              Votes:
              3 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.