[MDEV-7773] Aggregate stored functions Created: 2015-03-13 Updated: 2022-06-15 Resolved: 2017-12-04 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Procedure |
| Fix Version/s: | 10.3.3 |
| Type: | Task | Priority: | Critical |
| Reporter: | Sergei Golubchik | Assignee: | Varun Gupta (Inactive) |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | 10.3-beta, gsoc15, gsoc16 | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Sprint: | 10.3.1-1, 10.3.1-2, 10.3.3-1 | ||||||||||||||||||||||||||||||||
| Description |
|
With CREATE FUNCTION one can create functions in SQL, but this syntax doesn't allow one to create an aggregate function (like SUM, AVG, etc). This task is to add support for aggregate stored functions. Syntaxis not decided yet. SQL Standard 2003 doesn't support aggregate stored functions (may be the newer standard does). Oracle, PostgreSQL, HSQL (may be more) all implement aggregate stored functions using their own incompatible syntax extensions. SQL Server and DB2 do not support creating new aggregate stored functions in SQL. The syntax should at least allow for
and possibly
|
| Comments |
| Comment by Sergei Golubchik [ 2016-02-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A possible approach is to have only one function that does everything — accumulates and returns the result. This is what HSQL has:
here the first argument — x — is the actual argument of the aggregate function (called as SELECT udagv(x) ...), the second — flag — tells where to accumulate or return the result, the last two — acc and counter — are the storage, their value is preserved between calls and automatically reset to NULL when a new group starts. The function then does something like
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-02-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
PostgreSQL assembles an aggregate function out of two regular functions. The state, like in the HSQL case, is an INOUT argument for these functions. The syntax looks like
it also allows to specify the initial value for the state and if the function is MIN or MAX (that is, can be answered by taking the first or last value from the index). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-02-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Oracle uses an object-oriented approach. An aggregate function is an object, the state is stored in the object, initialize/iterate/terminate/merge are the methods.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-02-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Another approach is to use a pre-defined cursor that iterates the group. Like this:
Note that GROUP keyword is reserved, so this syntax can be unambiguously distinguished from FETCH cursor_name. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-02-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Another possibility is to use a complex CREATE FUNCTION with many blocks:
but this is too confusing. BEGIN/END blocks need keywords or labels to specify what they do. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by varun gupta [ 2016-02-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Can you give more details on the argument x, what i understand that it is one of the attributes of the table, and we need to have an aggregate function for the attribute x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-02-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
x is the function argument. Like in
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by varun gupta [ 2016-02-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yeah understand it is the argument to the function. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Oleksandr Byelkin [ 2016-03-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
val should be an arbitrary expression. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-04-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Window function supportTo work efficiently over windows, an aggregate function should be able to remove a row from a group. This can be implemented in all syntax variants discussed above. In all cases row-removal functionality is optional, it is an optimization, but a aggregate function can work over windows without it too. Adding support for row removal is not part of this HSQLThe second argument IN becomes an ENUM(ADD,REMOVE,RESULT) PostgreSQLThe definition supports a new clause REMOVEFUNC= OracleThe object gets a new method remove Cursor approachA new syntax FETCH GROUP REMOVED ROW. A value is generated when both cursors ("next" and "removed") have returned SQLSTATE 02000. Note: This is not enough, in addition return needs not to return, but that would be too confusing, wouldn't it? Many blocks approachOne more block, that is invoked to remove a row | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-04-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The complete example of the AVGOW (average over window) function using the cursor approach:
It is more complex than necessary, but shows all the features. Note, that
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-04-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Another problem of the cursor approach — if we ever will support non-SQL stored procedures (like Antony Curtis work), this will unlikely to be able to use cursors. At least, not generally for all languages. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Oleksandr Byelkin [ 2016-04-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
There should be ways to read cursors from any stored procedures (special functions or something like this) in any case we get control at that moment and do what we need. It is as difficult as for our procedures IMHO. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2016-05-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The proposed syntax looks fine for simple cases. I'd propose to think towards another direction though: Internally all aggregate functions are implemented as classes in the Item_sum hierarchy and they have various methods and members. To be able to support all features that a built-in aggregate function supports (like WINDOW functionality, DISTINCT, etc) we could implement stored functions as classes as well, using the SQL standard definitions for user-defined types. See the "<user-defined type definition>" in the SQL standard. A skeleton for a class definition would look like this:
After a class is defined, we can create methods using the standard syntax:
Now, when we have a class with its methods created with the standard syntax, we can create an aggregate function using non-standard syntax, for example like this:
For non-SQL definitions we could use some syntax (perhaps it already exists in the SQL standard, I saw something like EXTERNAL NAME '...', which looks very related) to load implementations from external shared libraries. This syntax could include <language clause>.
Advantages of the class-based conception:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-05-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is very similar to Oracle approach (see above). But this is way too verbose and complex in my opinion. It could be simplified
or even as
But in certain cases it is more limited than cursor-based approach, it doesn't
And the beauty of this is that it perfectly applies to non-aggregate functions
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2016-05-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hm, may be omitting the OVERRIDING clause is not a good idea — it might allow someone from a concurrent connection to invoke the function before it's fully defined. And that might be dangerous, if, for example, one of the overriding methods does some kind of the access checks. But OVERRIDING can be made optional. In the above case one will specify OVERRIDING, but often one will be able to omit it. Or, may be, it'll be simpler to stick to the standard here and require the OVERRIDING clause? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2016-05-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Good idea to build CREATE TYPE inside CREATE [AGGREGATE] FUNCTION. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vicențiu Ciorbaru [ 2017-07-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The current test case seems to corrupt the procs table:
Final query fails with:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vicențiu Ciorbaru [ 2017-10-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Just a few minor dead-code removal and history cleanup, then we can push. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Varun Gupta (Inactive) [ 2017-11-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A final rebase was required, resolved the conflicts and have sent it to cvicentiu for review. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2017-11-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I haven't got anything of significance in a round of tests on bb-10.3-varun 7d8c5ee4039. Please make sure the functionality is documented in the KB, if not done yet. Among other things, I think it should be specifically mentioned in the documentation that unlike built-in functions, the custom ones don't provide automatic type recognition and act according to the types specified in the function itself. For a simple example, consider the following
Here, column `i` is integer, column `ci` is text which contains integer values, and column `c` is text.
Definitions of functions agg_max1 and agg_max2 are identical, except that agg_max1 works with integers and agg_max2 with blobs. On the same data, agg_max2 will do text comparison for all columns, regardless their initial type
Function agg_max1 will do integer comparison for anything that looks like integer – in this case, for `i` and for `ci`:
And for something that doesn't look like integer, it doesn't work at all:
Given that functions must define explicit types, I don't see a bug here, but it might be something users don't expect, so better to document it. |