[MDEV-9525] Window functions: Support for aggregate_func(DISTINCT ...) OVER (...) Created: 2016-02-06 Updated: 2019-12-09 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Optimizer - Window functions |
| Fix Version/s: | None |
| Type: | Task | Priority: | Critical |
| Reporter: | Sergei Petrunia | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
Should we support DISTINCT for aggregate functions that are computed over a window? PostgreSQL doesn't support this:
http://www.postgresql.org/docs/9.5/static/sql-expressions.html says "Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list" SQL Server doesn't support it: Sybase documentation says:
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm Oracle 12.1 gives this error
Apache Drill DOESNT support it: |
| Comments |
| Comment by Sergei Petrunia [ 2016-02-06 ] | |||
|
MySQL/MariaDB's algorithm for aggregate_func(DISTINCT x) uses Unique class to remove duplicates. This approach cannot be easily extended to handle the case where rows move in and out of the window frame. Consider a query
Normally, window function code will sort the query output by (a, b). What if we sorted by (a,b,c) instead? We would be able to compute Consider some value of c=$c1. As soon as the first row with c=$c1 goes into window frame, we can compute the current agg_func value $val. All subsequent rows with c=$c1 have the same value agg_func(...)=$val. | |||
| Comment by Sergei Petrunia [ 2016-02-22 ] | |||
|
The idea in the above comment is incorrect.
what to do about rows that have the same value of c but different value of b? The sorting is done by a,b,c, so these rows won't be adjacent to one another. | |||
| Comment by Sergei Petrunia [ 2016-02-22 ] | |||
|
http://www.vldb.org/pvldb/vol8/p1058-leis.pdf is a paper that is not relevant to this specific task but it says this on p. 1060:
Note: distinct aggregates are always evaluated on the entire partition. | |||
| Comment by Sergei Petrunia [ 2016-03-15 ] | |||
|
The standard says this:
| |||
| Comment by Sergei Petrunia [ 2016-03-15 ] | |||
|
So, if DISTINCT is used, then ORDER BY is not allowed. Then, frame definition is not allowed either, and we have
| |||
| Comment by Elena Stepanova [ 2017-01-24 ] | |||
|
Are you still planning to do it in 10.2? If not, let's change the fix version. |