[MDEV-9525] Window functions: Support for aggregate_func(DISTINCT ...) OVER (...) Created: 2016-02-06  Updated: 2019-12-09

Status: Open
Project: MariaDB Server
Component/s: Optimizer - Window functions
Fix Version/s: None

Type: Task Priority: Critical
Reporter: Sergei Petrunia Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocks
blocks MCOL-645 MariaDB WF Add DISTINCT to Window Fun... Closed
PartOf
is part of MDEV-6115 window functions as in the SQL standard Closed
is part of MDEV-12987 complete window function support for ... Open

 Description   

Should we support DISTINCT for aggregate functions that are computed over a window?

PostgreSQL doesn't support this:

test=#   select a, count(distinct b) over (order by a) from t1 ;
ERROR:  DISTINCT is not implemented for window functions
LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                  ^

http://www.postgresql.org/docs/9.5/static/sql-expressions.html says "Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list"

SQL Server doesn't support it:
https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
(verified with SQL Server 2008 and 2014)
but one can find many requests for them to support this.

Sybase documentation says:

No aggregation functions in the rank query are allowed to specify DISTINCT

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
Does this mean they don't support it?

Oracle 12.1 gives this error

ORA-30487: ORDER BY not allowed here

Apache Drill DOESNT support it:
https://issues.apache.org/jira/browse/DRILL-3182



 Comments   
Comment by Sergei Petrunia [ 2016-02-06 ]

MySQL/MariaDB's algorithm for aggregate_func(DISTINCT x) uses Unique class to remove duplicates. This approach cannot be easily extended to handle the case where rows move in and out of the window frame.

Consider a query

SELECT agg_func(DISTINCT c) OVER (PARTITION BY a ORDER BY b ) FROM ...

Normally, window function code will sort the query output by (a, b). What if we sorted by (a,b,c) instead? We would be able to compute
agg_func(DISTINCT c) easily:

Consider some value of c=$c1. As soon as the first row with c=$c1 goes into window frame, we can compute the current agg_func value $val. All subsequent rows with c=$c1 have the same value agg_func(...)=$val.

Comment by Sergei Petrunia [ 2016-02-22 ]

The idea in the above comment is incorrect.

All subsequent rows with c=$c1 have the same value agg_func(...)=$val.

what to do about rows that have the same value of c but different value of b? The sorting is done by a,b,c, so these rows won't be adjacent to one another.

Comment by Sergei Petrunia [ 2016-02-22 ]

http://www.vldb.org/pvldb/vol8/p1058-leis.pdf is a paper that is not relevant to this specific task but it says this on p. 1060:

Finally, it is
important to note that the framing clause only affects some
window functions, namely intra-window navigation functions
(first value, last value, nth value), and non-distinct
aggregate functions (min, max, count, sum, avg). The remaining
window functions (row number, rank, lead, . . . ) and distinct
aggregates are always evaluated on the entire partition.

Note: distinct aggregates are always evaluated on the entire partition.

Comment by Sergei Petrunia [ 2016-03-15 ]

The standard says this:

13) If the window ordering clause or the window framing clause of the window structure descriptor that
describes the <window name or specification> is present, then no <aggregate function> simply contained
in <window function> shall specify DISTINCT or <ordered set function>.

Comment by Sergei Petrunia [ 2016-03-15 ]

So, if DISTINCT is used, then ORDER BY is not allowed. Then, frame definition is not allowed either, and we have

  • all rows are peers
  • the window frame is always the same, and always includes all rows in the partition.
Comment by Elena Stepanova [ 2017-01-24 ]

Are you still planning to do it in 10.2? If not, let's change the fix version.

Generated at Thu Feb 08 07:35:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.