Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9525

Window functions: Support for aggregate_func(DISTINCT ...) OVER (...)

Details

    Description

      Should we support DISTINCT for aggregate functions that are computed over a window?

      PostgreSQL doesn't support this:

      test=#   select a, count(distinct b) over (order by a) from t1 ;
      ERROR:  DISTINCT is not implemented for window functions
      LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                        ^
      

      http://www.postgresql.org/docs/9.5/static/sql-expressions.html says "Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list"

      SQL Server doesn't support it:
      https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
      (verified with SQL Server 2008 and 2014)
      but one can find many requests for them to support this.

      Sybase documentation says:

      No aggregation functions in the rank query are allowed to specify DISTINCT

      http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
      Does this mean they don't support it?

      Oracle 12.1 gives this error

      ORA-30487: ORDER BY not allowed here
      

      Apache Drill DOESNT support it:
      https://issues.apache.org/jira/browse/DRILL-3182

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            psergei Sergei Petrunia made changes -
            Description Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            " No aggregation functions in the rank query are allowed to specify DISTINCT."
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182
            Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            " No aggregation functions in the rank query are allowed to specify DISTINCT."
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182
            psergei Sergei Petrunia made changes -
            Description Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            " No aggregation functions in the rank query are allowed to specify DISTINCT."
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182
            Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            bq.No aggregation functions in the rank query are allowed to specify DISTINCT
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182

            MySQL/MariaDB's algorithm for aggregate_func(DISTINCT x) uses Unique class to remove duplicates. This approach cannot be easily extended to handle the case where rows move in and out of the window frame.

            Consider a query

            SELECT agg_func(DISTINCT c) OVER (PARTITION BY a ORDER BY b ) FROM ...

            Normally, window function code will sort the query output by (a, b). What if we sorted by (a,b,c) instead? We would be able to compute
            agg_func(DISTINCT c) easily:

            Consider some value of c=$c1. As soon as the first row with c=$c1 goes into window frame, we can compute the current agg_func value $val. All subsequent rows with c=$c1 have the same value agg_func(...)=$val.

            psergei Sergei Petrunia added a comment - MySQL/MariaDB's algorithm for aggregate_func(DISTINCT x) uses Unique class to remove duplicates. This approach cannot be easily extended to handle the case where rows move in and out of the window frame. Consider a query SELECT agg_func(DISTINCT c) OVER (PARTITION BY a ORDER BY b ) FROM ... Normally, window function code will sort the query output by (a, b). What if we sorted by (a,b,c) instead? We would be able to compute agg_func(DISTINCT c) easily: Consider some value of c=$c1. As soon as the first row with c=$c1 goes into window frame, we can compute the current agg_func value $val. All subsequent rows with c=$c1 have the same value agg_func(...)=$val.

            The idea in the above comment is incorrect.

            All subsequent rows with c=$c1 have the same value agg_func(...)=$val.

            what to do about rows that have the same value of c but different value of b? The sorting is done by a,b,c, so these rows won't be adjacent to one another.

            psergei Sergei Petrunia added a comment - The idea in the above comment is incorrect. All subsequent rows with c=$c1 have the same value agg_func(...)=$val. what to do about rows that have the same value of c but different value of b ? The sorting is done by a,b,c , so these rows won't be adjacent to one another.

            http://www.vldb.org/pvldb/vol8/p1058-leis.pdf is a paper that is not relevant to this specific task but it says this on p. 1060:

            Finally, it is
            important to note that the framing clause only affects some
            window functions, namely intra-window navigation functions
            (first value, last value, nth value), and non-distinct
            aggregate functions (min, max, count, sum, avg). The remaining
            window functions (row number, rank, lead, . . . ) and distinct
            aggregates are always evaluated on the entire partition.

            Note: distinct aggregates are always evaluated on the entire partition.

            psergei Sergei Petrunia added a comment - http://www.vldb.org/pvldb/vol8/p1058-leis.pdf is a paper that is not relevant to this specific task but it says this on p. 1060: Finally, it is important to note that the framing clause only affects some window functions, namely intra-window navigation functions (first value, last value, nth value), and non-distinct aggregate functions (min, max, count, sum, avg). The remaining window functions (row number, rank, lead, . . . ) and distinct aggregates are always evaluated on the entire partition. Note: distinct aggregates are always evaluated on the entire partition.
            psergei Sergei Petrunia made changes -
            Description Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            bq.No aggregation functions in the rank query are allowed to specify DISTINCT
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182
            Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}
            http://www.postgresql.org/docs/9.5/static/sql-expressions.html says "Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list"

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            bq.No aggregation functions in the rank query are allowed to specify DISTINCT
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182

            The standard says this:

            13) If the window ordering clause or the window framing clause of the window structure descriptor that
            describes the <window name or specification> is present, then no <aggregate function> simply contained
            in <window function> shall specify DISTINCT or <ordered set function>.
            

            psergei Sergei Petrunia added a comment - The standard says this: 13) If the window ordering clause or the window framing clause of the window structure descriptor that describes the <window name or specification> is present, then no <aggregate function> simply contained in <window function> shall specify DISTINCT or <ordered set function>.
            psergei Sergei Petrunia added a comment - - edited

            So, if DISTINCT is used, then ORDER BY is not allowed. Then, frame definition is not allowed either, and we have

            • all rows are peers
            • the window frame is always the same, and always includes all rows in the partition.
            psergei Sergei Petrunia added a comment - - edited So, if DISTINCT is used, then ORDER BY is not allowed. Then, frame definition is not allowed either, and we have all rows are peers the window frame is always the same, and always includes all rows in the partition.
            psergei Sergei Petrunia made changes -
            Description Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}
            http://www.postgresql.org/docs/9.5/static/sql-expressions.html says "Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list"

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            bq.No aggregation functions in the rank query are allowed to specify DISTINCT
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill supports it:
            https://issues.apache.org/jira/browse/DRILL-3182
            Should we support DISTINCT for aggregate functions that are computed over a window?

            PostgreSQL doesn't support this:
            {noformat}
            test=# select a, count(distinct b) over (order by a) from t1 ;
            ERROR: DISTINCT is not implemented for window functions
            LINE 1: select a, count(distinct b) over (order by a) from t1 ;
                              ^
            {noformat}
            http://www.postgresql.org/docs/9.5/static/sql-expressions.html says "Aggregate window functions, unlike normal aggregate functions, do not allow DISTINCT or ORDER BY to be used within the function argument list"

            SQL Server doesn't support it:
            https://connect.microsoft.com/SQLServer/feedback/details/254393/over-clause-enhancement-request-distinct-clause-for-aggregate-functions
            (verified with SQL Server 2008 and 2014)
            but one can find many requests for them to support this.

            Sybase documentation says:
            bq.No aggregation functions in the rank query are allowed to specify DISTINCT
            http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc38151.1510/html/iqrefbb/BCGEEBHA.htm
            Does this mean they don't support it?

            Oracle 12.1 gives this error
            {noformat}
            ORA-30487: ORDER BY not allowed here
            {noformat}

            Apache Drill DOESNT support it:
            https://issues.apache.org/jira/browse/DRILL-3182
            elenst Elena Stepanova made changes -
            Component/s Optimizer - Window functions [ 13502 ]
            Component/s Optimizer [ 10200 ]

            Are you still planning to do it in 10.2? If not, let's change the fix version.

            elenst Elena Stepanova added a comment - Are you still planning to do it in 10.2? If not, let's change the fix version.
            dthompson David Thompson (Inactive) made changes -
            dthompson David Thompson (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 73882 ] MariaDB v4 [ 130435 ]

            People

              Unassigned Unassigned
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.