[MDEV-8091] Simple window functions - Jira

Vicențiu Ciorbaru created issue - 2015-05-02 22:14

Vicențiu Ciorbaru added a comment - 2015-05-02 22:15 - edited

psergey igor sanja

Vicențiu Ciorbaru made changes - 2015-05-02 22:17

Field	Original Value	New Value
Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* rank
* dense_rank
* row_number

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank
* ntile
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

CC: Sergei Petrunia, Igor Babaev, Oleksandr Byelkin

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* rank
* dense_rank
* row_number

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank
* ntile
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2015-05-04 19:17

Link

This issue relates to ~~MDEV-6115~~ [ ~~MDEV-6115~~ ]

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow

MariaDB v2 [ 60755 ]

MariaDB v3 [ 64832 ]

Rasmus Johansson (Inactive) made changes - 2015-11-17 17:51

Sprint

10.2.0-1 [ 21 ]

Rasmus Johansson (Inactive) made changes - 2015-11-17 17:51

Rank

Ranked higher

Rasmus Johansson (Inactive) made changes - 2015-11-17 23:27

Sprint

10.2.0-1 [ 21 ]

Rasmus Johansson (Inactive) made changes - 2015-11-17 23:27

Rank

Ranked higher

Sergei Petrunia added a comment - 2015-11-26 00:48

Overview of the solution sketch that was pushed into the 10.1-window branch:

JOIN::exec has a piece of code that detects that:

* the select uses one table
* all windowing functions have the same ORDER BY clause
* all windowing functions allow for streaming computation

If this is the case:

* it runs filesort() to sort the source table in the required order
* then end_send() has code that calls func->advance_window() for all window function items

where advance_window() is:

+void Item_window_func::advance_window() {
+  int changed = test_if_group_changed(partition_fields);
+
+  if (changed > -1) {
+    window_func->clear();
+  }
+  window_func->add();
+}

and this computes the window function. It is done on the fly.
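To illustrate the streaming idea outside the server, here is a minimal standalone sketch. It is not the code from the 10.1-window branch: the `RankRow` and `Ranks` types and the `compute_ranks` function are hypothetical names invented for this example, and the input is assumed to be already sorted by partition and then by ordering key (the job filesort() does in the comment above). The partition check plays the role of test_if_group_changed(), and the counter reset plays the role of clear().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type: one partition key and one ordering key.
struct RankRow {
  std::string partition;
  int order_key;
};

// Values of the three streaming functions for one row.
struct Ranks {
  long row_number;
  long rank;
  long dense_rank;
};

// Single pass over rows that are already sorted by (partition, order_key).
std::vector<Ranks> compute_ranks(const std::vector<RankRow>& sorted_rows) {
  std::vector<Ranks> out;
  out.reserve(sorted_rows.size());
  long row_number = 0, rank = 0, dense_rank = 0;
  for (std::size_t i = 0; i < sorted_rows.size(); ++i) {
    const RankRow& cur = sorted_rows[i];
    // Partition boundary: the partition key differs from the previous row.
    const bool new_partition =
        (i == 0) || cur.partition != sorted_rows[i - 1].partition;
    if (new_partition) {
      // Reset all counters, analogous to clear() in the snippet above.
      row_number = rank = dense_rank = 0;
    }
    // Rows with equal ordering keys are peers and share rank/dense_rank.
    const bool new_peer_group =
        new_partition || cur.order_key != sorted_rows[i - 1].order_key;
    ++row_number;
    if (new_peer_group) {
      rank = row_number;  // rank skips ahead past the previous peers
      ++dense_rank;       // dense_rank never leaves gaps
    }
    out.push_back({row_number, rank, dense_rank});
  }
  return out;
}
```

Each output row depends only on the current row, the previous row, and three counters, which is why no temporary table or second pass is needed for these functions.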

Rasmus Johansson (Inactive) made changes - 2016-01-27 11:21

Sprint

10.2.0-5 [ 32 ]

Rasmus Johansson (Inactive) made changes - 2016-01-27 11:21

Rank

Ranked higher

Vicențiu Ciorbaru made changes - 2016-02-23 14:59

Sprint

10.2.0-5 [ 32 ]

10.2.0-5, 10.2.0-6 [ 32, 37 ]

Rasmus Johansson (Inactive) made changes - 2016-03-03 08:14

Sprint

10.2.0-5, 10.2.0-6 [ 32, 37 ]

10.2.0-5, 10.2.0-6, 10.2.0-7 [ 32, 37, 39 ]

Rasmus Johansson (Inactive) made changes - 2016-03-03 08:14

Rank

Ranked lower

Vicențiu Ciorbaru made changes - 2016-03-03 08:37

Status

Open [ 1 ]

In Progress [ 3 ]

Sergei Petrunia made changes - 2016-03-08 16:34

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* rank
* dense_rank
* row_number

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank
* ntile
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank
* cume_dist
* ntile
* lag
* lead
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2016-03-08 16:35

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank
* cume_dist
* ntile
* lag
* lead
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank (test implementation needs review)
* cume_dist
* ntile
* lag
* lead
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2016-03-08 16:53

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank (test implementation needs review)
* cume_dist
* ntile
* lag
* lead
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* lag
* lead
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2016-03-08 16:57

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* lag
* lead
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

TODO: how are these functions classified?
* lag
* lead

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Rasmus Johansson (Inactive) made changes - 2016-03-09 13:04

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7 [ 32, 37, 39 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8 [ 32, 37, 39, 41 ]

Rasmus Johansson (Inactive) made changes - 2016-03-09 13:04

Rank

Ranked lower

Sergei Petrunia made changes - 2016-03-14 19:25

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

A second set of functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value
* last_value
Can be computed given two passes. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

TODO: how are these functions classified?
* lag
* lead

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value
* last_value
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

TODO: how are these functions classified?
* lag
* lead

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2016-03-14 19:29

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value
* last_value
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

TODO: how are these functions classified?
* lag
* lead

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value
* last_value
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2016-03-15 21:17

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value
* last_value
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value (this is frame-based)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Sergei Petrunia made changes - 2016-03-16 20:18

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value (this is frame-based)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (test implementation needs review)
* ntile
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value (this is frame-based)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (see ~~MDEV-9746~~)
* ntile
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Rasmus Johansson (Inactive) made changes - 2016-03-23 10:52

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8 [ 32, 37, 39, 41 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.0-9 [ 32, 37, 39, 41, 43 ]

Rasmus Johansson (Inactive) made changes - 2016-03-23 10:52

Rank

Ranked higher

Rasmus Johansson (Inactive) made changes - 2016-03-23 10:54

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.0-9 [ 32, 37, 39, 41, 43 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8 [ 32, 37, 39, 41 ]

Rasmus Johansson (Inactive) made changes - 2016-03-23 10:54

Rank

Ranked higher

Sergei Petrunia made changes - 2016-04-10 18:39

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value (this is frame-based)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* percent_rank (test implementation needs review)
* cume_dist (see ~~MDEV-9746~~)
* ntile
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value (this is frame-based)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* -percent_rank (test implementation needs review)-
* -cume_dist (see ~~MDEV-9746~~)-
* -ntile-
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

Elena Stepanova made changes - 2016-04-14 20:16

Component/s

Optimizer - Window functions [ 13502 ]

Rasmus Johansson (Inactive) made changes - 2016-08-24 10:10

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8 [ 32, 37, 39, 41 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1 [ 32, 37, 39, 41, 89 ]

Rasmus Johansson (Inactive) made changes - 2016-08-24 10:10

Rank

Ranked higher

Rasmus Johansson (Inactive) made changes - 2016-08-31 07:26

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1 [ 32, 37, 39, 41, 89 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1, 10.2.2-2 [ 32, 37, 39, 41, 89, 92 ]

Rasmus Johansson (Inactive) made changes - 2016-08-31 07:26

Rank

Ranked lower

Rasmus Johansson (Inactive) made changes - 2016-09-08 11:00

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1, 10.2.2-2 [ 32, 37, 39, 41, 89, 92 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1, 10.2.2-2, 10.2.2-3 [ 32, 37, 39, 41, 89, 92, 94 ]

Rasmus Johansson (Inactive) made changes - 2016-09-14 13:17

Sprint

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1, 10.2.2-2, 10.2.2-3 [ 32, 37, 39, 41, 89, 92, 94 ]

10.2.0-5, 10.2.0-6, 10.2.0-7, 10.2.0-8, 10.2.2-1, 10.2.2-2, 10.2.2-3, 10.2.2-4 [ 32, 37, 39, 41, 89, 92, 94, 96 ]

Vicențiu Ciorbaru made changes - 2016-09-21 15:02

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* first_value (this is frame-based)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* -percent_rank (test implementation needs review)-
* -cume_dist (see ~~MDEV-9746~~)-
* -ntile-
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* -first_value (this is frame-based)- (DONE)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* -percent_rank (test implementation needs review)-
* -cume_dist (see ~~MDEV-9746~~)-
* -ntile-
* nth_value (this is frame-based)
* last_value (this is frame-based)
these require two passes over partition to compute. The extra information that we require is the number of rows in the partition. In order to find the number of rows, we must first detect partition boundaries and then we can compute the number of rows per partition.

Two-cursor window functions:
* lag
* lead
these require an additional cursor that is traveling n rows ahead/behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY, might prove useful.
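
The two-cursor idea for lag and lead in the description above can be sketched as follows. This is only an illustration under stated assumptions, not the server implementation: `ValueRow`, `MaybeInt`, `compute_lag` and `compute_lead` are hypothetical names, the rows are assumed to be sorted so that each partition is contiguous, and a missing value (SQL NULL) is modelled with a `has_value` flag.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical row type: a partition key and the value passed to lag/lead.
struct ValueRow {
  std::string partition;
  int value;
};

// Stand-in for a nullable integer result.
struct MaybeInt {
  bool has_value;
  int value;
};

// lag(value, n): a second cursor trails n rows behind the current row.
std::vector<MaybeInt> compute_lag(const std::vector<ValueRow>& rows,
                                  std::size_t n) {
  std::vector<MaybeInt> out(rows.size(), MaybeInt{false, 0});
  std::size_t partition_start = 0;
  for (std::size_t cur = 0; cur < rows.size(); ++cur) {
    if (cur > 0 && rows[cur].partition != rows[cur - 1].partition)
      partition_start = cur;  // the current cursor crossed a boundary
    // The trailing cursor is valid only if it is still inside the partition.
    if (cur >= partition_start + n)
      out[cur] = MaybeInt{true, rows[cur - n].value};
  }
  return out;
}

// lead(value, n): a second cursor runs n rows ahead of the current row.
std::vector<MaybeInt> compute_lead(const std::vector<ValueRow>& rows,
                                   std::size_t n) {
  std::vector<MaybeInt> out(rows.size(), MaybeInt{false, 0});
  for (std::size_t cur = 0; cur < rows.size(); ++cur) {
    const std::size_t ahead = cur + n;
    // Partitions are contiguous, so an equal key means the same partition.
    if (ahead < rows.size() && rows[ahead].partition == rows[cur].partition)
      out[cur] = MaybeInt{true, rows[ahead].value};
  }
  return out;
}
```

With an in-memory vector the second cursor is just an index offset; over a sorted file it would be a separate read position that advances in step with the current row.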

Vicențiu Ciorbaru made changes - 2016-09-21 15:03

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* -first_value (this is frame-based)- (DONE)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.

Two-pass window functions:
* -percent_rank (test implementation needs review)-
* -cume_dist (see ~~MDEV-9746~~)-
* -ntile-
* nth_value (this is frame-based)
* -last_value (this is frame-based)- (DONE)
These require two passes over the partition. The extra information we need is the number of rows in the partition: we must first detect the partition boundaries, and only then can we count the rows in each partition.

Two-cursor window functions:
* lag
* lead
These require an additional cursor that travels n rows ahead of or behind the current_row cursor.

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY might prove useful.

Vicențiu Ciorbaru made changes - 2016-09-24 17:15

Fix Version/s		10.2.2 [ 22013 ]
Fix Version/s	10.2 [ 14601 ]
Resolution		Fixed [ 1 ]
Status	In Progress [ 3 ]	Closed [ 6 ]

Vicențiu Ciorbaru made changes - 2016-09-24 17:17

Description

There is a class of window functions that can be computed on the fly, after ordering. These functions are:
* -rank- (DONE)
* -dense_rank- (DONE)
* -row_number- (DONE)
* -first_value (this is frame-based)- (DONE)

These functions can be computed directly. In order to do this we must:
# Sort the rows.
# Detect partition boundaries (on the fly as well)
# Given partition boundaries, compute the corresponding function value, for each row.
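The three steps above can be sketched as follows. This is a minimal illustration in Python, not the server implementation: the function name `rank_rows` and the `partition_key` / `order_key` callables are hypothetical stand-ins for the PARTITION BY and ORDER BY expressions, and an in-memory `sorted` call stands in for filesort.

```python
def rank_rows(rows, partition_key, order_key):
    """Yield (row, row_number, rank, dense_rank) in a single pass.

    partition_key / order_key are hypothetical callables standing in for
    the PARTITION BY and ORDER BY expressions of the window definition.
    """
    # Step 1: sort so that each partition is contiguous and ordered.
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))

    first = True
    prev_part = prev_ord = None
    row_number = rank = dense_rank = 0

    for row in ordered:
        part, ord_val = partition_key(row), order_key(row)
        if first or part != prev_part:
            # Step 2: partition boundary detected on the fly; reset counters.
            row_number = rank = dense_rank = 1
        else:
            # Step 3: update the function values for the current row.
            row_number += 1
            if ord_val != prev_ord:
                rank = row_number   # rank skips over the peers of the previous value
                dense_rank += 1     # dense_rank advances by exactly one
        first = False
        prev_part, prev_ord = part, ord_val
        yield row, row_number, rank, dense_rank
```

Only the previous row's partition and ordering values are kept, which is why no temporary table is needed for this class of functions.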

Two-pass window functions:
* -percent_rank (test implementation needs review)-
* -cume_dist (see ~~MDEV-9746~~)-
* -ntile-
* -nth_value (this is frame-based)- (DONE)
* -last_value (this is frame-based)- (DONE)
These require two passes over the partition. The extra information we need is the number of rows in the partition: we must first detect the partition boundaries, and only then can we count the rows in each partition.
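As a rough illustration of why the partition size is needed, here is a sketch of percent_rank and ntile in Python. It assumes the rows are already sorted by partition and then by the ordering expression. The names `percent_rank_and_ntile` and `ntile_bucket` are hypothetical, and materializing each partition into a list stands in for the first pass; cume_dist is omitted for brevity.

```python
from itertools import groupby


def ntile_bucket(i, n, k):
    """Bucket (1-based) of the i-th row (1-based) when n rows are split into k tiles."""
    q, r = divmod(n, k)          # the first r tiles hold q + 1 rows, the rest hold q
    if i <= r * (q + 1):
        return (i - 1) // (q + 1) + 1
    return r + (i - 1 - r * (q + 1)) // q + 1


def percent_rank_and_ntile(sorted_rows, partition_key, order_key, buckets):
    """Return (row, percent_rank, ntile) for rows pre-sorted by partition, then order."""
    out = []
    for _, group in groupby(sorted_rows, key=partition_key):
        part = list(group)       # pass 1: find the partition boundary and its row count
        n = len(part)
        rank = 0
        for i, row in enumerate(part, start=1):   # pass 2: compute per-row values
            if i == 1 or order_key(row) != order_key(part[i - 2]):
                rank = i
            pr = (rank - 1) / (n - 1) if n > 1 else 0.0
            out.append((row, pr, ntile_bucket(i, n, buckets)))
    return out
```

Both values depend on `n`, which is not known until the end of the partition has been seen, hence the second pass.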

Two-cursor window functions:
* -lag- (DONE)
* -lead- (DONE)
These require an additional cursor that travels n rows ahead of or behind the current_row cursor.
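The second-cursor idea can be sketched like this in Python. The function name `lag_lead` and its parameters are hypothetical, the rows are assumed to be pre-sorted by partition and ordering expression, and list indexing stands in for a real cursor over the sorted output.

```python
def lag_lead(sorted_rows, partition_key, value, n=1, default=None):
    """Return (row, lag, lead) for rows pre-sorted by partition, then order.

    A second position, n rows behind or ahead of the current one, plays the
    role of the additional cursor. It yields `default` when it falls outside
    the current row's partition.
    """
    rows = list(sorted_rows)

    def peek(idx, part):
        # Partitions are contiguous after sorting, so comparing the partition
        # key of the target row is enough to detect a boundary crossing.
        if 0 <= idx < len(rows) and partition_key(rows[idx]) == part:
            return value(rows[idx])
        return default

    out = []
    for cur, row in enumerate(rows):
        part = partition_key(row)
        out.append((row, peek(cur - n, part), peek(cur + n, part)))
    return out
```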

Key pieces for implementing this:
* Make use of the filesort interface, since we do not need temporary tables for these functions.
* It is a very similar use case to the GROUP BY statement. An important task is figuring out partition boundaries. The classes used for computing GROUP BY might prove useful.

Juan Telleria added a comment - 2017-04-06 08:07 - edited

Is it possible to use window functions over columns that contain aggregate functions (for example, count(Column_Name))?

SELECT
count(Column_Name) AS MyCount,
PERCENT_RANK() OVER (MyCount)
FROM
myTable;

Sergei Golubchik made changes - 2021-12-06 21:22

Workflow	MariaDB v3 [ 64832 ]	MariaDB v4 [ 132604 ]
