[MDEV-9848] Window functions: reuse sorting and/or scanning - Jira

Details

Type: Task
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Optimizer - Window functions
Labels:
None

Description

The standard says this:

In general, two <window function>s are computed independently, each one
performing its own sort of its data, even if they use the same data and the same <sort specification list>
...
Nevertheless, the user may desire that two <window function>s be computed using the same ordering
...
Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
rows if any of the following are true:

(3 long clauses)

This is a good reason to implement an optimization:

if window functions W1 and W2 use compatible sortings,

  they must share the filesort() call

Let's call the above *filesort sharing*.

The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because

we can compute values of all window functions and call h->update_row once (instead of many times)
It is more cache-friendly to do things in one pass, not in many.

A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?

Attachments

Issue Links

is part of

MDEV-6115 window functions as in the SQL standard

Closed

Activity

Ascending order - Click to sort in descending order

Sergei Petrunia created issue - 2016-03-31 16:09

Sergei Petrunia made changes - 2016-03-31 16:09

Field	Original Value	New Value
Link		This issue is part of ~~MDEV-6115~~ [ ~~MDEV-6115~~ ]

Sergei Petrunia made changes - 2016-04-06 17:45

Description

{quote}
In general, two <window function>s are computed independently, each one
performing its own sort of its data, even if they use the same data and the same <sort specification list>
...
Nevertheless, the user may desire that two <window function>s be computed using the same ordering
...
Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
rows if any of the following are true:
- (3 long clauses)
{quote}

The standard says this:
{quote}
In general, two <window function>s are computed independently, each one
performing its own sort of its data, even if they use the same data and the same <sort specification list>
...
Nevertheless, the user may desire that two <window function>s be computed using the same ordering
...
Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
rows if any of the following are true:
- (3 long clauses)
{quote}

This is a good reason to implement an optimization:
{noformat}
if window functions W1 and W2 use compatible sortings,
they must share the filesort() call
{noformat}

The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because
* we can compute values of all window functions and call {{h->update_row}} once (instead of many times)
* It is more cache-friendly to do things in one pass, not in many.

A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?

Sergei Petrunia made changes - 2016-04-06 17:46

Description

The standard says this:
{quote}
In general, two <window function>s are computed independently, each one
performing its own sort of its data, even if they use the same data and the same <sort specification list>
...
Nevertheless, the user may desire that two <window function>s be computed using the same ordering
...
Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
rows if any of the following are true:
- (3 long clauses)
{quote}

This is a good reason to implement an optimization:
{noformat}
if window functions W1 and W2 use compatible sortings,
they must share the filesort() call
{noformat}
Let's call the above **filesort sharing**.

The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because
* we can compute values of all window functions and call {{h->update_row}} once (instead of many times)
* It is more cache-friendly to do things in one pass, not in many.

A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?

Elena Stepanova made changes - 2016-04-14 20:09

Component/s

Optimizer - Window functions [ 13502 ]

Sergei Golubchik made changes - 2021-12-06 21:20

Workflow

MariaDB v3 [ 74724 ]

MariaDB v4 [ 130458 ]

People

Assignee:: Unassigned

Reporter:: Sergei Petrunia

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2016-03-31 16:09

Updated:: 2016-04-14 20:09

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration