[MDEV-9676] RANGE-type frames for window functions Created: 2016-03-02  Updated: 2016-04-14  Resolved: 2016-03-14

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - Window functions
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-6115 window functions as in the SQL standard Closed
Relates
relates to MDEV-9526 Compute Aggregate functions as window... Closed
relates to MDEV-9727 Window functions: datetime arithmetic... Open
Sprint: 10.2.0-8

 Description   

This is about supporting RANGE-type frames for window functions.

By support, I mean being able to provide Frame_xxxx range bound classes which are able to "follow" the current row. When the current row moves, range bounds move accordingly, calling sum_item->add() or sum_item->remove() for the rows that go in or out of the frame.

Bound types

  • UNBOUNDED FOLLOWING and UNBOUNDED PRECEDING have the same meaning as with ROWS-type frames.
  • CURRENT ROW has different meaning
  • n PRECEDING|FOLLOWING have different meanings

"RANGE ... n PRECEDING|FOLLOWING" bound

Location of the frame bound depends on the ordering parameter value of the current row.
We need to be able to do addition/subtraction on the current row value.

The declared type of SK shall be numeric, datetime, or interval.
The declared type of UVS shall be numeric if the declared type of SK is numeric;
otherwise, it shall be an interval type that may be added to or subtracted from the declared
type of SK according to the Syntax Rules of Subclause 6.30, “<datetime value expression>”,
and Subclause 6.32, “<interval value expression>”...

Q: is numeric just INT_RESULT in MySQL terms, or REAL_RESULT and DECIMAL_RESULT are allowed also?
A: yes, DECIMAL and DOUBLE are allowed. Anything that allows addition.

Q: what should we use for addition? (Items or add manually?)
A: We must use a different addition operation, depending on the datatypes of constants involved. So, it's better to use Items.

Q: what is "interval type" in MySQL codebase? Am I correct that date addition/subtraction is only used inside DATE_ADD() ?
A: Yes. DATE_ADD and DATE_SUB have date[time] INTERVALs as parameters. Intervals are not supported outside these functions.

RANGE ... CURRENT ROW bound

The standard draft says for bound #1:

If WFB1 specifies CURRENT ROW, then remove from WF all rows that are not peers of the current row and that precede the current row in the window ordering defined by WD.

That is, the cursor should stop as soon as it reaches the first peer of the current row (the first peer itself is not included).

For bound #2:

If WFB2 specifies CURRENT ROW, then remove from WF all rows following the current row in the ordering defined by WD that are not peers of the current row.

That is, the cursor should be ahead of the current row, and stop as soon as it reaches the first non-peer of the current row.

"Current row" problem

(this is orthoginal to CURRENT ROW frame bound) Consider

 SUM(col2) OVER (ORDER BY col1 RANGE BETWEEN 10 PRECEDING AND 5 FOLLOWING)

Suppose, we're moving to compute the sum function value for the next row.

1. We get the rowid for the next current_row
2. We need to read current_row
2.1 We compute top_bound_value = (current_row.col1-10) and this tells us where the first range bound should be.
2.2 We move the top bound to be at the last row that still has
row.col1 < top_bound_value, calling SUM->remove() for rows that go out of the frame.
2.3 We do the same for bottom_bound.
2.4 Now, we want to update the value for the current_row

The problem with operation 2.4 is that actions made in steps 2.2 and 2.3 have touched temp.table's handler object.
If we want to update the current row, we need to read it again.
We can do that by calling temp_table->file->rnd_pos() but that's one extra row read per each row.



 Comments   
Comment by Sergei Petrunia [ 2016-03-14 ]

A note about handling NULLs and "PRECEDING|FOLLOWING n".

The standard says about Bound#1 and PRECEDING:

If WFB1 specifies <window frame preceding> ...
If VSK (=current_row.sort_key) is not the null value, then...
If NULLS FIRST is specified or implied, then remove from WF all
rows R2 such that the value of SK in row R2 is the null value.

That is, for "RANGE BETWEEN n PRECEDING AND ...", and non-null
current_row.sort_key, rows with NULL sort keys are not included.

This is achieved automatically when we compute

current_row.sort_key - n_val

and compare that with current_cursor_row.sort_key. NULL is less than any
non-NULL value, so rows with NULL sort_key are never included.

what if current_row.sort_key IS NULL? The standard has a passage about

If VSK is the null value and if NULLS LAST ...

but we are not using NULLS LAST. I interpret this as "NULL values stay in the
frame when current_row.sort_key IS NULL".

This happens automatically. We compute (NULL - n_val)= NULL, then we compare
that to other rows, NULLs compare as equals during sorting, so all rows with
NULLs are included.

Comment by Sergei Petrunia [ 2016-03-14 ]

For Bound#1 and "FOLLOWING n" frame bound, it says:

If VSK (=current_row.sort_key) is not the null value, then:
a) If NULLS FIRST is specified or implied, then remove from WF all
rows R2 such that the value of SK in row R2 is the null value.

This again, is achieved automatically when we compute

current_row.sort_key + n_val

the result is a non-NULL value which is greater than NULL value, so NULLs are
excluded.

Comment by Sergei Petrunia [ 2016-03-14 ]

For window bound #2, it says:

If WFB2 specifies <window frame preceding>...
If VSK is the null value and if NULLS FIRST is specified or implied, then
remove from WF all rows R2 such that the value of SK in row R2 is not
the null value.

We compute

NULL - n_val

we get NULL as the range bound. This excludes all non-null rows from the frame.

The same is said for bound# and FOLLOWING n:

If VSK is the null value and if NULLS FIRST is specified or implied, then
remove from WF all rows R2 such that the value of SK in row R2 is not
the null value.

Comment by Sergei Petrunia [ 2016-03-14 ]

All expected functionality is in the feature tree.

Generated at Thu Feb 08 07:36:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.