One can use regular aggregate functions as window function. Something like
When the window frame moves, we have
- A position where rows enter the fame
- A position where rows leave the frame
- A position where the current row is.
That is, we need to maintain three positions.
Note that we're scanning the file by scanning filesort's result (a sequence of rowids). That is, these three positions are positions in filesort's result.
The difference from computing an aggregate function in GROUP BY context is the use of window frame. Rows enter the window frame (this is not new) and leave the window frame (this is new).
Aggregate functions have code to update the function value after a row has been added. There is no code to update the function value after a row has been removed. Some functions can update their value after a row has been removed. These are
SUM COUNT (need to check for NULL and then remove) AVG (this is SUM and COUNT) BIT_OR (just make every bit a counter) BIT_AND (can be reduced to BIT_OR) BIT_XOR (removing the value is the same as adding it another time)
I am not sure if it's possible to use removal
VAR_POP()(Implemented without removal so far) VARIANCE()(Implemented without removal so far) VAR_SAMP()(Implemented without removal so far)
These use a recursive formula to update the current value. It is possible reverse the computation and do the removal, but I am not sure whether this computation is numerically stable.
These functions are just SQRT() of their VARIANCE() counterpart:
STD()(Implemented without removal so far) STDDEV()(Implemented without removal so far) STDDEV_POP()(Implemented without removal so far) STDDEV_SAMP()(Implemented without removal so far)
The result is accumulated in String Item_func_group_concat::result. Generally, this allows for removal.
There is a problem with @@group_concat_max_len. When the length of the string exceeds this value,
then a warning is produced, and the new value IS NOT ADDED to the string. Thus, the information about the new value is lost, and we will not be able to produce a correct result after remove() frees space.
Idea from Igor: if we run out of space at the end of the buffer, start re-using free space at the front. So that the buffer is kind-of circular.
MIN()(Implemented without removal) MAX()(Implemented without removal)
These functions do not support removal. (If we remove the value that is equal to min, what should be the next value?). Possible options
1. Use a custom solution that works for just MIN and MAX (something like a priority queue that spills to a Unique object?)
2. Use a generic O(N^2) algorithm.