Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6115

window functions as in the SQL standard

Details

    Description

      Spec draft

      Place in the optimizer

      Window functions are evaluated on the result set after the WHERE, GROUP BY and HAVING have been applied.

      • If we are operating after a temptable-based grouping operation, we can read its output (and apply HAVING on the fly)
      • if we are operating after a Join operation, or non-temptable based grouping, we will need to store the output of previous operation in a temptable and then work from there.

      Basic idea about operation

      Let's start with one window function:

      func_name(arg) OVER (PARTITION BY part_expr  ORDER BY order_expr  $window_spec)

      Window function is evaluated for each row of the result set. It is a function of row's partition, we need to look at the rows of the resultset that are ordered according to order_expr. $window_spec specifies at which rows we need to look. Sometimes it's certain preceding rows, sometimes it's certain following rows (and sometimes it's both?)

      Execution can be done as follows:

      • sort resultset rows by (part_expr, order_expr).
      • scan each partition (either in reverse or forward order) and compute the value of window function on the fly.

      Unresolved questions:

      • How to handle the context. If a window function depends on preceding and following rows, how do we keep track of those?
      • Whether we should use files or temp.tables for storing the sorted resultset (we will always have to store it, because filesort doesn't support pull-based reading of its resultset)

      Multiple window functions

      A query may use multiple window functions which use different PARTITION BY / ORDER BY expressions:

      SELECT 
         window_func1 OVER (PARTITION BY part_expr1 ORDER BY order_expr1)
         window_func2 OVER (PARTITION BY part_expr2 ORDER BY order_expr2)
      FROM ... WHERE ... etc

      The query should be evaluated as follows:

      • sort the result according to part_expr1/order_expr1, compute values of window_func1
      • sort the result according to part_expr2/order_expr2, compute values of window_func2

      Links

      Attachments

        Issue Links

          Activity

            Sergey,

            create_sort_index() creates a special index. After it has been created you can read records one by one in the order that you passed as a parameter to create_sort_index(). The records are read with a special read function.
            When reading records you can catch the end of the last group. To do this you use the values of the group fields in a special buffer.
            You can see all this for any test case that uses preliminary sorting for the group by operation.

            igor Igor Babaev (Inactive) added a comment - Sergey, create_sort_index() creates a special index. After it has been created you can read records one by one in the order that you passed as a parameter to create_sort_index(). The records are read with a special read function. When reading records you can catch the end of the last group. To do this you use the values of the group fields in a special buffer. You can see all this for any test case that uses preliminary sorting for the group by operation.

            Filed

            • MDEV-9526: Compute Aggregate functions as window functions
            • MDEV-9525: Window functions: Support for aggregate_func(DISTINCT ...) OVER (...)
            psergei Sergei Petrunia added a comment - Filed MDEV-9526 : Compute Aggregate functions as window functions MDEV-9525 : Window functions: Support for aggregate_func(DISTINCT ...) OVER (...)

            My recollection of the discussion on the last call:

            1. High-level and low-level cursors

            Igor> We need multiple level of cursors for reading the file in order.

            • Low-level cursor will simply read rowids
            • Higher-level cursor will actually fetch the table row.
              This will be useful to doing other changes, read on.

            2. De-couple cursor movement from action

            Igor> currently, classes derived from Frame_cursor directly make calls to Item_sum->add|remove. I want cursor movement and action separated.
            SergeiP> Don't understand the point of separation. There is only one action, no plans for other actions.

            3. Use of row numbers for partition bound detection.

            Igor> it is possible to use row number to tell when partition bound ends.
            We have

            • frame start
            • current row
            • frame end

            onle of these cursors is always ahead of the other two. The idea is:

            • let all cursors remember how many rows they have scanned.
            • Let the cursor that is always ahead remember row_number of the partition bound when it sees it.
            • this allows two other cursors to avoid doing PARTITION BY field comparisons.
            psergei Sergei Petrunia added a comment - My recollection of the discussion on the last call: 1. High-level and low-level cursors Igor> We need multiple level of cursors for reading the file in order. Low-level cursor will simply read rowids Higher-level cursor will actually fetch the table row. This will be useful to doing other changes, read on. 2. De-couple cursor movement from action Igor> currently, classes derived from Frame_cursor directly make calls to Item_sum->add|remove. I want cursor movement and action separated. SergeiP> Don't understand the point of separation. There is only one action, no plans for other actions. 3. Use of row numbers for partition bound detection. Igor> it is possible to use row number to tell when partition bound ends. We have frame start current row frame end onle of these cursors is always ahead of the other two. The idea is: let all cursors remember how many rows they have scanned. Let the cursor that is always ahead remember row_number of the partition bound when it sees it. this allows two other cursors to avoid doing PARTITION BY field comparisons.

            Take-aways from yesterday discussion

            Cursors

            It seems like the latest cursors are a step in the right direction. Need to continue

            Window frame exclusion

            SergeyP> SQL:2008 specifies that one can use "Window frame exclusion" for window functions. Our sql_yacc.yy supports it (see opt_window_frame_exclusion).
            Q: How to actually support this?
            Igor> No idea.
            SergeyP> will need to extend "streaming aggregates" to be "arbitrary insert/removal aggregates".

            (Btw, PostgreSQL doesn't support frame exclusion).

            Further tasks

            SergeiP

            • code cleanups as requested by Igor
            • Improve frame bound cursors
            • Support RANGE-type frame bound cursors

            Vicentiu

            • Finish CUME_DIST (as a model two-pass window function)
            • Implement AVG, BIT_OR, BIT_AND, BIT_XOR - aggregates that support streaming.

            Igor

            • "Sort" window frames so that we know which window functions can
              share the sorting (the standard actually requires this, for the purpose of
              handling non-determistic cases)
            psergei Sergei Petrunia added a comment - Take-aways from yesterday discussion Cursors It seems like the latest cursors are a step in the right direction. Need to continue Window frame exclusion SergeyP> SQL:2008 specifies that one can use "Window frame exclusion" for window functions. Our sql_yacc.yy supports it (see opt_window_frame_exclusion). Q: How to actually support this? Igor> No idea. SergeyP> will need to extend "streaming aggregates" to be "arbitrary insert/removal aggregates". (Btw, PostgreSQL doesn't support frame exclusion). Further tasks SergeiP code cleanups as requested by Igor Improve frame bound cursors Support RANGE-type frame bound cursors Vicentiu Finish CUME_DIST (as a model two-pass window function) Implement AVG, BIT_OR, BIT_AND, BIT_XOR - aggregates that support streaming. Igor "Sort" window frames so that we know which window functions can share the sorting (the standard actually requires this, for the purpose of handling non-determistic cases)

            Filed MDEV-9676: RANGE-type frames for window functions

            psergei Sergei Petrunia added a comment - Filed MDEV-9676 : RANGE-type frames for window functions

            People

              igor Igor Babaev (Inactive)
              serg Sergei Golubchik
              Votes:
              17 Vote for this issue
              Watchers:
              24 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.