Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9848

Window functions: reuse sorting and/or scanning

Details

    Description

      The standard says this:

      In general, two <window function>s are computed independently, each one
      performing its own sort of its data, even if they use the same data and the same <sort specification list>
      ...
      Nevertheless, the user may desire that two <window function>s be computed using the same ordering
      ...
      Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
      rows if any of the following are true:

      • (3 long clauses)

      This is a good reason to implement an optimization:

      if window functions W1 and W2 use compatible sortings, 
        they must share the filesort() call
      

      Let's call the above *filesort sharing*.

      The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because

      • we can compute values of all window functions and call h->update_row once (instead of many times)
      • It is more cache-friendly to do things in one pass, not in many.

      A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            psergei Sergei Petrunia made changes -
            Description {quote}
            In general, two <window function>s are computed independently, each one
            performing its own sort of its data, even if they use the same data and the same <sort specification list>
            ...
            Nevertheless, the user may desire that two <window function>s be computed using the same ordering
            ...
            Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
            rows if any of the following are true:
            - (3 long clauses)
            {quote}

            The standard says this:
            {quote}
            In general, two <window function>s are computed independently, each one
            performing its own sort of its data, even if they use the same data and the same <sort specification list>
            ...
            Nevertheless, the user may desire that two <window function>s be computed using the same ordering
            ...
            Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
            rows if any of the following are true:
            - (3 long clauses)
            {quote}

            This is a good reason to implement an optimization:
            {noformat}
            if window functions W1 and W2 use compatible sortings,
              they must share the filesort() call
            {noformat}

            The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because
            * we can compute values of all window functions and call {{h->update_row}} once (instead of many times)
            * It is more cache-friendly to do things in one pass, not in many.

            A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?
            psergei Sergei Petrunia made changes -
            Description The standard says this:
            {quote}
            In general, two <window function>s are computed independently, each one
            performing its own sort of its data, even if they use the same data and the same <sort specification list>
            ...
            Nevertheless, the user may desire that two <window function>s be computed using the same ordering
            ...
            Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
            rows if any of the following are true:
            - (3 long clauses)
            {quote}

            This is a good reason to implement an optimization:
            {noformat}
            if window functions W1 and W2 use compatible sortings,
              they must share the filesort() call
            {noformat}

            The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because
            * we can compute values of all window functions and call {{h->update_row}} once (instead of many times)
            * It is more cache-friendly to do things in one pass, not in many.

            A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?
            The standard says this:
            {quote}
            In general, two <window function>s are computed independently, each one
            performing its own sort of its data, even if they use the same data and the same <sort specification list>
            ...
            Nevertheless, the user may desire that two <window function>s be computed using the same ordering
            ...
            Two <window function>s are computed using the same (possibly non-deterministic) window ordering of the
            rows if any of the following are true:
            - (3 long clauses)
            {quote}

            This is a good reason to implement an optimization:
            {noformat}
            if window functions W1 and W2 use compatible sortings,
              they must share the filesort() call
            {noformat}
            Let's call the above **filesort sharing**.

            The second step would be to share the "scan through the filesort result and compute the window function" step. Sharing the scan between window functions ought to give a speedup, because
            * we can compute values of all window functions and call {{h->update_row}} once (instead of many times)
            * It is more cache-friendly to do things in one pass, not in many.

            A question: should the scan be shared across all window functions or only those that have identical PARTITION BY clause?
            elenst Elena Stepanova made changes -
            Component/s Optimizer - Window functions [ 13502 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 74724 ] MariaDB v4 [ 130458 ]

            People

              Unassigned Unassigned
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.