Details

    • Type: Task
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved

    Description

      The goal of this task is to use non-covering indexes for efficient ORDER BY ... LIMIT.
      Consider the query:

      select * from t1 order by c_1, ..., c_k, c_k+1, ..., c_m limit L;

      In certain cases there may be an index that has a prefix that coincides with a prefix
      of the ORDER BY clause:

      create index i1 on t1 (c_1, ..., c_k, [c_j, ...]);

      where the c_j are columns of t1 that follow c_k in the index but do not
      coincide with the rest of the ORDER BY clause (for example, for
      ORDER BY c1, c2, c3 an index on (c1, c2, c5) shares the prefix (c1, c2)).

      If the index is scanned, it will produce groups of records with the same
      prefix (c_1, ..., c_k). Given a limit L, the top L records sorted by the full
      ORDER BY clause are contained in the groups that fit entirely within the
      first L records, plus the whole last group, the one in which the limit L is
      reached. Let the total number of records of interest be L+G, where G is the
      number of records remaining in that last group beyond the limit L.
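
      As an illustration of how L+G is determined, here is a minimal Python sketch
      (the function name and the example data are made up for illustration and are
      not part of the server code; limit is assumed to be >= 1):

      def records_to_scan(prefixes, limit):
          """prefixes: the (c_1, ..., c_k) prefix of each index entry, in index order."""
          if limit >= len(prefixes):
              return len(prefixes)
          last = prefixes[limit - 1]      # prefix of the L-th entry
          total = limit
          # G = entries remaining in the group that the L-th entry belongs to
          while total < len(prefixes) and prefixes[total] == last:
              total += 1
          return total

      # L = 4; the 4th entry's group ('b') has 2 more entries, so L+G = 6
      print(records_to_scan(['a', 'a', 'b', 'b', 'b', 'b', 'c'], 4))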

      Algorithm (v2):
      ----------------
      Repeat until L+G records are scanned:

      • Scan index i1 and put its keys into a buffer until the buffer is full or
        the first L+G keys have been scanned.
      • Sort the buffered keys by rowid (this is the physical order of the
        corresponding records on disk).
      • Fetch the corresponding table records in physical order into a sort buffer SB.
      • Sort SB according to the ORDER BY fields (c_1, ..., c_m).
      • If there are more keys to scan, flush SB to disk for a subsequent sort-merge.
      • Once all L+G keys have been processed: if there is more than one sorted
        buffer on disk, merge them, then output the sorted rows (a standalone
        sketch of this loop is given below).
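
      A self-contained Python sketch of this loop follows. It is an illustration
      only: the table, the index and the buffer size are modelled in memory with
      made-up names, while the real implementation would live in the DS-MRR and
      filesort code described in the next section.

      import heapq
      from itertools import islice

      def order_by_limit_v2(rows, order_by, prefix_len, limit, buffer_size=4096):
          """rows: the table t1 as a list of dicts; order_by: (c_1, ..., c_m);
          prefix_len: k, the length of the prefix shared with the index."""
          order_key = lambda r: tuple(r[c] for c in order_by)
          prefix_key = lambda r: tuple(r[c] for c in order_by[:prefix_len])
          if not rows or limit <= 0:
              return []
          limit = min(limit, len(rows))

          # Stand-in for index i1: (prefix, rowid) pairs in index order,
          # where rowid is simply the position of the row in the table.
          index_entries = sorted((prefix_key(r), rid) for rid, r in enumerate(rows))

          # L+G: read L entries, then finish the current prefix group.
          needed = limit
          while (needed < len(index_entries)
                 and index_entries[needed][0] == index_entries[limit - 1][0]):
              needed += 1

          chunks = []                                    # sorted runs "flushed to disk"
          for start in range(0, needed, buffer_size):
              batch = index_entries[start:start + buffer_size]
              rowids = sorted(rid for _, rid in batch)   # physical (disk) order
              fetched = [rows[rid] for rid in rowids]    # fetch in physical order
              chunks.append(sorted(fetched, key=order_key))  # the sort buffer SB

          # Merge the sorted runs and return the first L rows.
          return list(islice(heapq.merge(*chunks, key=order_key), limit))

      # Example: ORDER BY a, b, c LIMIT 5 with an index prefix on (a)
      rows = [{'a': i % 3, 'b': (7 * i) % 5, 'c': i} for i in range(20)]
      print(order_by_limit_v2(rows, order_by=('a', 'b', 'c'), prefix_len=1, limit=5))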

      Implementation:
      ---------------
      = Execution
      The DS-MRR execution code already performs the steps of scanning an index,
      sorting the rowids of the keys, and fetching table data in physical order.
      In order to reuse DS-MRR for the above algorithm, the following is needed:

      • Create a range access method for the range (-inf, +inf) over the common prefix
        of the chosen index and the ORDER BY clause. If the WHERE clause allows
        a range to be constructed, reuse that range instead.
      • Modify DS-MRR so that it can stop scanning the index when a certain condition
        is met. This can be done by reusing the function pointer
        st_range_seq_if::skip_index_tuple, which is called by
        Mrr_simple_index_reader::get_next().
      • The stopping condition: after DS-MRR has read L keys, remember the L-th key,
        and stop reading once the next key differs from the L-th key. Note that L
        counts the records produced after applying the WHERE clause; such a counter
        already exists in the generic LIMIT execution code, so reuse it (a small
        sketch of this condition follows the list).
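
      The real hook would be the existing skip_index_tuple callback and the LIMIT
      counter mentioned above; purely to illustrate the stopping rule itself, here
      is a small Python sketch with made-up names:

      def keys_to_fetch(index_keys, prefix_len, limit, where=lambda key: True):
          """Yield the keys DS-MRR should fetch: the first L accepted keys plus
          the rest of the prefix group that the L-th accepted key belongs to."""
          accepted = 0            # counter of keys that passed the WHERE clause
          last_prefix = None      # common prefix of the L-th accepted key
          for key in index_keys:
              prefix = key[:prefix_len]
              if accepted >= limit and prefix != last_prefix:
                  return                        # next group started: stop the scan
              if not where(key):                # WHERE-clause filtering
                  continue
              accepted += 1
              if accepted == limit:
                  last_prefix = prefix          # remember the L-th key
              yield key

      # L = 3 on a prefix of length 1: the 3rd key starts group 'b',
      # so the 4th key (also 'b') is read too, and the scan stops at 'c'.
      print(list(keys_to_fetch([('a', 1), ('a', 2), ('b', 3), ('b', 4), ('c', 5)],
                               prefix_len=1, limit=3)))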

      The filesort function can use a range scan to supply its input for sorting.
      Therefore, once we make filesort use the modified DS-MRR scan, filesort will
      sort the fetched data and take care of spilling to disk and merging the
      sorted chunks of data that were flushed there.

      = Applicability test and optimization

      • there is an index that has a common prefix with the ORDER BY clause,
      • there is no suitable index that covers the whole ORDER BY clause,
      • the LIMIT is sufficiently small (a sketch of this test follows the list).
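
      A sketch of this test in Python (the names and the LIMIT threshold are made
      up for illustration; the real check would be part of the optimizer):

      MAX_LIMIT_FOR_THIS_PLAN = 1000      # made-up threshold for "sufficiently small"

      def common_prefix_len(index_cols, order_by_cols):
          n = 0
          for ic, oc in zip(index_cols, order_by_cols):
              if ic != oc:
                  break
              n += 1
          return n

      def plan_is_applicable(indexes, order_by_cols, limit):
          """indexes: list of column-name tuples, one tuple per candidate index."""
          if limit is None or limit > MAX_LIMIT_FOR_THIS_PLAN:
              return False
          # No index may already cover the whole ORDER BY clause ...
          if any(common_prefix_len(ix, order_by_cols) == len(order_by_cols)
                 for ix in indexes):
              return False
          # ... but at least one index must share a non-empty prefix with it.
          return any(common_prefix_len(ix, order_by_cols) > 0 for ix in indexes)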

      Attachments

        Activity

          botanicvelious Matthew Lagoe (Inactive) added a comment:

          That algorithm looks like it would work properly. My only concern is that
          we have to make sure that, if there are multiple equal values of X, we
          grab all of them, so that the resulting dataset is accurate.

          timour Timour Katchaounov (Inactive) added a comment:

          The previous two ideas for how to use an index that is a partial prefix
          of the ORDER BY clause are:

          Algorithm (v1):
          ----------------

          • create a bounded priority queue PQ with bound N
          • for each group of keys in IDX with prefix (c_1, ..., c_k), such that
            the ordinal number of the first key in the group is <= N:
          • fetch the record corresponding to each key and put it into PQ, using
            the non-indexed ORDER BY columns (c_k+1, ..., c_m) as the key in the queue
          • once all keys in the group have been scanned and their records inserted
            into PQ, output the records from PQ in order, until PQ is empty or a
            total of N records has been output (a Python sketch is given after the
            list of properties below).

          The above algorithm has the following properties:

          • PQ will get at most as big as the number of keys in the biggest group of keys
          • the first result can be output as soon as we reach the end of the first
            group (no need to wait until all data is sorted)
          • there is just one pass over the index
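
          Purely as an illustration of the bounded priority queue idea (the names
          are made up and the index scan is simulated in memory), a Python sketch:

          import heapq
          from itertools import groupby

          def order_by_limit_v1(rows, order_by, prefix_len, limit):
              """rows: the table as a list of dicts; order_by: (c_1, ..., c_m);
              prefix_len: k, the length of the prefix covered by the index."""
              prefix = lambda r: tuple(r[c] for c in order_by[:prefix_len])
              suffix = lambda r: tuple(r[c] for c in order_by[prefix_len:])

              # Stand-in for the index scan: rows arrive grouped by the prefix.
              scan = sorted(rows, key=prefix)

              result = []
              for _, group in groupby(scan, key=prefix):
                  # Bounded PQ: never keep more than the N rows still needed.
                  best = heapq.nsmallest(limit - len(result), group, key=suffix)
                  result.extend(best)            # already sorted by the suffix
                  if len(result) >= limit:
                      break
              return result

          Each group can be emitted as soon as it has been fully scanned, which is
          the early-output property listed above.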

          Original algorithm (v0):
          ------------------------

          • If an index exists for column X and the query has a LIMIT, use the index
            to sort the list, take the first G values, and continue past G until you
            hit a value of X different from the value that was at position G.
          • Use this smaller subset of data to order the data based on Y.
          • Order the data again based on Z.
          • Return the sorted data as usual.

          People

            Assignee: Unassigned
            Reporter: botanicvelious Matthew Lagoe (Inactive)
            Votes: 6
            Watchers: 6

