Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4026

Order by speed optimization

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Minor
    • Resolution: Unresolved
    • None
    • None

    Description

      The goal of this task is to use non-covering indexes for efficient ORDER BY ... LIMIT.
      Consider the query:

      select * from t1 order by c_1, ... C_k, c_k+1, ..., c_m limit L;

      In certain cases there may be an index that has a prefix that coincides with a prefix
      of the ORDER BY clause:

      create index i1 on t1 (c_1, ... c_k, [c_j...]);

      Where c_j are columns from t1 after c_k in order not coinciding
      with the GROUP BY clause.

      If the index is scanned it will produce groups of records with the same
      prefix (c_1 ... c_k). Given a limit L, the top L records sorted by the full
      GROUP BY clause are among all groups that fit inside L, plus the records in
      the last group where the limit L was reached. Let the total number of records
      of interest is L+G, where G is the number of remaining records in the last
      group after the limit L.

      Algorithm (v2):
      ----------------
      Repeat until L+G records are scanned:

      • Scan index i1, and put its keys into a buffer until it is full, or
        we have scanned the first L+G keys.
      • Sort the rowids of each key (this is the physical order of the
        corresponding records on disk).
      • Fetch the corresponding table records in physical order into a sort buffer SB.
      • Sort FB according to the GROUP BY fields (c_1, ... c_m)
      • If there are more keys to scan, flush SB to disk for subsequent sort-merge
        If there is more than one on-disk sorted buffers, merge them.
        Output the sorted rows.

      Implementation:
      ---------------
      = Execution
      The DS-MRR execution code already performs the step of scanning an index,
      ordering the rowids of the keys and fetching table data in physical order.
      In order to reuse DS-MRR for the above algorithm it is needed to:

      • Create a range access method for the range (-inf, +inf) for the common prefix
        between the chosen index and the GROUP BY clause. If the WHERE clause allows
        a range to be constructed, then reuse this range.
      • Modify DS-MRR so that it can stop scanning the index when a certain condition
        is met. This can be done by reusing the function pointer
        st_range_seq_if::skip_index_tuple which is called by
        Mrr_simple_index_reader::get_next()
      • The condition when DS-MRR stops is, when it read L keys, remember the L-th key.
        Stop reading keys once the next key is different from the L-th key.
        Notice that L is the number of produced records after applying the WHERE clause.
        There already exists such a counter, used by generic LIMIT execution. Reuse
        this counter.

      The filesort function can utilize a range scan to supply its input for sorting.
      Therefore once we make filesort use the modified DS-MRR scan, filesort will
      resort the fetched data and will take care of overflow on disk and merging the
      sorted chunks of data that were flushed to disk.

      = Applicability test and optimization

      • there must be an index that has a common prefix with the ORDER BY clause,
      • there is no suitable index that covers the whole ORDER BY clause,
      • LIMIT is sufficiently small

      Attachments

        Activity

          People

            Unassigned Unassigned
            botanicvelious Matthew Lagoe (Inactive)
            Votes:
            6 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.