[MDEV-4026] Order by speed optimization Created: 2013-01-13  Updated: 2015-10-30

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Minor
Reporter: Matthew Lagoe (Inactive) Assignee: Unassigned
Resolution: Unresolved Votes: 6
Labels: optimizer


 Description   

The goal of this task is to use non-covering indexes for efficient ORDER BY ... LIMIT.
Consider the query:

select * from t1 order by c_1, ... C_k, c_k+1, ..., c_m limit L;

In certain cases there may be an index that has a prefix that coincides with a prefix
of the ORDER BY clause:

create index i1 on t1 (c_1, ... c_k, [c_j...]);

Where c_j are columns from t1 after c_k in order not coinciding
with the GROUP BY clause.

If the index is scanned it will produce groups of records with the same
prefix (c_1 ... c_k). Given a limit L, the top L records sorted by the full
GROUP BY clause are among all groups that fit inside L, plus the records in
the last group where the limit L was reached. Let the total number of records
of interest is L+G, where G is the number of remaining records in the last
group after the limit L.

Algorithm (v2):
----------------
Repeat until L+G records are scanned:

  • Scan index i1, and put its keys into a buffer until it is full, or
    we have scanned the first L+G keys.
  • Sort the rowids of each key (this is the physical order of the
    corresponding records on disk).
  • Fetch the corresponding table records in physical order into a sort buffer SB.
  • Sort FB according to the GROUP BY fields (c_1, ... c_m)
  • If there are more keys to scan, flush SB to disk for subsequent sort-merge
    If there is more than one on-disk sorted buffers, merge them.
    Output the sorted rows.

Implementation:
---------------
= Execution
The DS-MRR execution code already performs the step of scanning an index,
ordering the rowids of the keys and fetching table data in physical order.
In order to reuse DS-MRR for the above algorithm it is needed to:

  • Create a range access method for the range (-inf, +inf) for the common prefix
    between the chosen index and the GROUP BY clause. If the WHERE clause allows
    a range to be constructed, then reuse this range.
  • Modify DS-MRR so that it can stop scanning the index when a certain condition
    is met. This can be done by reusing the function pointer
    st_range_seq_if::skip_index_tuple which is called by
    Mrr_simple_index_reader::get_next()
  • The condition when DS-MRR stops is, when it read L keys, remember the L-th key.
    Stop reading keys once the next key is different from the L-th key.
    Notice that L is the number of produced records after applying the WHERE clause.
    There already exists such a counter, used by generic LIMIT execution. Reuse
    this counter.

The filesort function can utilize a range scan to supply its input for sorting.
Therefore once we make filesort use the modified DS-MRR scan, filesort will
resort the fetched data and will take care of overflow on disk and merging the
sorted chunks of data that were flushed to disk.

= Applicability test and optimization

  • there must be an index that has a common prefix with the ORDER BY clause,
  • there is no suitable index that covers the whole ORDER BY clause,
  • LIMIT is sufficiently small


 Comments   
Comment by Matthew Lagoe (Inactive) [ 2013-02-06 ]

That algo looks like it would work properly, my only concern is we have to make sure that we ensure that if there are multiple values of X that are the same that we grab all of them so that the end resulting dataset is accurate.

Comment by Timour Katchaounov (Inactive) [ 2013-02-08 ]

The previous two ideas how to use a an index that is a partial prefix of the ORDER BY are:

Algorithm (v1):
----------------

  • create a bounded priority queue PQ with bound N
  • for each group of keys in IDX, with prefix (c_1, ..., c_k) such that
    the count of the first key in the group is <= N
  • fetch the record corresponding to that key, and put it in PQ, using the
    non-indexed columns (c_k+1 ... c_m) of ORDER BY as a key in the queue
  • once all keys in the group have been scanned, and their records inserted
    into PQ, output all records from PQ in order, until PQ is empty or a total
    of N records have been output.

The above algorithm has the following properties:

  • PQ will get at most as big as the number of keys in the biggest group of keys
  • the first result will be output as early as we reach the end of the first
    group (no need to wait till all data is sorted)
  • there is just one pass over the index

Original algorithm (v0):
------------------------

  • If index exists for column X and there is a limit on the query then use the index and sort the list, then return the first G values, continue past G until you hit a different value for X then the value that was at G
  • Use this smaller subset of data for then ordering the data based on Y
  • Order the data again based on Z
  • Return the sorted data like normal
Generated at Thu Feb 08 06:53:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.