Status: Open (View Workflow)
The goal of this task is to use non-covering indexes for efficient ORDER BY ... LIMIT.
Consider the query:
select * from t1 order by c_1, ... C_k, c_k+1, ..., c_m limit L;
In certain cases there may be an index that has a prefix that coincides with a prefix
of the ORDER BY clause:
create index i1 on t1 (c_1, ... c_k, [c_j...]);
Where c_j are columns from t1 after c_k in order not coinciding
with the GROUP BY clause.
If the index is scanned it will produce groups of records with the same
prefix (c_1 ... c_k). Given a limit L, the top L records sorted by the full
GROUP BY clause are among all groups that fit inside L, plus the records in
the last group where the limit L was reached. Let the total number of records
of interest is L+G, where G is the number of remaining records in the last
group after the limit L.
Repeat until L+G records are scanned:
- Scan index i1, and put its keys into a buffer until it is full, or
we have scanned the first L+G keys.
- Sort the rowids of each key (this is the physical order of the
corresponding records on disk).
- Fetch the corresponding table records in physical order into a sort buffer SB.
- Sort FB according to the GROUP BY fields (c_1, ... c_m)
- If there are more keys to scan, flush SB to disk for subsequent sort-merge
If there is more than one on-disk sorted buffers, merge them.
Output the sorted rows.
The DS-MRR execution code already performs the step of scanning an index,
ordering the rowids of the keys and fetching table data in physical order.
In order to reuse DS-MRR for the above algorithm it is needed to:
- Create a range access method for the range (-inf, +inf) for the common prefix
between the chosen index and the GROUP BY clause. If the WHERE clause allows
a range to be constructed, then reuse this range.
- Modify DS-MRR so that it can stop scanning the index when a certain condition
is met. This can be done by reusing the function pointer
st_range_seq_if::skip_index_tuple which is called by
- The condition when DS-MRR stops is, when it read L keys, remember the L-th key.
Stop reading keys once the next key is different from the L-th key.
Notice that L is the number of produced records after applying the WHERE clause.
There already exists such a counter, used by generic LIMIT execution. Reuse
The filesort function can utilize a range scan to supply its input for sorting.
Therefore once we make filesort use the modified DS-MRR scan, filesort will
resort the fetched data and will take care of overflow on disk and merging the
sorted chunks of data that were flushed to disk.
= Applicability test and optimization
- there must be an index that has a common prefix with the ORDER BY clause,
- there is no suitable index that covers the whole ORDER BY clause,
- LIMIT is sufficiently small