As of 1.4.1 CS uses two-phase sorting. Here are the phases:
- Presort partial runs of data.
- Merge the presorted partial runs produced during the 1st phase.
Here is more detailed explanation of how sorting works as of 1.4.1
CS gets a portion of records data from previous steps of the query execution(RGData instance from the ring buffer of RGData-s) and produces a sorting run out of it using existing sorting class LimitedOrderBy. If the query contains LIMIT then we apply it at this phase. This allows to significantly reduce the data set cardinality. If the query contains LIMIT + OFFSET then CS builds a sorted run of the records that is up to LIMIT+OFFSET size. CS does this step in parallel dividing the whole data set into k runs where k is governed by a session variable - columnstore_orderby_threads. At this phase CS tries to preallocate memory in QUEUE_RESERVE_SIZE batches.
CS merges and sorts k presorted partial runs produced by a previous phase in a single thread. If the query contains DISTINCT keyword CS rebuilds a hash map to preserve uniqueness.
We want to make 2nd phase also parallel using range partitioning of the presorted runs produced by the 1st phase. After 1st phase finishes we know the distribution of the sorting key values thus can divide the thread key values run into regions - buckets. Every 2nd phase thread takes values from corresponding region buckets (contains the same values region) from every 1st phase sorted run. Then all 2nd phase threads sorts its runs in parallel. In the end we put the sorted regions in a requested order(ascending/descending) of the key values into output stream.
The sorting must also has on disk spill capability.