[MCOL-3758] Parallel sorting 2nd phase and on disk spill capability. Created: 2020-02-04 Updated: 2023-12-15 |
|
| Status: | Stalled |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | None |
| Fix Version/s: | 23.10 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Roman | Assignee: | Roman |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | gsoc20 | ||
| Issue Links: |
|
||||||||
| Description |
|
As of 1.4.1 CS uses two-phase sorting. Here are the phases:
Here is more detailed explanation of how sorting works as of 1.4.1 CS gets a portion of records data from previous steps of the query execution(RGData instance from the ring buffer of RGData-s) and produces a sorting run out of it using existing sorting class LimitedOrderBy. If the query contains LIMIT then we apply it at this phase. This allows to significantly reduce the data set cardinality. If the query contains LIMIT + OFFSET then CS builds a sorted run of the records that is up to LIMIT+OFFSET size. CS does this step in parallel dividing the whole data set into k runs where k is governed by a session variable - columnstore_orderby_threads. At this phase CS tries to preallocate memory in QUEUE_RESERVE_SIZE batches. We want to make 2nd phase also parallel using range partitioning of the presorted runs produced by the 1st phase. After 1st phase finishes we know the distribution of the sorting key values thus can divide the thread key values run into regions - buckets. Every 2nd phase thread takes values from corresponding region buckets (contains the same values region) from every 1st phase sorted run. Then all 2nd phase threads sorts its runs in parallel. In the end we put the sorted regions in a requested order(ascending/descending) of the key values into output stream. The sorting must also has on disk spill capability. |
| Comments |
| Comment by wei wang [ 2020-03-03 ] |
|
Hello ,I want to solve this issue as GSoC project 2020. Could you please give me some advice on how to start on it? Thank you. |
| Comment by Roman [ 2020-03-04 ] |
|
Greetings Wei and thanks for your interest. I would suggest you to build a developer version of ColumnStore with parallel sorting unit test binary first. As I do my routing under Ubuntu 18 the procedure works the best for U18. 1) Clone MariaDB (MDB) server repo using the tag mariadb-10.5.1 [1] 1. git@github.com:MariaDB/server.git Let me know if you have any questions or problems with this step. |
| Comment by Vicențiu Ciorbaru [ 2020-03-23 ] |
|
shimurara I also recommend you send any questions directly to the maria-developers mailing list. You have a much higher chance of any questions getting noticed there. |
| Comment by JiraAutomate [ 2023-12-15 ] |
|
Automated message: |