Details
Type: Task
Status: Closed
Priority: Major
Resolution: Won't Do
Description
Definition
INPUT t: Table to be sorted
INPUT p: Table to hold projection
INPUT column_list: the ordered list of columns to sort the table by
INPUT target_partition_rows: the target number of rows per reduction job
target_partition_rows can be computed as sort_buffer_size / unpacked record size (see the sketch below)
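A minimal way to derive it on the server; note that avg_row_length in information_schema is the packed on-disk size, so the unpacked record is typically larger and this estimate is optimistic. The @target_partition_rows session variable is a convention of this sketch, reused in the steps below.

-- Rows that fit in one sort buffer; avg_row_length understates the
-- unpacked record size, so treat the result as an upper bound.
SET @target_partition_rows = (
    SELECT @@sort_buffer_size DIV GREATEST(avg_row_length, 1)
    FROM information_schema.TABLES
    WHERE table_schema = DATABASE() AND table_name = 't'
);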
A) Find a range partitioning definition that divides the table into memory-processable chunks
A-1) Check the cardinality of the distribution at every level
Iterate over the columns in column_list, deepening the key prefix one column at a time (see the sketch below):
SELECT COUNT(*) / COUNT(DISTINCT c1) AS p_rows FROM t
...
if p_rows < target_partition_rows
– this depth is a valid candidate for the chunk definition
else
– continue with the next column:
SELECT COUNT(*) / COUNT(DISTINCT CONCAT(c1, c2)) AS p_rows FROM t
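A sketch of the first two loop iterations, assuming the @target_partition_rows variable set above; the CONCAT_WS separator is an added guard, since plain CONCAT would count ('ab','c') and ('a','bc') as one value:

-- Depth 1: average rows per distinct c1 value.
SELECT COUNT(*) / COUNT(DISTINCT c1) AS p_rows FROM t;

-- If p_rows >= @target_partition_rows, descend to depth 2;
-- the separator prevents concatenation collisions.
SELECT COUNT(*) / COUNT(DISTINCT CONCAT_WS('#', c1, c2)) AS p_rows FROM t;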
A-2) Check the distribution within each chunk at the chosen column depth
CREATE TABLE range_meta AS SELECT c1, COUNT(*) AS row_cnt FROM t GROUP BY c1
SELECT MAX(row_cnt) AS biggest_distribution FROM range_meta
If biggest_distribution > target_partition_rows
CANCEL the projection, or fall back to a disk-based storage engine for chunk sorting
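The guard above as one executable probe (row_cnt is used instead of rows, which is a reserved word in recent MySQL/MariaDB; @target_partition_rows is this sketch's variable):

-- TRUE (1) means at least one single c1 value exceeds the chunk budget,
-- so range partitioning on c1 alone cannot bound sort memory.
SELECT MAX(row_cnt) > @target_partition_rows AS must_fallback
FROM range_meta;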
A-3) Use a window function on range_meta to find the min/max boundary of each chunk,
bucketing values by cumulative row count: FLOOR(SUM(row_cnt) OVER (ORDER BY c1) / target_partition_rows)
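A sketch of that bucketing, assuming window-function support (MariaDB 10.2+/MySQL 8.0+); the resulting range_start/range_end pairs feed the BETWEEN predicates in step B. A bucket can slightly exceed the target when a single c1 value straddles a boundary:

SELECT bucket,
       MIN(c1) AS range_start,
       MAX(c1) AS range_end,
       SUM(row_cnt) AS bucket_rows
FROM (
    SELECT c1, row_cnt,
           FLOOR(SUM(row_cnt) OVER (ORDER BY c1) / @target_partition_rows) AS bucket
    FROM range_meta
) ranged
GROUP BY bucket
ORDER BY bucket;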
B) Process each partition via a job queue, e.g.:
https://github.com/adrpar/mysql_query_queue
CREATE TEMPORARY TABLE part SELECT * FROM t WHERE c1 BETWEEN r1 AND r2
INSERT INTO p SELECT * FROM part ORDER BY c1, c2, ..., cx
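One chunk job spelled out, with hypothetical bounds taken from a single A-3 output row; the queue linked above would enqueue one such statement batch per chunk:

SET @r1 = 100, @r2 = 199;  -- hypothetical bounds from one A-3 bucket

CREATE TEMPORARY TABLE part
    SELECT * FROM t WHERE c1 BETWEEN @r1 AND @r2;

-- The chunk fits in sort_buffer_size by construction, so the ORDER BY
-- sorts in memory; appending chunks in ascending range order is what
-- keeps the projection table p globally sorted.
INSERT INTO p SELECT * FROM part ORDER BY c1, c2;

DROP TEMPORARY TABLE part;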
Attachments
Issue Links
- blocks: MCOL-727 Make MCS projection aware (Closed)