Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Version: 23.10.2
Description
MCS currently tracks columnar data parts (extents) in a map called the ExtentMap. Every extent in a segment file has a fixed size of 8 000 000 column values. Because this size is fixed, the number of extents grows linearly with the amount of data ingested.
To address this, the number of values in an extent must become variable, while remaining a multiple of 8 000 000.
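To illustrate the linear growth, the sketch below counts the extents one column needs for a given number of values, assuming each extent holds exactly `extent_rows` values. The function name `extent_count` and the multiple-of-8 000 000 check are hypothetical illustrations, not MCS code.

```python
# Base extent size from the issue description: 8 000 000 values per extent.
BASE_EXTENT_ROWS = 8_000_000


def extent_count(total_rows: int, extent_rows: int = BASE_EXTENT_ROWS) -> int:
    """Return how many extents one column needs to store total_rows values."""
    # Hypothetical check mirroring the proposal: the extent size must be a
    # positive multiple of the base size.
    if extent_rows <= 0 or extent_rows % BASE_EXTENT_ROWS != 0:
        raise ValueError("extent size must be a positive multiple of 8 000 000")
    # Ceiling division: a partially filled last extent still counts.
    return -(-total_rows // extent_rows)


if __name__ == "__main__":
    rows = 1_000_000_000
    print(extent_count(rows))                        # 125 extents at the base size
    print(extent_count(rows, 4 * BASE_EXTENT_ROWS))  # 32 extents at 4x the base size
```

With a fixed size, doubling the ingested data doubles the extent count. A larger (multiple) extent size divides the count by roughly the same factor.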
The project consists of the following sub-tasks:
- Find and remove the hardcoded extent size. Although the configuration lets the extent size be set once, when a cluster is first set up, the value 8 000 000 is hardcoded in multiple places.
- Add UDFs to merge two partitions manually.
- Add automation that runs periodically and merges extents automatically using these UDFs.
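The periodic merge automation could start from a simple planning step. The sketch below assumes a hypothetical `Extent` record with a capacity and a row count. It greedily groups consecutive extents so that each merged extent stays within a configurable maximum and remains a multiple of 8 000 000. The names `Extent`, `plan_merges`, `merge`, and `max_capacity` are illustrative and not part of MCS. The actual merge would go through the UDFs listed above, whose names and signatures this issue does not specify.

```python
from dataclasses import dataclass
from typing import List

BASE_EXTENT_ROWS = 8_000_000


@dataclass
class Extent:
    """Hypothetical, simplified view of one column extent."""
    capacity: int  # maximum values; always a multiple of BASE_EXTENT_ROWS
    rows: int      # values actually stored


def plan_merges(extents: List[Extent], max_capacity: int) -> List[List[int]]:
    """Greedily group consecutive extents whose combined capacity fits in
    max_capacity. Only groups of two or more extents are returned."""
    if max_capacity <= 0 or max_capacity % BASE_EXTENT_ROWS != 0:
        raise ValueError("max_capacity must be a positive multiple of 8 000 000")
    groups: List[List[int]] = []
    current: List[int] = []
    current_cap = 0
    for i, ext in enumerate(extents):
        # Close the current group if adding this extent would overflow it.
        if current and current_cap + ext.capacity > max_capacity:
            groups.append(current)
            current, current_cap = [], 0
        current.append(i)
        current_cap += ext.capacity
    if current:
        groups.append(current)
    # A single-extent group needs no merge.
    return [g for g in groups if len(g) > 1]


def merge(extents: List[Extent], group: List[int]) -> Extent:
    """Combine a group into one extent; the capacity stays a multiple of the base."""
    return Extent(
        capacity=sum(extents[i].capacity for i in group),
        rows=sum(extents[i].rows for i in group),
    )
```

For five full base-size extents and a 16 000 000-value limit, `plan_merges` returns `[[0, 1], [2, 3]]`. If each UDF call merges only two partitions at a time, a group of n extents would take n - 1 pairwise calls.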