Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Done
- Sprint: 2025-6, 2025-7
Description
cpimport is a standalone binary for batch data ingestion. It has multiple modes of operation. In mode 1, cpimport reads data either locally or from S3, breaks it into batches, and tries to distribute the batches equally across all dbroots (units of the storage layer available in the cluster, described in /etc/columnstore/Columnstore).
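To make the mode-1 behaviour concrete, here is a minimal Python sketch, not cpimport's actual code: it assumes a round-robin assignment of batches to dbroots as a stand-in for "distribute equally", and the names (DBROOTS, distribute, the three-dbroot cluster) are invented for illustration.

{code:python}
from itertools import islice

DBROOTS = ["dbroot1", "dbroot2", "dbroot3"]   # assumed 3-dbroot cluster
BATCH_SIZE = 10_000                           # current hardcoded -q maximum

def batches(rows, size):
    """Yield consecutive batches of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def distribute(rows):
    """Assign each batch to a dbroot round-robin, as an even-spread stand-in."""
    placement = {d: [] for d in DBROOTS}
    for i, batch in enumerate(batches(rows, BATCH_SIZE)):
        placement[DBROOTS[i % len(DBROOTS)]].append(batch)
    return placement

# Example: 25,000 pre-sorted keys end up as 10,000/10,000/5,000-row batches
# on three different dbroots.
print({d: [(b[0], b[-1]) for b in bs]
       for d, bs in distribute(range(25_000)).items()})
{code}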
There is a `-q` parameter of cpimport that controls the batch size. Currently the maximum batch size is hardcoded to 10,000 rows, which is far smaller than the logical storage unit called an Extent, which holds 8,000,000 rows. Because these small batches are scattered across dbroots, a user who supplies pre-sorted data mostly loses that ordering (see the sketch below). The maximum batch size must be equal to the size of the Extent.
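The following scaled-down sketch shows why the batch size matters. It again assumes round-robin distribution and uses a 1,000-row extent standing in for the real 8,000,000-row Extent; all numbers and names are illustrative, not taken from cpimport itself.

{code:python}
EXTENT_ROWS = 1_000          # stands in for the real 8,000,000-row extent
DBROOTS = 3

def extent_ranges(total_rows, batch_size):
    """Distribute pre-sorted keys 0..total_rows-1 in batches round-robin,
    then report the (min, max) key of each extent on each dbroot."""
    placed = [[] for _ in range(DBROOTS)]
    for start in range(0, total_rows, batch_size):
        batch = range(start, min(start + batch_size, total_rows))
        placed[(start // batch_size) % DBROOTS].extend(batch)
    ranges = []
    for rows in placed:
        for i in range(0, len(rows), EXTENT_ROWS):
            ext = rows[i:i + EXTENT_ROWS]
            ranges.append((min(ext), max(ext)))
    return ranges

# Small batches: every extent spans nearly the whole key range, so the
# pre-sorted ordering (and min/max-based extent elimination) is lost.
print(extent_ranges(3_000, 10))      # [(0, 2979), (10, 2989), (20, 2999)]
# Extent-sized batches: each extent holds one contiguous sorted range.
print(extent_ranges(3_000, 1_000))   # [(0, 999), (1000, 1999), (2000, 2999)]
{code}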
Also recommend a change to the documentation explaining that, for pre-sorted data, batch sizes of fewer than 8,000,000 rows break up extent ranges.
Attachments
Issue Links
- causes: MCOL-6109 Parametrize batch size for LOAD DATA INFILE and INSERT..SELECT (Open)
- mentioned on