Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Done
- Sprint: 2025-6, 2025-7
Description
cpimport is a standalone binary for batch data ingestion. It has multiple modes of operation. In mode 1, cpimport reads data either locally or from S3, breaks it into batches, and tries to distribute the batches equally across all dbroots (units of the storage layer available in the cluster, described in /etc/columnstore/Columnstore).
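To make the mode-1 behaviour concrete, here is a minimal Python sketch, not cpimport's actual code: it assumes a round-robin assignment of batches to dbroots as a stand-in for "distribute equally", and the names (DBROOTS, distribute, the three-dbroot cluster) are invented for illustration.

{code:python}
from itertools import islice

DBROOTS = ["dbroot1", "dbroot2", "dbroot3"]   # assumed 3-dbroot cluster
BATCH_SIZE = 10_000                           # current hardcoded -q maximum

def batches(rows, size):
    """Yield consecutive batches of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def distribute(rows):
    """Assign each batch to a dbroot round-robin, as an even-spread stand-in."""
    placement = {d: [] for d in DBROOTS}
    for i, batch in enumerate(batches(rows, BATCH_SIZE)):
        placement[DBROOTS[i % len(DBROOTS)]].append(batch)
    return placement

# Example: 25,000 pre-sorted keys end up as 10,000/10,000/5,000-row batches
# on three different dbroots.
print({d: [(b[0], b[-1]) for b in bs]
       for d, bs in distribute(range(25_000)).items()})
{code}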
There is a `-q` parameter of cpimport that controls the batch size. Currently the maximum batch size is hardcoded to 10,000 rows, which is far smaller than the logical storage unit called an Extent, which holds 8,000,000 rows. Because these small batches are scattered across dbroots, a user who supplies pre-sorted data mostly loses that ordering (see the sketch below). The maximum batch size must be equal to the size of the Extent.
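The following scaled-down sketch shows why the batch size matters. It again assumes round-robin distribution and uses a 1,000-row extent standing in for the real 8,000,000-row Extent; all numbers and names are illustrative, not taken from cpimport itself.

{code:python}
EXTENT_ROWS = 1_000          # stands in for the real 8,000,000-row extent
DBROOTS = 3

def extent_ranges(total_rows, batch_size):
    """Distribute pre-sorted keys 0..total_rows-1 in batches round-robin,
    then report the (min, max) key of each extent on each dbroot."""
    placed = [[] for _ in range(DBROOTS)]
    for start in range(0, total_rows, batch_size):
        batch = range(start, min(start + batch_size, total_rows))
        placed[(start // batch_size) % DBROOTS].extend(batch)
    ranges = []
    for rows in placed:
        for i in range(0, len(rows), EXTENT_ROWS):
            ext = rows[i:i + EXTENT_ROWS]
            ranges.append((min(ext), max(ext)))
    return ranges

# Small batches: every extent spans nearly the whole key range, so the
# pre-sorted ordering (and min/max-based extent elimination) is lost.
print(extent_ranges(3_000, 10))      # [(0, 2979), (10, 2989), (20, 2999)]
# Extent-sized batches: each extent holds one contiguous sorted range.
print(extent_ranges(3_000, 1_000))   # [(0, 999), (1000, 1999), (2000, 2999)]
{code}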
Also recommend a change to the documentation explaining that, for pre-sorted data, batch sizes of fewer than 8,000,000 rows break up extent ranges.
Attachments
Issue Links
- causes: MCOL-6109 Parametrize batch size for LOAD DATA INFILE and INSERT..SELECT (Open)
- mentioned on