Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
cpimport is a binary that ingests data into MariaDB ColumnStore (MCS) efficiently, significantly reducing ingest times while preserving transaction isolation levels.
cpimport is a relatively complex facility that reads data from a local file or S3, parses it, converts it, and writes it into MCS-specific files.
cpimport cannot read a single large CSV file from disk in parallel.
Apache Arrow has a CSV reading facility that can read a CSV file in parallel.
The goal of the project is to replace the existing homegrown CSV parser in cpimport with the one from Apache Arrow.
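To illustrate what reading a single CSV file in parallel involves, here is a minimal sketch in standard-library Python. It is not cpimport code and not Arrow's implementation: the function names (block_ranges, parse_block, read_csv_parallel) and the block_size and workers parameters are hypothetical, chosen only for this example. It assumes UTF-8 input with no newlines embedded inside quoted fields, which a production parser would have to handle.

```python
import csv
import io
import os
from concurrent.futures import ThreadPoolExecutor


def block_ranges(path, block_size):
    """Split a file into byte ranges that each end on a row (newline) boundary."""
    size = os.path.getsize(path)
    ranges = []
    with open(path, "rb") as f:
        start = 0
        while start < size:
            end = min(start + block_size, size)
            if end < size:
                # Extend the block up to and including the next newline
                # so that no row is cut in half.
                f.seek(end)
                f.readline()
                end = f.tell()
            ranges.append((start, end))
            start = end
    return ranges


def parse_block(path, byte_range):
    """Parse one byte range of the file into a list of rows."""
    start, end = byte_range
    with open(path, "rb") as f:
        f.seek(start)
        text = f.read(end - start).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


def read_csv_parallel(path, block_size=1 << 20, workers=4):
    """Parse the blocks of one CSV file concurrently, keeping row order."""
    ranges = block_ranges(path, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so rows stay in file order.
        blocks = pool.map(lambda r: parse_block(path, r), ranges)
        return [row for block in blocks for row in block]
```

The sketch only shows the structure of the approach: cut the file into blocks aligned to row boundaries, parse the blocks independently, and reassemble them in order. Python threads will not deliver the CPU-bound speedup that a native multithreaded reader can.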