MariaDB ColumnStore / MCOL-5688

Parallel CSV read leveraging Apache Arrow


Details

    Description

      cpimport is a binary that ingests data into MariaDB ColumnStore (MCS) efficiently, significantly reducing ingest times while preserving transaction isolation.

      cpimport is a relatively complex facility: it reads data from a local file or S3, parses it, converts it, and writes it into MCS-specific files.
      cpimport cannot read a single large CSV file from disk in parallel.
      Apache Arrow has a CSV reader that can read a CSV file in parallel.
      The goal of this project is to replace the existing homebrew CSV parser in cpimport with the one from Apache Arrow.
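      To illustrate what "parallel CSV read" means, here is a minimal C++ sketch of the block-parallel pattern: split the buffer into blocks at row boundaries, parse each block on its own thread, then concatenate the results in order. It does not use Arrow and is not cpimport code; splitIntoBlocks, parseBlock and parseParallel are hypothetical names. It assumes no quoting (so no delimiters or newlines inside fields). Arrow's C++ reader (arrow::csv::TableReader, with ReadOptions::use_threads and ReadOptions::block_size) follows a broadly similar split-then-parse approach, but its chunker also handles quoted fields and it converts columns to typed arrays.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <thread>
#include <utility>
#include <vector>

// One parsed row: the raw text of each field.
using Row = std::vector<std::string>;

// Hypothetical helper: split `data` into roughly `numBlocks` pieces whose
// boundaries fall just after a '\n', so no row straddles two blocks.
// Assumption: no newline ever appears inside a (quoted) field.
std::vector<std::string_view> splitIntoBlocks(std::string_view data, std::size_t numBlocks) {
  std::vector<std::string_view> blocks;
  if (data.empty() || numBlocks == 0) return blocks;
  std::size_t target = data.size() / numBlocks;
  if (target == 0) target = 1;
  std::size_t start = 0;
  while (start < data.size()) {
    std::size_t cut = start + target;
    if (cut >= data.size()) {
      cut = data.size();
    } else {
      // Extend the block to the end of the row that contains `cut`.
      std::size_t nl = data.find('\n', cut);
      cut = (nl == std::string_view::npos) ? data.size() : nl + 1;
    }
    blocks.push_back(data.substr(start, cut - start));
    start = cut;
  }
  return blocks;
}

// Hypothetical helper: parse one block into rows (no quote handling).
// Empty lines are skipped; a trailing '\r' (CRLF files) is stripped.
std::vector<Row> parseBlock(std::string_view block, char delim) {
  std::vector<Row> rows;
  std::size_t pos = 0;
  while (pos < block.size()) {
    std::size_t eol = block.find('\n', pos);
    if (eol == std::string_view::npos) eol = block.size();
    std::string_view line = block.substr(pos, eol - pos);
    if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
    if (!line.empty()) {
      Row row;
      std::size_t fieldStart = 0;
      while (true) {
        std::size_t d = line.find(delim, fieldStart);
        if (d == std::string_view::npos) {
          row.emplace_back(line.substr(fieldStart));
          break;
        }
        row.emplace_back(line.substr(fieldStart, d - fieldStart));
        fieldStart = d + 1;
      }
      rows.push_back(std::move(row));
    }
    pos = eol + 1;
  }
  return rows;
}

// Parse each block on its own thread, then concatenate in file order.
std::vector<Row> parseParallel(std::string_view data, char delim, std::size_t numThreads) {
  std::vector<std::string_view> blocks = splitIntoBlocks(data, numThreads);
  std::vector<std::vector<Row>> partial(blocks.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    // Each thread writes only to its own slot, so no locking is needed.
    workers.emplace_back([&, i] { partial[i] = parseBlock(blocks[i], delim); });
  }
  for (auto& t : workers) t.join();
  std::vector<Row> out;
  for (auto& p : partial)
    for (auto& r : p) out.push_back(std::move(r));
  return out;
}
```

      For example, parseParallel("a,b\nc,d\ne,f\n", ',', 2) splits the input into two blocks and returns three rows in their original order. The row-boundary split is what makes per-block parsing independent; handling quoted newlines correctly is the hard part a production chunker has to solve.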


          People

            Leonid Fedorov (leonid.fedorov)
            Roman (drrtuy)
            Votes: 0
            Watchers: 2

