[MCOL-3928] Add Option To Read From a Lakehouse (Including Parquet/Arrow Support) Created: 2020-04-07  Updated: 2023-10-25

Status: Open
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.4
Fix Version/s: 23.10

Type: New Feature Priority: Major
Reporter: Todd Stoffel (Inactive) Assignee: Denis Khalikov
Resolution: Unresolved Votes: 1
Labels: SkySQL:Feature

Issue Links:
Relates
relates to MCOL-5505 Add support for Parquet file format i... In Testing
relates to MCOL-2209 mcsimport - parquet file format support Closed
relates to MCOL-5013 Support Load data from AWS S3 : UDF... Closed
Epic Link: Consolidate & Redevelop All Columnstore Tools (SDK, Adapters, Backup, Restore, mcsimport)

 Description   

Add the ability to connect to and read from a lakehouse (e.g., Hadoop, Databricks, Snowflake) for import into ColumnStore.

cpimport should be able to take a Parquet (and later Arrow) file and:
1. read the schema,
2. create a matching table, and
3. import the data.



 Comments   
Comment by Saravana Krishnamurthy (Inactive) [ 2020-07-31 ]

Break this ticket into two:
1. Importing data from a data lake: if the data is stored in Parquet or Avro format, import it into the ColumnStore data format using cpimport or mcsimport.
2. Querying data stored in a data lake: if the data is exported/stored by external systems such as Kafka, we should be able to query it directly in Parquet or Avro format from ColumnStore, similar to the way Snowflake does today.

Generated at Thu Feb 08 02:46:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.