Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 6.2.3
- None
- None
- 2021-17, 2022-22
Description
ColumnStore should support a plain SQL syntax for loading data directly from S3 buckets.
Current implementation for 2208 is:

Usage (AWS):

set columnstore_s3_key='<s3_key>';
set columnstore_s3_secret='<s3_secret>';
set columnstore_s3_region='<region>';

CALL columnstore_info.load_from_s3("<bucket>", "<file_name>", "<db_name>", "<table_name>", "<terminated_by>", "<enclosed_by>", "<escaped_by>");

EXAMPLE

CALL columnstore_info.load_from_s3("s3://columnstore-test", "data1.csv", "d1", "t1", ",", "", "");
Or, for Google Cloud Storage:

set columnstore_s3_key='GOOGXXXxxxx';
set columnstore_s3_secret='XXXXXXXXXX';

CALL columnstore_info.load_from_s3("gs://columnstore-test", "test.tbl", "test", "gg", "|", "", "");
The last three parameters are the same as cpimport -s, -E, and -C:

terminated_by: the delimiter between column values. Mandatory.

enclosed_by: the enclosed-by character, used if field values are enclosed. Optional; can be "" (empty string).

escaped_by: the escape character used in conjunction with enclosed_by, or as part of the NULL escape sequence ('\N'). Optional; can be "" (empty string); the default is '\'.

enclosed_by and escaped_by can be set to "" (blank) to use the defaults.
EXAMPLE

CALL columnstore_info.load_from_s3("s3://avorovich2", "data1.csv", "d1", "t1", ",", "", "");
--------
Future options might include (not implemented in 2208):

mariadb> LOAD DATA S3 FROM 's3://blah.blah/mydata.dat' INTO TABLE xyz FIELDS TERMINATED BY '|';
This command should invoke the backend ColumnStore API and an S3 client to stream data directly from a remote source into cpimport. This gives us a rapid data-ingestion technique that is compatible with SkySQL and provides UAC (unlike our previous utility, mcsimport).
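For comparison, a hedged sketch of how the same load could be expressed today with the procedure shipped in 2208 (the credentials are assumed to be set via the columnstore_s3_* variables, and mydb is a hypothetical target database, since the LOAD DATA snippet above does not name one):

-- 'mydb' is a placeholder database name; the bucket and file name are split out of the URL above
CALL columnstore_info.load_from_s3("s3://blah.blah", "mydata.dat", "mydb", "xyz", "|", "", "");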
Issue Links
- duplicates
  - MCOL-2038 mcsimport load time is significantly slower than cpimport load time (Closed)
  - MCOL-2040 mcsimport load is executed with worst compression ratio and more used disk space than mcs cpimport (Closed)
  - MCOL-2080 mcsimport hangs towards mcs system with DBRoot problems (Closed)
  - MCOL-2226 Improve performance of mcsimport - make it multi threaded (Closed)
- includes
  - MCOL-5271 Google cloud data load (Closed)
- is blocked by
  - MDEV-28395 LOAD DATA transfer plugins (Open)
- is duplicated by
  - MCOL-5032 cpimport via UDAF(user defined function) mvp (Closed)
- relates to
  - MCOL-3928 Add Option To Read From a Lakehouse (Including Parquet/Arrow Support) (Open)
  - MCOL-5419 load from S3 | when columns dont match leads to background failure but session hangs forever (Open)
  - MCOL-5420 load from s3 | inserted count incorrect (Open)
  - MCOL-5506 columnstore_info.load_from_s3 returns: Error getting OID for table (Closed)
  - MXS-4618 Load data from S3 (Closed)
  - MCOL-4139 Replace calpontsys schema name with new name (Closed)
  - MCOL-5509 columnstore_info.load_from_s3 | Misleading Messages (Closed)
  - MCOL-5510 columnstore_info.load_from_s3 | Connection forever hangs if cpimport fails (Closed)
  - MXS-4653 Integrate LDI filter with ColumnStore bulk loading API (Closed)
- blocks
  - DOCS-3618