  1. MariaDB ColumnStore
  2. MCOL-5013

Support Load data from AWS S3 : UDF : columnstore_info.load_from_s3

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 6.2.3
    • 23.02.1
    • None
    • None
    • 2021-17, 2022-22

    Description

      ColumnStore should support plain SQL syntax for loading data directly from S3 buckets.

      The current implementation for 2208 is:

      usage:

      AWS

      set columnstore_s3_key='<s3_key>';
      set columnstore_s3_secret='<s3_secret>';
      set columnstore_s3_region='<region>';

      CALL columnstore_info.load_from_s3("<bucket>", "<file_name>", "<db_name>", "<table_name>", "<terminated_by>", "<enclosed_by>", "<escaped_by>");
       
      EXAMPLE
       
      CALL columnstore_info.load_from_s3("s3://columnstore-test", "data1.csv", "d1", "t1", ",", "", "");
      

      or, for Google Cloud Storage:

      set columnstore_s3_key='GOOGXXXxxxx';
      set columnstore_s3_secret='XXXXXXXXXX';
      CALL columnstore_info.load_from_s3("gs://columnstore-test", "test.tbl", "test", "gg", "|", "", "");
      

      The last three parameters correspond to the cpimport options -s, -E, and -C.

       terminated_by:

      The delimiter between column values. Mandatory.

       enclosed_by:

      The character that encloses field values, if they are enclosed. Optional; can be an empty string ("").

       escaped_by:

      The escape character used in conjunction with the enclosed_by character, or as part of the NULL escape sequence ('\N'). Optional; can be an empty string (""). The default is '\'.

      Pass an empty string for enclosed_by and escaped_by to use the defaults.
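
      As an untested sketch of a call that sets all three format parameters explicitly, the following loads a comma-delimited file whose fields are enclosed in double quotes and escaped with a backslash. The bucket, file, database, and table names are placeholders, and it is an assumption that the procedure passes a double-quote enclosure through to cpimport unchanged.

```sql
-- Placeholder credentials and region; replace with real values.
set columnstore_s3_key='<s3_key>';
set columnstore_s3_secret='<s3_secret>';
set columnstore_s3_region='<region>';

-- terminated_by = ','   enclosed_by = '"'   escaped_by = '\'
-- '\\' denotes a single backslash when NO_BACKSLASH_ESCAPES is not set in sql_mode.
CALL columnstore_info.load_from_s3("s3://<bucket>", "<file_name>", "<db_name>", "<table_name>", ",", '"', '\\');
```

      Single-quoted literals are used for the last two arguments only so that the double quote and backslash characters can be written without extra escaping.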

      EXAMPLE

      CALL columnstore_info.load_from_s3("s3://avorovich2", "data1.csv", "d1", "t1", ",", "", "");

      --------
      Possible future options (not implemented in 2208):

      mariadb> LOAD DATA S3 FROM 's3://blah.blah/mydata.dat' into table xyz FIELDS TERMINATED BY '|';
      

      This command should invoke the backend ColumnStore API and an S3 client to stream data directly from a remote source into cpimport.

      This gives us a rapid data ingestion technique that is compatible with SkySQL and provides UAC, unlike our previous utility, mcsimport.

      Attachments

        1. create_table.sql
          0.1 kB
          Leonid Fedorov
        2. hangs-though-debug.log-shows-failed.png
          231 kB
          Allen Herrera
        3. test_data.csv
          0.0 kB
          Leonid Fedorov

            People

              Leonid Fedorov
              Todd Stoffel (Inactive)
              Alan Mologorsky
              Daniel Lee (Inactive)
              Votes: 0
              Watchers: 10
