MariaDB ColumnStore / MCOL-3514

Make cpimport read from data in S3 buckets

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: None
    • Labels: None
    • Sprint: 2019-06, 2020-1, 2020-2

    Description

      cpimport needs new options to allow it to read a source file from an Amazon S3 bucket.

        Activity

          Implementation details...

          New options for cpimport:

                  -y      S3 Authentication Key (for S3 imports)
                  -K      S3 Authentication Secret (for S3 imports)
                  -t      S3 Bucket (for S3 imports)
                  -H      S3 Hostname (for S3 imports, Amazon's S3 default)
                  -g      S3 Region (for S3 imports)
          

          The hostname only needs to be supplied if the S3 server is not Amazon's.
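          The option list above can be sketched as a small dry-run helper that assembles the command line and prints it rather than executing it. The function name `build_cpimport_s3_cmd` and its argument order are hypothetical, chosen only for illustration; the flags themselves are the ones listed above. Arguments are joined with plain spaces, so the output is meant for review, not for safe re-execution of values containing whitespace.

```shell
# Sketch: print (do not run) a cpimport command line for an S3 import.
# Arguments: db table object key secret bucket region [hostname]
# The helper name and argument order are illustrative assumptions.
build_cpimport_s3_cmd() {
  _cmd="cpimport $1 $2 $3 -y $4 -K $5 -t $6 -g $7"
  # -H is only added when a hostname is given, i.e. for non-Amazon S3 servers.
  if [ -n "${8:-}" ]; then
    _cmd="$_cmd -H $8"
  fi
  printf '%s\n' "$_cmd"
}

# Example with placeholder credentials; prints:
# cpimport mytest lineitem lineitem.tbl -y [mykey] -K [mysecret] -t dleeqatest -g us-west-2
build_cpimport_s3_cmd mytest lineitem lineitem.tbl '[mykey]' '[mysecret]' dleeqatest us-west-2
```

          Passing an eighth argument appends `-H` with that hostname. Because the key and secret appear on the printed line, it should be reviewed with care before being passed to cpimport.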

          When the S3 options are supplied, cpimport uses the path/filename argument to retrieve the file from the S3 bucket into memory and then loads it. You will need enough spare RAM to hold the entire CSV file.
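          Since the whole file is held in memory, a rough pre-flight check can compare the object's size with the memory the kernel reports as available. The sketch below is an assumption-laden illustration, not part of cpimport: the helper name `fits_in_ram` is hypothetical, it relies on the Linux `MemAvailable` field in `/proc/meminfo` (an estimate), and it assumes the object size in bytes is already known from some other source. Actual memory needs during import may be higher than the file size.

```shell
# Sketch: succeed if SIZE_BYTES fits within MemAvailable.
# Usage: fits_in_ram SIZE_BYTES [MEMINFO_PATH]
# Returns 0 if it fits, 1 if it does not, 2 if MemAvailable cannot be read.
fits_in_ram() {
  _size_bytes=$1
  _meminfo=${2:-/proc/meminfo}
  # MemAvailable is reported in kB.
  _avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$_meminfo" 2>/dev/null)
  [ -n "$_avail_kb" ] || return 2
  # Round the object size up to whole kB before comparing.
  _size_kb=$(( ($_size_bytes + 1023) / 1024 ))
  [ "$_size_kb" -le "$_avail_kb" ]
}

# Example: check a 1 MiB object against the local machine.
if fits_in_ram 1048576; then
  echo "object should fit in available memory"
else
  echo "not enough available memory, or MemAvailable could not be read"
fi
```

          The optional second argument lets the check run against a saved copy of meminfo, which is mainly useful for testing the helper itself.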

          Comment added by LinuxJedi Andrew Hutchings (Inactive).

          Build tested: 1.4.0-1

          [dlee@master centos7]$ cat gitversionInfo.txt
          engine commit:
          1f47534

          Running the test on a multi-node stack (1um2pm) returned an error:

          /usr/local/mariadb/columnstore/bin/cpimport mytest lineitem lineitem.tbl -y [mykey] -K [mysecret] -t dleeqatest -g us-west-2

          2019-09-27 18:25:08 (9124) ERR : Could not open Input file lineitem.tbl

          It worked on a single-node stack:

          /usr/local/mariadb/columnstore/bin/cpimport mytest lineitem lineitem.tbl -y [mykey] -K [mysecret] -t dleeqatest -g us-west-2
          Locale is : C

          Using table OID 3017 as the default JOB ID
          Input file will be read from S3 Bucket : dleeqatest, file/object : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3017_D20190927_T185039_S235758_Job_3017.xml
          Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3017_D20190927_T185039_S235758_Job_3017.xml
          Log file for this job: /usr/local/mariadb/columnstore/data/bulk/log/Job_3017.log
          2019-09-27 18:50:39 (16701) INFO : successfully loaded job file /usr/local/mariadb/columnstore/data/bulk/tmpjob/3017_D20190927_T185039_S235758_Job_3017.xml
          2019-09-27 18:50:39 (16701) INFO : Job file loaded, run time for this step : 0.21343 seconds
          2019-09-27 18:50:39 (16701) INFO : PreProcessing check starts
          2019-09-27 18:50:55 (16701) INFO : PreProcessing check completed
          2019-09-27 18:50:55 (16701) INFO : preProcess completed, run time for this step : 15.8133 seconds
          2019-09-27 18:50:55 (16701) INFO : No of Read Threads Spawned = 1
          2019-09-27 18:50:55 (16701) INFO : No of Parse Threads Spawned = 3
          2019-09-27 18:50:56 (16701) INFO : For table mytest.lineitem: 6005 rows processed and 6005 rows inserted.
          2019-09-27 18:50:57 (16701) INFO : Bulk load completed, total run time : 18.0366 seconds

          [root@localhost ~]# mcsmysql mytest
          Reading table information for completion of table and column names
          You can turn off this feature to get a quicker startup with -A

          Welcome to the MariaDB monitor. Commands end with ; or \g.
          Your MariaDB connection id is 13
          Server version: 10.4.8-3-MariaDB-log Source distribution

          Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

          Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

          MariaDB [mytest]> select count(*) from lineitem;
          +----------+
          | count(*) |
          +----------+
          |     6005 |
          +----------+
          1 row in set (0.119 sec)

          Comment added by dleeyh Daniel Lee (Inactive).

          Verified sub-tasks. Closing this ticket now.

          Comment added by dleeyh Daniel Lee (Inactive).

          People

            dleeyh Daniel Lee (Inactive)
            LinuxJedi Andrew Hutchings (Inactive)