Details
Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
5.5.2
2021-4, 2021-5, 2021-6, 2021-7, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17
Description
The first example (Method 1) passes the S3 connection details to cpimport through its own command-line flags, which are listed by running cpimport -h:
[root@ip-172-31-43-62 cpimport]# cpimport -d3 -e 1 -H s3.us-west-2.amazonaws.com -y FAKEAUTHKEY -K FAKEAUTHSECRET -t sample-columnstore-data -g us-west-2 bts flights all.csv -s ',' -E '"'
Debug level set to 3
Column delimiter : ,
ModuleType pm
InstallType 2
PM Count 1
CPI CmdLineArgs : /usr/bin/cpimport.bin -L /var/log/mariadb/columnstore/cpimport/ -e 1 -d 3 -s , -E " -P pm1-17231 -T SYSTEM -y FAKEAUTHKEY -K FAKEAUTHSECRET -t sample-columnstore-data -g us-west-2 -H s3.us-west-2.amazonaws.com -u38ea2fef-adc0-4d82-b567-88dfadb788e4 bts flights all.csv
Locale = en_US.UTF-8
Debug level is set to 3
Column delimiter : ,
Enclosed by Character : "
Using table OID 3016 as the default JOB ID
Input file will be read from S3 Bucket : sample-columnstore-data, file/object : /var/log/mariadb/columnstore/data/bulk/tmpjob/3016_D20210303_T175823_S757723_Job_3016.xml
Job description file : /var/log/mariadb/columnstore/data/bulk/tmpjob/3016_D20210303_T175823_S757723_Job_3016.xml
Log file for this job: /var/log/mariadb/columnstore/cpimport/Job_3016.log
2021-03-03 17:58:23 (17231:140074827606144) INFO : successfully loaded job file /var/log/mariadb/columnstore/data/bulk/tmpjob/3016_D20210303_T175823_S757723_Job_3016.xml
2021-03-03 17:58:23 (17231:140074827606144) INFO : Job file loaded, run time for this step : 0.0347769 seconds
2021-03-03 17:58:23 (17231:140074827606144) INFO : PreProcessing check starts
2021-03-03 17:58:23 (17231:140074827606144) INFO : PreProcessing check completed
2021-03-03 17:58:23 (17231:140074827606144) INFO : preProcess completed, run time for this step : 0.180334 seconds
2021-03-03 17:58:23 (17231:140074827606144) INFO : No of Read Threads Spawned = 1
2021-03-03 17:58:23 (17231:140074827606144) INFO : No of Parse Threads Spawned = 3
2021-03-03 18:00:56 (17231:140073043490560) INFO : Dictionary cache flush: 483328, 483328, 491520, 491520, 499712, 499894, 475136, 481816, 507904, 508716, 516096, 516939, 524288, 525293, 532480, 533172, 540672, 541684,
2021-03-03 18:00:56 (17231:140073043490560) INFO : For table bts.flights: 38083735 rows processed and 38083735 rows inserted.
2021-03-03 18:00:56 (17231:140074827606144) INFO : Bulk load completed, total run time : 153.233 seconds
This load takes 153.233 seconds.
The second example (Method 2) streams the same object with aws s3 cp and pipes it into cpimport over STDIN, as described here: https://mariadb.com/kb/en/columnstore-bulk-data-loading/#bulk-loading-from-aws-s3
[root@ip-172-31-43-62 cpimport]# aws s3 cp --quiet s3://sample-columnstore-data/all.csv - | cpimport -d 3 -e 1 bts flights -s ',' -E '"'
Debug level set to 3
Column delimiter : ,
ModuleType pm
InstallType 2
PM Count 1
CPI CmdLineArgs : /usr/bin/cpimport.bin -L /var/log/mariadb/columnstore/cpimport/ -e 1 -d 3 -s , -E " -P pm1-17281 -T SYSTEM -u9069b047-adf4-432f-87fe-62149c7302c7 bts flights
Locale = en_US.UTF-8
Debug level is set to 3
Column delimiter : ,
Enclosed by Character : "
Using table OID 3016 as the default JOB ID
Input file(s) will be read from : STDIN
Job description file : /var/log/mariadb/columnstore/data/bulk/tmpjob/3016_D20210303_T180157_S709597_Job_3016.xml
Log file for this job: /var/log/mariadb/columnstore/cpimport/Job_3016.log
2021-03-03 18:01:57 (17281:140276622887040) INFO : successfully loaded job file /var/log/mariadb/columnstore/data/bulk/tmpjob/3016_D20210303_T180157_S709597_Job_3016.xml
2021-03-03 18:01:57 (17281:140276622887040) INFO : Job file loaded, run time for this step : 0.0354359 seconds
2021-03-03 18:01:57 (17281:140276622887040) INFO : PreProcessing check starts
2021-03-03 18:01:58 (17281:140276622887040) INFO : PreProcessing check completed
2021-03-03 18:01:58 (17281:140276622887040) INFO : preProcess completed, run time for this step : 0.325933 seconds
2021-03-03 18:01:58 (17281:140276622887040) INFO : No of Read Threads Spawned = 1
2021-03-03 18:01:58 (17281:140275337512704) INFO : Reading input from STDIN to import into table bts.flights...
2021-03-03 18:01:58 (17281:140276622887040) INFO : No of Parse Threads Spawned = 3
2021-03-03 18:02:38 (17281:140275329120000) INFO : Dictionary cache flush: 483328, 699392, 874496, 1098752, 491520, 674816, 882688, 1123328, 500020, 683177, 899282, 1107259, 1275591, 714151, 897700, 1121671, 508919, 692684, 908058, 1091086, 517140, 717157, 916277, 1132055, 525585, 725868, 932899, 1140478, 533465, 732979, 924348, 1075005, 541975, 742210, 941112, 1083145,
2021-03-03 18:02:38 (17281:140275329120000) INFO : For table bts.flights: 38083735 rows processed and 38083735 rows inserted.
2021-03-03 18:02:39 (17281:140276622887040) INFO : Bulk load completed, total run time : 41.367 seconds
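This load takes 41.367 seconds.
Both runs insert the same 38083735 rows into bts.flights, yet Method 1 (cpimport's built-in S3 flags) takes 153.233 seconds against 41.367 seconds for Method 2 (aws s3 cp piped into cpimport), which makes it about 3.7x slower.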
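For anyone re-running the comparison, the slowdown and per-method throughput can be recomputed directly from the two summary lines in each log. The following is only a rough sketch: the regular expressions assume the cpimport log wording shown above stays the same, and the names LOGS, summarize and slowdown are made up for this illustration and are not part of any ColumnStore tooling.

```python
import re

# Summary lines copied verbatim from the two cpimport runs in this report.
LOGS = {
    "method1_s3_flags": [
        "2021-03-03 18:00:56 (17231:140073043490560) INFO : For table bts.flights: 38083735 rows processed and 38083735 rows inserted.",
        "2021-03-03 18:00:56 (17231:140074827606144) INFO : Bulk load completed, total run time : 153.233 seconds",
    ],
    "method2_stdin_pipe": [
        "2021-03-03 18:02:38 (17281:140275329120000) INFO : For table bts.flights: 38083735 rows processed and 38083735 rows inserted.",
        "2021-03-03 18:02:39 (17281:140276622887040) INFO : Bulk load completed, total run time : 41.367 seconds",
    ],
}

# Assumed patterns, based only on the log wording seen in this report.
ROWS_RE = re.compile(r"(\d+) rows processed and (\d+) rows inserted")
TIME_RE = re.compile(r"total run time : ([\d.]+) seconds")


def summarize(lines):
    """Return (rows_inserted, total_seconds) parsed from cpimport log lines."""
    rows = None
    seconds = None
    for line in lines:
        rows_match = ROWS_RE.search(line)
        if rows_match:
            rows = int(rows_match.group(2))
        time_match = TIME_RE.search(line)
        if time_match:
            seconds = float(time_match.group(1))
    if rows is None or seconds is None:
        raise ValueError("log is missing the row count or the total run time")
    return rows, seconds


results = {name: summarize(lines) for name, lines in LOGS.items()}

for name, (rows, seconds) in results.items():
    print(f"{name}: {rows} rows in {seconds} s, {rows / seconds:,.0f} rows/s")

# Ratio of the two total run times (Method 1 divided by Method 2).
slowdown = results["method1_s3_flags"][1] / results["method2_stdin_pipe"][1]
print(f"Method 1 is {slowdown:.2f}x slower than Method 2")
```

Because both runs load the same number of rows, the ratio of run times is also the ratio of throughputs, so either figure can be used to track whether a fix closes the gap.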