[MCOL-2038] mcsimport load time is significantly slower than cpimport load time Created: 2018-12-18  Updated: 2023-10-26  Resolved: 2022-05-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2.2
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

mcs single server; 64G memory; 8CPUs; CentOS7.5


Issue Links:
Duplicate
is duplicated by MCOL-5013 Support Load data from AWS S3 : UDF... Closed
Relates
relates to MCOL-2089 High CPU usage and slow performance a... Closed
Epic Link: Consolidate & Redevelop All Columnstore Tools (SDK, Adapters, Backup, Restore, mcsimport)

 Description   

mcsimport load time is significantly slower than cpimport load time.
The larger the data volume to be imported, the further mcsimport falls behind cpimport.

mcsimport was installed and run locally on the MCS server so that its load time could be compared with cpimport's load time without any network delay.

The mcsimport tool would be expected to load no slower than cpimport run in mode 1 (-m1).

Notes:
tables were loaded one by one;
raw data was stored locally on attached storage;
mcsimport is installed with the mariadb-columnstore-api-cpp and mariadb-columnstore-tools packages;
cpimport comes with the mariadb-columnstore installation;
cpimport was run in mode 1 (-m1).

Scripts used:

PCPIMPORT=/usr/local/mariadb/columnstore/bin/cpimport
MCSIMPORT=/usr/local/mariadb/columnstore/tools/mcsimport/mcsimport
MCSIMPORTXML=/usr/local/mariadb/columnstore/etc/Columnstore.xml

 ${PCPIMPORT} -m1  ${LOAD_DB} ${i%.*}  insert-data-tables/data/${LOAD_DB}/${i}

 ${MCSIMPORT}    ${LOAD_DB}  ${tpc_ds_tbls[i]%.*} insert-data-tables/data/${LOAD_DB}/${tpc_ds_tbls[i]} -d "|" -c ${MCSIMPORTXML} 
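
The report does not show how the load times were measured. A minimal sketch of one way to time each load is given below; `time_load`, `$table`, and `$datafile` are hypothetical names introduced for illustration, and `date +%s` only gives whole-second resolution, so the fractional timings in the tables below must have come from some other method.

```shell
# Hypothetical helper: run a command and report elapsed wall-clock seconds.
# Usage: time_load LABEL COMMAND [ARGS...]
# Prints one line on stdout: "LABEL ELAPSED_SECONDS EXIT_STATUS"
time_load() {
    label=$1
    shift
    start=$(date +%s)
    # Send the tool's own stdout to stderr so stdout carries only the timing
    # line; record the exit status without aborting under `set -e`.
    "$@" 1>&2 && status=0 || status=$?
    end=$(date +%s)
    echo "$label $((end - start)) $status"
}

# Example, reusing the variables from the report (not executed here):
# time_load cpimport  "${PCPIMPORT}" -m1 "${LOAD_DB}" "$table" "$datafile"
# time_load mcsimport "${MCSIMPORT}" "${LOAD_DB}" "$table" "$datafile" -d "|" -c "${MCSIMPORTXML}"
```

Printing the exit status next to the elapsed time makes it easy to discard timings from loads that failed.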

load time results from loading 1G and 100G data:

Scale Factor  Data Volume  Load Time mcsimport           Load Time cpimport
1             1G           0 hours 7 minutes 23 seconds  0 hours 1 minutes 4 seconds
100           100G         9 hours 5 minutes 16 seconds  0 hours 31 minutes 47 seconds
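
To put a number on the gap, the totals above can be converted to seconds and divided. The sketch below shows that arithmetic; `to_seconds` and `slowdown` are hypothetical helper names, not part of either tool.

```shell
# Convert a duration given as hours, minutes, seconds into seconds.
to_seconds() {
    echo $(( $1 * 3600 + $2 * 60 + $3 ))
}

# Ratio of the slower duration to the faster one, to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# 1G:   slowdown "$(to_seconds 0 7 23)" "$(to_seconds 0 1 4)"
# 100G: slowdown "$(to_seconds 9 5 16)" "$(to_seconds 0 31 47)"
```

For the figures in the table this gives roughly 6.9 at 1G (443 s against 64 s) and roughly 17.2 at 100G (32716 s against 1907 s), which is the widening gap described above.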

100G data (scale factor 100), load times per table:

Table Name              Row Count   Load Time mcsimport [sec]  Load Time cpimport [sec]
call_center             30          1.56082                    1.16745
catalog_page            20400       1.53188                    2.49096
catalog_returns         14404374    797.426                    24.0058
catalog_sales           143997065   10590.8                    342.157
customer                2000000     111.714                    17.4019
customer_address        1000000     51.4268                    12.0627
customer_demographics   1920800     30.6589                    3.48936
date_dim                73049       7.2738                     2.84914
household_demographics  7200        0.805241                   0.424782
income_band             20          0.36258                    0.303956
inventory               399330000   2162.55                    581.433
item                    204000      18.9072                    7.44638
promotion               1000        1.87475                    2.10844
reason                  55          0.614517                   0.674388
ship_mode               20          1.06706                    1.14927
store                   402         3.91851                    4.05436
store_returns           28795080    1096.54                    62.0299
store_sales             287997024   12569.7                    637.249
time_dim                86400       3.56302                    2.69203
warehouse               15          2.18175                    2.94752
web_page                2040        1.4963                     2.25162
web_site                24          3.64008                    5.64348
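
The per-table figures can also be read as throughput, which makes the small and large tables easier to compare. The sketch below assumes the rows above are saved as whitespace-separated text (table name, row count, mcsimport seconds, cpimport seconds); `rows_per_second` is a hypothetical helper name.

```shell
# Read "table rows mcsimport_sec cpimport_sec" lines on stdin and print
# "table mcsimport_rows_per_sec cpimport_rows_per_sec", rounded to integers.
rows_per_second() {
    awk '
        # Skip headers and malformed lines: require four fields, an integer
        # row count, and positive times (this also avoids division by zero).
        NF == 4 && $2 ~ /^[0-9]+$/ && $3 + 0 > 0 && $4 + 0 > 0 {
            printf "%s %.0f %.0f\n", $1, $2 / $3, $2 / $4
        }
    '
}

# Example (the file name is an assumption):
# rows_per_second < load_times.txt
```

For the tiny dimension tables the times are dominated by fixed startup cost, so throughput is mainly meaningful for the multi-million-row tables.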


 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-12-18 ]

Note to self: there are several bugs already open for this. I need to consolidate them.

Comment by Ben Thompson (Inactive) [ 2022-05-04 ]

See the newer project MCOL-5013, which is intended to replace mcsimport.
