Details
New Feature
Status: Closed
Major
Resolution: Won't Do
1.1.2
None
MCS running on AWS, multi-server ColumnStore system UM1-PM1; guest OS RHEL 7.4
Description
Loading data with cpimport and a colxml job is significantly slower than manually running cpimport in parallel.
Loading 1TB of data with cpimport mode 1 (-m1) and a colxml job ID takes nearly twice as long as
loading the same tables with the same data simultaneously, in parallel, via a script.
When cpimport is used with a colxml job, the tables are observed to load one after another.
Running cpimport with a colxml job is the more convenient and customizable workflow, so
it would be nice for it to reach load rates similar to those of manually running cpimport in parallel.
Results from the 1TB data load:
| load method | load time |
|---|---|
| cpimport mode m1 with colxml [1TB] | 4 hours 1 min 40 sec |
| cpimport mode m1 without colxml, force parallel [1TB] | 2 hours 12 min 21 sec |
Scripts used:
Script to load with cpimport and a colxml job:
${PCOLXML} $LOAD_DB -j${RAN}
${PCPIMPORT} -m1 -j${RAN}
Script to run multiple cpimport jobs simultaneously:
Loop over the tpc-ds tables: ${PCPIMPORT} -m1 ${LOAD_DB} ${i%.*} insert-data-tables/data/${LOAD_DB}/${i} &
Variables used in both scripts:
PCPIMPORT=/home/mariadb-user/mariadb/columnstore/bin/cpimport
PCOLXML=/home/mariadb-user/mariadb/columnstore/bin/colxml
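The parallel loop is only given as a one-line fragment, so here is a minimal sketch of how it might look as a complete function. It is an assumption about the shape of the script, not the script that was actually run: the function name `load_tables_in_parallel` and its three arguments are hypothetical, while the cpimport invocation (`-m1`, database, table name derived via `${file%.*}`, data file) follows the fragment in the ticket. The sketch also waits for every background job, which the fragment does not show.

```shell
# Hypothetical helper: start one cpimport per data file in the background,
# then wait for all of them. The body runs in a subshell "( ... )" so its
# variables do not leak into the caller.
#   $1 = cpimport command, $2 = database name, $3 = directory with data files
load_tables_in_parallel() (
    cpimport_bin=$1
    load_db=$2
    data_dir=$3
    pids=""
    failed=0

    for path in "$data_dir"/*; do
        [ -f "$path" ] || continue
        file=${path##*/}
        # ${file%.*} strips the extension: store_sales.dat -> table store_sales
        "$cpimport_bin" -m1 "$load_db" "${file%.*}" "$path" &
        pids="$pids $!"
    done

    # Wait for each background import and count the ones that failed.
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done

    [ "$failed" -eq 0 ]
)
```

With the variables from the ticket, a call could look like `load_tables_in_parallel "$PCPIMPORT" "$LOAD_DB" "insert-data-tables/data/${LOAD_DB}"`. Collecting the PIDs and waiting on each one lets the wrapper report a failed table load instead of returning while imports are still running.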
Updated with the latest load time results at Scale Factor 1000 [1TB data], obtained on Intel Packet with Optane drives:
| load method | load time |
|---|---|
| cpimport mode m1 with colxml jobs | 1 hour 41 min 29 sec |
| cpimport mode m1, force parallel | 0 hours 51 min 20 sec |
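
To check the "nearly twice slower" claim against both sets of timings, the load times can be converted to seconds and divided. This is a small illustrative sketch; the helper names `to_seconds` and `slowdown` are hypothetical, and the input values are the times reported in this ticket.

```shell
# Convert hours, minutes and seconds to a total number of seconds.
to_seconds() { echo $(( $1 * 3600 + $2 * 60 + $3 )); }

# Print how many times slower the colxml-job load is than the parallel load.
#   $1 = colxml-job load time in seconds, $2 = parallel load time in seconds
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

slowdown "$(to_seconds 4 1 40)" "$(to_seconds 2 12 21)"    # AWS run: prints 1.83
slowdown "$(to_seconds 1 41 29)" "$(to_seconds 0 51 20)"   # Optane run: prints 1.98
```

Both runs give a ratio a little under 2, which matches the description of the colxml job load taking nearly twice as long.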