[MCOL-1192] Load data with cpimport and colxml Job is significantly slower than forcing in parallel cpimport
Created: 2018-02-01  Updated: 2022-11-05  Resolved: 2022-11-05

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.1.2
Fix Version/s: Icebox

Type: New Feature Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Todd Stoffel (Inactive)
Resolution: Won't Do Votes: 0
Labels: None
Environment:

MCS running on AWS, multi-server ColumnStore system (UM1-PM1); guest OS RHEL 7.4



 Description   

Loading data with cpimport and a colxml job is significantly slower than forcing cpimport to run in parallel.

It was observed that loading 1TB of data with cpimport in mode 1 (-m1) using a colxml job ID takes nearly twice as long as loading the same tables in parallel, simultaneously, via a script.

When cpimport is run with a colxml job, the tables are loaded one after another. Since running cpimport with a colxml job is more convenient, it would be good if it achieved load rates similar to those of manually running cpimport in parallel.

Results of the 1TB data load:

load method                                              load time
cpimport mode m1 with colxml job [1TB]                   4 hours 1 min 40 sec
cpimport mode m1 without colxml, forced parallel [1TB]   2 hours 12 min 21 sec

Scripts used:

Script used to load with cpimport and a colxml job:
${PCOLXML} $LOAD_DB -j${RAN}
${PCPIMPORT}  -m1 -j${RAN}
 
Script used to run multiple cpimport jobs simultaneously (loop over the TPC-DS table files):
for i in $(ls insert-data-tables/data/${LOAD_DB}); do
  ${PCPIMPORT} -m1 ${LOAD_DB} ${i%.*} insert-data-tables/data/${LOAD_DB}/${i} &
done
 
PCPIMPORT=/home/mariadb-user/mariadb/columnstore/bin/cpimport
PCOLXML=/home/mariadb-user/mariadb/columnstore/bin/colxml
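The quoted loop starts every cpimport in the background but does not show how the script waits for those jobs to finish. A minimal sketch of the same forced-parallel approach that waits for all jobs and propagates failures is below. The function name `parallel_load`, the per-table `load_<table>.log` files and the assumption that each data file is named `<table>.<ext>` (implied by `${i%.*}`) are illustrative, not part of ColumnStore.

```shell
#!/usr/bin/env bash
# Hedged sketch of the "force parallel" load: one cpimport per table, all started
# in the background, then wait for every job and report whether any failed.
# Assumptions (not from the ticket): data files are named <table>.<ext> inside
# insert-data-tables/data/<db>/, and per-table logs go to the current directory.
# parallel_load is a hypothetical helper name.

PCPIMPORT=${PCPIMPORT:-/home/mariadb-user/mariadb/columnstore/bin/cpimport}

parallel_load() {
  local db=$1
  local data_dir=${2:-insert-data-tables/data/$db}
  local path file table pid status=0
  local -a pids=()

  for path in "$data_dir"/*; do
    [ -f "$path" ] || continue   # skip the unexpanded glob and non-files
    file=${path##*/}             # e.g. store_sales.dat
    table=${file%.*}             # strip the extension -> store_sales
    # Mode 1 (-m1): one central input file, distributed to the PMs by cpimport.
    "$PCPIMPORT" -m1 "$db" "$table" "$path" > "load_${table}.log" 2>&1 &
    pids+=("$!")
  done

  # Block until every background cpimport finishes; remember any failure.
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}
```

For example, `parallel_load tpcds_1000` would load every file under `insert-data-tables/data/tpcds_1000/` (a hypothetical database name). Collecting the PIDs and calling `wait` on each keeps the script from exiting before the slowest table finishes, which matters when the whole script is being timed.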

Updated with the latest load time results at Scale Factor 1000 [1TB data], obtained on Intel Packet with Optane drives:

load method                               load time
cpimport mode m1 with colxml job          1 hour 41 min 29 sec
cpimport mode m1, forced parallel         0 hours 51 min 20 sec
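To compare the two measurement sets on the same footing, each timing can be converted to seconds and the colxml-job time divided by the forced-parallel time. The sketch below does this with Bash integer arithmetic; `to_seconds` and `speedup` are hypothetical helper names, and the ratio is rounded to two decimals.

```shell
#!/usr/bin/env bash
# Compare the colxml-job and forced-parallel cpimport timings reported in this
# ticket. Helper names are illustrative, not part of ColumnStore.

# to_seconds H M S -> total seconds
to_seconds() {
  echo $(( $1 * 3600 + $2 * 60 + $3 ))
}

# speedup COLXML_SECS PARALLEL_SECS -> ratio with two decimals, e.g. "1.83x"
# Integer-only arithmetic: scale by 100 and add half the divisor to round.
speedup() {
  local r=$(( ($1 * 100 + $2 / 2) / $2 ))
  printf '%d.%02dx\n' $(( r / 100 )) $(( r % 100 ))
}

# AWS, UM1-PM1, 1TB
aws_colxml=$(to_seconds 4 1 40)      # 14500
aws_parallel=$(to_seconds 2 12 21)   # 7941
echo "AWS:    $(speedup "$aws_colxml" "$aws_parallel")"   # AWS:    1.83x

# Intel Packet with Optane drives, Scale Factor 1000 (1TB)
pkt_colxml=$(to_seconds 1 41 29)     # 6089
pkt_parallel=$(to_seconds 0 51 20)   # 3080
echo "Packet: $(speedup "$pkt_colxml" "$pkt_parallel")"   # Packet: 1.98x
```

On both platforms the forced-parallel load finishes in roughly half the time (about 1.83x faster on AWS and 1.98x on Intel Packet), which matches the "nearly twice as long" observation in the description.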


 Comments   
Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-02 ]

Update: the slower load time with the colxml job was observed again at
Scale Factor 1000 [1TB] on [PM1-PM1] Intel Packet Optane3x:

load method                               load time
cpimport mode m1 with colxml job          1 hour 41 min 29 sec
cpimport mode m1, forced parallel         0 hours 51 min 20 sec

Comment by Todd Stoffel (Inactive) [ 2022-11-05 ]

This item is being closed because it was well past its expiration date with no activity. If you suspect this was done in error, please create a new ticket.

Generated at Thu Feb 08 02:26:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.