MariaDB ColumnStore / MCOL-1192

Load data with cpimport and colxml Job is significantly slower than forcing in parallel cpimport


Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 1.1.2
    • Fix Version/s: Icebox
    • Component/s: cpimport
    • Labels: None
    • Environment: MCS running on AWS, multi-server ColumnStore system (UM1-PM1); guest OS RHEL 7.4

    Description

      Loading data with cpimport and a colxml job is significantly slower than running cpimport jobs in parallel.

      Loading 1 TB of data with cpimport in mode 1 (-m1) and a colxml job ID takes nearly twice as long
      as loading the same tables with the same data simultaneously, in parallel, via a script.

      When cpimport is run with a colxml job, the tables are loaded one after another.
      Because running cpimport from a colxml job is more convenient to use, it would be good if it
      achieved load times similar to those of manually running several cpimport jobs in parallel.

      Results from the 1 TB data load:

      Load method                                                Load time
      cpimport mode m1 with colxml job [1TB]                     4 hours 1 min 40 sec
      cpimport mode m1 without colxml, forced parallel [1TB]     2 hours 12 min 21 sec
      

      Scripts used:

      PCPIMPORT=/home/mariadb-user/mariadb/columnstore/bin/cpimport
      PCOLXML=/home/mariadb-user/mariadb/columnstore/bin/colxml

      Script used to load with cpimport and a colxml job:
      ${PCOLXML} $LOAD_DB -j${RAN}
      ${PCPIMPORT} -m1 -j${RAN}

      Script used to run multiple cpimport jobs simultaneously (loop over the TPC-DS table files):
      # assumed: one data file per table in this directory
      for f in insert-data-tables/data/${LOAD_DB}/*; do
          i=$(basename "$f")
          ${PCPIMPORT} -m1 ${LOAD_DB} ${i%.*} insert-data-tables/data/${LOAD_DB}/${i} &
      done
      wait    # wait for all background loads to finish
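The forced-parallel loop starts every table's cpimport at once with `&`. Where it is preferable to cap the number of concurrent loads, one option is `xargs -P`. The following is a hypothetical sketch, not part of ColumnStore: `load_table`, `load_all`, and the max-jobs argument are illustrative names, and it assumes one data file per table whose name minus its extension is the table name (as `${i%.*}` does in the loop above). The cpimport arguments are the ones used in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: cap how many cpimport processes run at once instead of
# backgrounding every table with '&'. Assumed layout: one data file per table,
# table name = file name minus its extension.

# Path from the issue; override PCPIMPORT (e.g. with 'echo') for a dry run.
PCPIMPORT=${PCPIMPORT:-/home/mariadb-user/mariadb/columnstore/bin/cpimport}

# Load one file into <db>.<table>, mirroring the issue's forced-parallel loop.
load_table() {
    local db=$1 file=$2
    local name
    name=$(basename "$file")
    "$PCPIMPORT" -m1 "$db" "${name%.*}" "$file"
}

# load_all <data_dir> <db> <max_jobs>: run at most <max_jobs> loads at a time.
# xargs waits for every child, so this returns only when all loads finish.
load_all() {
    local dir=$1 db=$2 max_jobs=$3
    export PCPIMPORT
    export -f load_table
    # $1 inside 'bash -c' is the database name, $2 is the file xargs appends.
    find "$dir" -maxdepth 1 -type f -print0 |
        xargs -0 -n1 -P "$max_jobs" bash -c 'load_table "$1" "$2"' _ "$db"
}
```

Usage, for example: `source load_parallel.sh && load_all insert-data-tables/data/tpcds tpcds 4` (script and database names are hypothetical). Setting `PCPIMPORT=echo` beforehand gives a dry run that only prints the cpimport commands.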
      
      

      Update: latest load-time results at scale factor 1000 (1 TB of data), obtained on Intel Packet with Optane drives:

      Load method                           Load time
      cpimport mode m1 with colxml job      1 hour 41 min 29 sec
      cpimport mode m1, forced parallel     51 min 20 sec
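Converting the reported load times to seconds makes the "nearly twice as long" observation concrete. A small shell sketch (the function names `to_seconds` and `speedup` are illustrative) that computes the colxml-to-parallel ratio for both reported runs; it gives about 1.83 for the first 1 TB load and about 1.98 for the Optane run.

```shell
#!/usr/bin/env bash
# Convert an "H hours M min S sec" load time into seconds.
# Pass values without leading zeros (bash treats "08" as octal).
to_seconds() {
    local h=$1 m=$2 s=$3
    echo $(( h * 3600 + m * 60 + s ))
}

# Print the colxml/parallel time ratio with two decimals (awk does float division).
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

first_colxml=$(to_seconds 4 1 40)      # 14500 s
first_parallel=$(to_seconds 2 12 21)   # 7941 s
optane_colxml=$(to_seconds 1 41 29)    # 6089 s
optane_parallel=$(to_seconds 0 51 20)  # 3080 s

echo "first run:  $(speedup "$first_colxml" "$first_parallel")x slower with colxml"
echo "Optane run: $(speedup "$optane_colxml" "$optane_parallel")x slower with colxml"
```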


      People: Todd Stoffel (Inactive), Zdravelina Sokolovska (Inactive)