MariaDB ColumnStore / MCOL-2038

mcsimport load time is significantly slower than cpimport load time

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.2
    • Fix Version/s: Icebox
    • Component/s: mcsapi, mcsimport
    • Labels:
      None
    • Environment:
      mcs single server; 64G memory; 8CPUs; CentOS7.5

      Description

      mcsimport takes significantly longer than cpimport to load the same data,
      and the gap widens as the volume of imported data grows.

      mcsimport was installed and run locally on the MCS server so that its load time could be compared with cpimport's without any network latency.

      mcsimport would be expected to load data no more slowly than cpimport
      run in mode 1 (-m1).

      Notes:
      • tables were loaded one at a time;
      • the raw data was stored locally on attached storage;
      • mcsimport was installed from the mariadb-columnstore-api-cpp and mariadb-columnstore-tools packages;
      • cpimport ships with the mariadb-columnstore installation;
      • cpimport was run in mode 1 (-m1).

      Scripts used:

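      PCPIMPORT=/usr/local/mariadb/columnstore/bin/cpimport
      MCSIMPORT=/usr/local/mariadb/columnstore/tools/mcsimport/mcsimport
      MCSIMPORTXML=/usr/local/mariadb/columnstore/etc/Columnstore.xml
      # LOAD_DB (the target database) and the loops over the table files are not shown here

      # cpimport in mode 1; ${i} is a data file name and ${i%.*} strips its extension to give the table name
      # (cpimport's default field delimiter is already "|", so no delimiter option is passed)
      ${PCPIMPORT} -m1 ${LOAD_DB} ${i%.*} insert-data-tables/data/${LOAD_DB}/${i}

      # mcsimport; tpc_ds_tbls holds the data file names, -d sets the "|" field delimiter, -c points to Columnstore.xml
      ${MCSIMPORT} ${LOAD_DB} ${tpc_ds_tbls[i]%.*} insert-data-tables/data/${LOAD_DB}/${tpc_ds_tbls[i]} -d "|" -c ${MCSIMPORTXML}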
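To time each table's load the same way for both tools, the commands can be wrapped in a small timing helper. This is a hypothetical sketch: `time_load` is not part of cpimport, mcsimport or the original scripts, and it assumes GNU `date` (for the `%N` nanosecond field) and `awk`, both available on CentOS 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of cpimport, mcsimport or the original scripts.
# Runs one load command and prints "<label> <elapsed seconds>".
# Assumes GNU date (%N gives nanoseconds) and awk.
time_load() {
  local label=$1
  shift
  local start end rc
  start=$(date +%s.%N)
  "$@" > /dev/null          # run the load command, discarding its normal output
  rc=$?
  end=$(date +%s.%N)
  awk -v l="$label" -v s="$start" -v e="$end" 'BEGIN { printf "%s %.3f\n", l, e - s }'
  return "$rc"              # propagate the load command's exit status
}

# Example (commented out; uses the variables from the scripts above):
# time_load "cpimport ${i%.*}" ${PCPIMPORT} -m1 ${LOAD_DB} ${i%.*} insert-data-tables/data/${LOAD_DB}/${i}
# time_load "mcsimport ${tpc_ds_tbls[i]%.*}" ${MCSIMPORT} ${LOAD_DB} ${tpc_ds_tbls[i]%.*} \
#   insert-data-tables/data/${LOAD_DB}/${tpc_ds_tbls[i]} -d "|" -c ${MCSIMPORTXML}
```

Returning the wrapped command's exit status lets a calling loop still detect a failed load even though only the timing line is printed.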
      

      Load times for the 1G and 100G data sets (TPC-DS scale factors 1 and 100):

      | Scale Factor | Data Volume | mcsimport Load Time | cpimport Load Time |
      | 1            | 1G          | 0 h 7 min 23 s      | 0 h 1 min 4 s      |
      | 100          | 100G        | 9 h 5 min 16 s      | 0 h 31 min 47 s    |
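As a quick check of the claim that the gap grows with data volume, the totals can be converted to seconds and divided. In this sketch `to_seconds` and `ratio` are hypothetical helpers written only for the illustration; with the figures above, mcsimport takes 443 s vs 64 s (about 6.9x) at scale factor 1 and 32716 s vs 1907 s (about 17.2x) at scale factor 100.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for this illustration only.
# to_seconds H M S -> total seconds (plain integers, no leading zeros)
to_seconds() { echo $(( $1 * 3600 + $2 * 60 + $3 )); }

# ratio <mcsimport seconds> <cpimport seconds> -> slowdown factor, one decimal
ratio() {
  awk -v m="$1" -v c="$2" 'BEGIN { printf "%.1f\n", m / c }'
}

# Totals taken from the scale factor 1 and 100 table
sf1_mcs=$(to_seconds 0 7 23);   sf1_cp=$(to_seconds 0 1 4)
sf100_mcs=$(to_seconds 9 5 16); sf100_cp=$(to_seconds 0 31 47)

echo "SF1:   ${sf1_mcs} s vs ${sf1_cp} s -> $(ratio "$sf1_mcs" "$sf1_cp")x"
echo "SF100: ${sf100_mcs} s vs ${sf100_cp} s -> $(ratio "$sf100_mcs" "$sf100_cp")x"
```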

      Per-table load times for the 100G data set (scale factor 100):

      | Table Name | Row Count | mcsimport Load Time [s] | cpimport Load Time [s] |
      | call_center | 30 | 1.56082 | 1.16745 |
      | catalog_page | 20400 | 1.53188 | 2.49096 |
      | catalog_returns | 14404374 | 797.426 | 24.0058 |
      | catalog_sales | 143997065 | 10590.8 | 342.157 |
      | customer | 2000000 | 111.714 | 17.4019 |
      | customer_address | 1000000 | 51.4268 | 12.0627 |
      | customer_demographics | 1920800 | 30.6589 | 3.48936 |
      | date_dim | 73049 | 7.2738 | 2.84914 |
      | household_demographics | 7200 | 0.805241 | 0.424782 |
      | income_band | 20 | 0.36258 | 0.303956 |
      | inventory | 399330000 | 2162.55 | 581.433 |
      | item | 204000 | 18.9072 | 7.44638 |
      | promotion | 1000 | 1.87475 | 2.10844 |
      | reason | 55 | 0.614517 | 0.674388 |
      | ship_mode | 20 | 1.06706 | 1.14927 |
      | store | 402 | 3.91851 | 4.05436 |
      | store_returns | 28795080 | 1096.54 | 62.0299 |
      | store_sales | 287997024 | 12569.7 | 637.249 |
      | time_dim | 86400 | 3.56302 | 2.69203 |
      | warehouse | 15 | 2.18175 | 2.94752 |
      | web_page | 2040 | 1.4963 | 2.25162 |
      | web_returns | 7197670 | 322.083 | 17.4413 |
      | web_sales | 72001237 | 4934.32 | 174.369 |
      | web_site | 24 | 3.64008 | 5.64348 |
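Rows loaded per second make tables of very different sizes easier to compare. The sketch below uses a hypothetical `throughput` helper that divides each row count by the load time reported above, for the five largest tables only; on these figures the mcsimport/cpimport ratio ranges from about 3.7x for inventory to about 31x for catalog_sales.

```shell
#!/usr/bin/env bash
# Hypothetical helper for this illustration only.
# Input lines: <table> <row count> <mcsimport seconds> <cpimport seconds>
# Output: table, mcsimport rows/s, cpimport rows/s, mcsimport/cpimport time ratio
throughput() {
  awk '{ printf "%-16s %10.0f %10.0f %6.1fx\n", $1, $2 / $3, $2 / $4, $3 / $4 }'
}

printf '%-16s %10s %10s %7s\n' table mcs_rows/s cp_rows/s ratio
# Figures copied from the per-table results above
throughput <<'EOF'
inventory 399330000 2162.55 581.433
store_sales 287997024 12569.7 637.249
catalog_sales 143997065 10590.8 342.157
web_sales 72001237 4934.32 174.369
store_returns 28795080 1096.54 62.0299
EOF
```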

              People

              Assignee: Ben Thompson (ben.thompson)
              Reporter: Zdravelina Sokolovska (winstone, Inactive)
              Votes: 0
              Watchers: 2
