[MCOL-2040]  mcsimport load is executed with worst compression ratio and more used disk space than mcs cpimport Created: 2018-12-19  Updated: 2023-10-26  Resolved: 2022-05-09

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2.3, 1.2.2
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

mcs single server; 64G memory; 8CPUs; CentOS7.5


Issue Links:
Duplicate
is duplicated by MCOL-5013 Support Load data from AWS S3 : UDF... Closed
Epic Link: Consolidate & Redevelop All Columnstore Tools (SDK, Adapters, Backup, Restore, mcsimport)

 Description   

An mcsimport load results in a worse compression ratio and uses more disk space than an mcs cpimport load of the same data.

Expected behavior: mcsimport should load data without degrading the compression ratio or increasing disk space usage compared to cpimport.

notes:
mcsimport was installed and run locally on the MCS server so that the comparison with cpimport excludes network delay.
mcsimport is installed with the mariadb-columnstore-api-cpp and mariadb-columnstore-tools packages.
cpimport comes with the mariadb-columnstore installation; cpimport was run in mode 1 (m1).

MCS Load Method  Scale Factor  Data Volume  Used Disk Space  Compression Ratio
mcsimport        100           100 GB       31.69 GB         5.0409:1
cpimport         100           100 GB       21.5 GB          5.0392:1

MCS Load Method  Scale Factor  Data Volume  Used Disk Space
mcsimport        1000          1 TB         493.46 GB
cpimport         1000          1 TB         360.13 GB
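
To put the gap in relative terms, here is a minimal sketch that computes how much more disk space mcsimport used than cpimport at each scale factor. The figures are copied from the summary table above; the names USED_DISK_GB and overhead_percent are illustrative only and are not part of any ColumnStore tool.

```python
# Used disk space in GB, copied from the summary table in this report.
USED_DISK_GB = {
    100: {"mcsimport": 31.69, "cpimport": 21.5},
    1000: {"mcsimport": 493.46, "cpimport": 360.13},
}


def overhead_percent(mcsimport_gb, cpimport_gb):
    """Extra disk space used by mcsimport relative to cpimport, in percent."""
    return round((mcsimport_gb / cpimport_gb - 1.0) * 100.0, 1)


# Hypothetical helper output: {scale factor: overhead in percent}.
overhead = {
    sf: overhead_percent(gb["mcsimport"], gb["cpimport"])
    for sf, gb in USED_DISK_GB.items()
}

for sf, pct in overhead.items():
    print(f"scale factor {sf}: mcsimport uses {pct}% more disk space")
```

With the reported numbers this works out to roughly 47% more disk space at scale factor 100 and roughly 37% more at scale factor 1000.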

*cpimport*
Start Load Test columnstore_info.total_usage
TOTAL_DATA_SIZE TOTAL_DISK_USAGE
249.05 GB       83.51 GB
 
 
 
End Load Test columnstore_info.total_usage
TOTAL_DATA_SIZE TOTAL_DISK_USAGE
311.82 GB       105.01 GB
 
 
columnstore_info.table_usage
TABLE_SCHEMA  TABLE_NAME              DATA_DISK_USAGE  DICT_DATA_USAGE  TOTAL_USAGE
tpcds_100     call_center             48.74 MB         36.14 MB         84.88 MB
tpcds_100     catalog_page            13.07 MB         8.03 MB          21.10 MB
tpcds_100     catalog_returns         1.52 GB          0.00 Bytes       1.52 GB
tpcds_100     catalog_sales           5.07 GB          0.00 Bytes       5.07 GB
tpcds_100     customer                808.14 MB        264.06 MB        1.05 GB
tpcds_100     customer_address        720.10 MB        206.08 MB        926.18 MB
tpcds_100     customer_demographics   304.07 MB        4.02 MB          308.09 MB
tpcds_100     date_dim                25.22 MB         4.02 MB          29.23 MB
tpcds_100     household_demographics  6.04 MB          2.01 MB          8.05 MB
tpcds_100     income_band             3.02 MB          0.00 Bytes       3.02 MB
tpcds_100     inventory               454.61 MB        0.00 Bytes       454.61 MB
tpcds_100     item                    34.17 MB         272.10 MB        306.27 MB
tpcds_100     promotion               17.40 MB         8.03 MB          25.43 MB
tpcds_100     reason                  5.02 MB          4.02 MB          9.04 MB
tpcds_100     ship_mode               11.05 MB         10.04 MB         21.09 MB
tpcds_100     store                   45.73 MB         34.13 MB         79.86 MB
tpcds_100     store_returns           2.20 GB          0.00 Bytes       2.20 GB
tpcds_100     store_sales             3.40 GB          0.00 Bytes       3.40 GB
tpcds_100     time_dim                13.58 MB         8.03 MB          21.61 MB
tpcds_100     warehouse               23.61 MB         20.08 MB         43.69 MB
tpcds_100     web_page                16.36 MB         6.02 MB          22.38 MB
tpcds_100     web_returns             768.19 MB        0.00 Bytes       768.19 MB
tpcds_100     web_sales               5.13 GB          0.00 Bytes       5.13 GB
tpcds_100     web_site                41.70 MB         32.12 MB         73.83 MB
 
columnstore_info.compression_ratio
COMPRESSION_RATIO
5.0392:1

*mcsimport*
 
Start Load Test columnstore_info.total_usage
TOTAL_DATA_SIZE TOTAL_DISK_USAGE
249.05 GB       83.51 GB
 
 
End Load Test columnstore_info.total_usage
TOTAL_DATA_SIZE TOTAL_DISK_USAGE
311.83 GB       115.20 GB
 
columnstore_info.table_usage
TABLE_SCHEMA  TABLE_NAME              DATA_DISK_USAGE  DICT_DATA_USAGE  TOTAL_USAGE
tpcds_100     call_center             48.74 MB         36.14 MB         84.88 MB
tpcds_100     catalog_page            13.07 MB         8.03 MB          21.10 MB
tpcds_100     catalog_returns         1.69 GB          0.00 Bytes       1.69 GB
tpcds_100     catalog_sales           8.51 GB          0.00 Bytes       8.51 GB
tpcds_100     customer                808.14 MB        264.06 MB        1.05 GB
tpcds_100     customer_address        720.10 MB        206.08 MB        926.18 MB
tpcds_100     customer_demographics   304.07 MB        4.02 MB          308.09 MB
tpcds_100     date_dim                25.22 MB         4.02 MB          29.23 MB
tpcds_100     household_demographics  6.04 MB          2.01 MB          8.05 MB
tpcds_100     income_band             3.02 MB          0.00 Bytes       3.02 MB
tpcds_100     inventory               1.00 GB          0.00 Bytes       1.00 GB
tpcds_100     item                    34.17 MB         272.10 MB        306.27 MB
tpcds_100     promotion               17.40 MB         8.03 MB          25.43 MB
tpcds_100     reason                  5.02 MB          4.02 MB          9.04 MB
tpcds_100     ship_mode               11.05 MB         10.04 MB         21.09 MB
tpcds_100     store                   45.73 MB         34.13 MB         79.86 MB
tpcds_100     store_returns           2.50 GB          0.00 Bytes       2.50 GB
tpcds_100     store_sales             5.75 GB          0.00 Bytes       5.75 GB
tpcds_100     time_dim                13.58 MB         8.03 MB          21.61 MB
tpcds_100     warehouse               23.61 MB         20.08 MB         43.69 MB
tpcds_100     web_page                16.36 MB         6.02 MB          22.38 MB
tpcds_100     web_returns             768.19 MB        0.00 Bytes       768.19 MB
tpcds_100     web_sales               8.51 GB          0.00 Bytes       8.51 GB
tpcds_100     web_site                41.70 MB         32.12 MB         73.83 MB
 
 
columnstore_info.compression_ratio
COMPRESSION_RATIO
5.0409:1
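
To see where the extra space goes, the two columnstore_info.table_usage listings above can be compared table by table. The sketch below is one possible way to do that: it holds the DATA_DISK_USAGE values of both runs, converts them to MB and reports only the tables that differ. The helper names (to_mb, growth_mb) are hypothetical, and the conversion assumes binary units (1 GB = 1024 MB), which the report does not state. The DICT_DATA_USAGE column is left out because it is identical in both listings.

```python
# Assumption: the report uses binary units (1 GB = 1024 MB).
UNIT_TO_MB = {"Bytes": 1.0 / (1024 * 1024), "MB": 1.0, "GB": 1024.0}


def to_mb(text):
    """Convert a size such as '1.52 GB' or '0.00 Bytes' to MB."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MB[unit]


# DATA_DISK_USAGE per table: (cpimport run, mcsimport run), copied from the listings.
DATA_DISK_USAGE = {
    "call_center": ("48.74 MB", "48.74 MB"),
    "catalog_page": ("13.07 MB", "13.07 MB"),
    "catalog_returns": ("1.52 GB", "1.69 GB"),
    "catalog_sales": ("5.07 GB", "8.51 GB"),
    "customer": ("808.14 MB", "808.14 MB"),
    "customer_address": ("720.10 MB", "720.10 MB"),
    "customer_demographics": ("304.07 MB", "304.07 MB"),
    "date_dim": ("25.22 MB", "25.22 MB"),
    "household_demographics": ("6.04 MB", "6.04 MB"),
    "income_band": ("3.02 MB", "3.02 MB"),
    "inventory": ("454.61 MB", "1.00 GB"),
    "item": ("34.17 MB", "34.17 MB"),
    "promotion": ("17.40 MB", "17.40 MB"),
    "reason": ("5.02 MB", "5.02 MB"),
    "ship_mode": ("11.05 MB", "11.05 MB"),
    "store": ("45.73 MB", "45.73 MB"),
    "store_returns": ("2.20 GB", "2.50 GB"),
    "store_sales": ("3.40 GB", "5.75 GB"),
    "time_dim": ("13.58 MB", "13.58 MB"),
    "warehouse": ("23.61 MB", "23.61 MB"),
    "web_page": ("16.36 MB", "16.36 MB"),
    "web_returns": ("768.19 MB", "768.19 MB"),
    "web_sales": ("5.13 GB", "8.51 GB"),
    "web_site": ("41.70 MB", "41.70 MB"),
}


def growth_mb(usage):
    """Return {table: extra MB used by the mcsimport run} for tables that differ."""
    result = {}
    for table, (cpimport_size, mcsimport_size) in usage.items():
        delta = to_mb(mcsimport_size) - to_mb(cpimport_size)
        if abs(delta) > 0.005:
            result[table] = round(delta, 2)
    return result


grown = growth_mb(DATA_DISK_USAGE)
total_gb = round(sum(grown.values()) / 1024.0, 2)

for table, delta in sorted(grown.items(), key=lambda item: -item[1]):
    print(f"{table:<16} +{delta:.2f} MB")
print(f"total            +{total_gb} GB")
```

Under these assumptions only six tables differ (catalog_returns, catalog_sales, inventory, store_returns, store_sales and web_sales), all of them with 0.00 Bytes of dictionary data, and together they account for about 10.2 GB. That is in line with the 10.19 GB difference between the 31.69 GB and 21.5 GB used by the two loads.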



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-12-19 ]

Could be either dictionary de-duplication not working properly or post-segment fill file truncation not happening. Need to see extent map and files to be sure.

Comment by Ben Thompson (Inactive) [ 2022-05-09 ]

See the newer project MCOL-5013, which is intended to replace mcsimport.

Generated at Thu Feb 08 02:33:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.