[MCOL-1522] Optimize the tpc-ds data generation on multi core CPU environments Created: 2018-07-02  Updated: 2023-10-26  Resolved: 2018-09-21

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: Icebox

Type: Task Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Zdravelina Sokolovska (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

Currently dsdgen generates the entire data set in a single thread, typically when a local load is run from data stored on the UM node.
It is run in multiple threads only when the data has to be generated and the source files distributed across the PM nodes, e.g. for the cpimport load modes m2 and m3.
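A minimal sketch of chunked, multi-core generation is below. It assumes the TPC-DS kit's dsdgen, which can split a data set into N independent chunks through its -PARALLEL and -CHILD options (check the option names against your kit version), and that it is run from the kit's tools directory. The helper names dsdgen_commands and generate are hypothetical and are not the script referenced in this ticket.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dsdgen_commands(scale, chunks, out_dir, dsdgen="./dsdgen"):
    """Build one dsdgen command line per chunk (children are numbered 1..chunks).

    Assumes dsdgen accepts -SCALE, -DIR, -PARALLEL and -CHILD.
    """
    if chunks < 1:
        raise ValueError("chunks must be >= 1")
    if chunks == 1:
        # Single-threaded run: the whole data set in one process.
        return [[dsdgen, "-SCALE", str(scale), "-DIR", out_dir]]
    return [
        [dsdgen, "-SCALE", str(scale), "-DIR", out_dir,
         "-PARALLEL", str(chunks), "-CHILD", str(child)]
        for child in range(1, chunks + 1)
    ]


def generate(scale, out_dir, chunks=None, dsdgen="./dsdgen"):
    """Run every chunk concurrently, defaulting to one chunk per CPU core."""
    chunks = chunks or os.cpu_count() or 1
    cmds = dsdgen_commands(scale, chunks, out_dir, dsdgen)
    # Threads are sufficient: each worker only waits on an external process,
    # while the operating system schedules the dsdgen processes across cores.
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        results = list(pool.map(lambda cmd: subprocess.run(cmd, check=True), cmds))
    return [r.returncode for r in results]
```

For example, generate(1000, "/data/tpcds") on a 16-core host would start 16 dsdgen processes, each producing a disjoint slice of the Scale Factor 1000 data set. Every chunk uses the same -PARALLEL value, so the slices do not overlap and together form the complete data set.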

Generating larger data sets has been observed to take a long time, especially at Scale Factor 1000 and above and/or when all data is stored on the UM or on only a few PMs.

The TPC-DS data generation needs to be optimized for multi-core CPU environments.



 Comments   
Comment by Zdravelina Sokolovska (Inactive) [ 2018-09-21 ]

TPC-DS data generation has been optimized for multi-core CPU environments: data generation is accelerated according to the hardware capabilities of the environment.
In addition, the script works in 2 modes:
mode 1 prepares data for MCS cpimport mode 1
mode 2 prepares data for the MCS cpimport distributed modes
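The difference between the two modes can be illustrated by how chunks are assigned to hosts. The sketch below is an assumption, not the script from this ticket: plan_chunks, the host names um1/pm1/pm2/pm3, and the round-robin policy are hypothetical. In mode 1 every chunk is generated on the UM, which suits cpimport mode 1 (input files on the UM). In mode 2 the chunks are spread across the PMs so that each PM holds a local slice of the source files, as the distributed cpimport modes expect.

```python
def plan_chunks(mode, chunks, pm_hosts, um_host="um1"):
    """Map each host to the dsdgen child numbers it should generate.

    Hypothetical helper. mode 1: all chunks on the UM.
    mode 2: chunks assigned round-robin across the PM hosts.
    """
    children = list(range(1, chunks + 1))
    if mode == 1:
        return {um_host: children}
    if mode == 2:
        if not pm_hosts:
            raise ValueError("mode 2 needs at least one PM host")
        plan = {host: [] for host in pm_hosts}
        for i, child in enumerate(children):
            # Round-robin keeps the per-PM data volume roughly even.
            plan[pm_hosts[i % len(pm_hosts)]].append(child)
        return plan
    raise ValueError(f"unknown mode: {mode}")
```

For example, plan_chunks(2, 6, ["pm1", "pm2", "pm3"]) returns {"pm1": [1, 4], "pm2": [2, 5], "pm3": [3, 6]}. Each PM would then run dsdgen with -PARALLEL set to the total chunk count (6) and -CHILD set to each of its assigned numbers, so no two PMs generate the same rows.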
