[MCOL-250] cpimport not distributed data evenly... Created: 2016-07-06  Updated: 2017-06-07  Resolved: 2017-06-07

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.0.1
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None

Sprint: 2016-19

 Description   

Reported by an alpha customer:

An update on this configuration: we have now doubled the data since 3 weeks ago, to almost 13 billion rows across the 10 tables. I noticed strange behaviour on the primary PM (OAM): the data is not spread evenly across the PMs. The PM1 dbroots are far more utilized than the dbroots on the other 3 PMs. We configured 2 dbroots (2 TB each) per PM, for 8 dbroots across all PMs. Dbroots 1 and 2 on PM1 are nearing 80% full, whilst the other dbroots are only 39% full.



 Comments   
Comment by David Hill (Inactive) [ 2016-07-06 ]

Additional info from the customer:

We are using cpimport mode 1, distributed across all 4 PMs... or so we thought. We considered modes 2 or 3, but due to the complexity of the ETL architecture we decided to keep to mode 1 for now. The load files are located on the PMs. We have ETL loaders on all 4 PMs loading in macro batches all day long. We decided not to use the UMs for loading, as they need to be free for user queries. Each of the ETL loaders on the PMs loads a different table (or tables) in mode 1 to all PMs. Yet PM1 is still far more utilized than the other 3.

Comment by David Thompson (Inactive) [ 2016-10-11 ]

Can you validate the data distribution on the multi node testing?

Comment by Daniel Lee (Inactive) [ 2017-04-21 ]

Tested using 1.0.8-1 on a 1UM/4PM stack and loaded 25 GB of data (the 10 GB DBT-3 dataset plus two additional loads of the 10 GB lineitem.tbl).
The data appears to be loaded evenly:

[root@localhost ~]# du -sh /usr/local/mariadb/columnstore/data1
6.0G /usr/local/mariadb/columnstore/data1

[root@localhost ~]# du -sh /usr/local/mariadb/columnstore/data2
5.5G /usr/local/mariadb/columnstore/data2

[root@localhost ~]# du -sh /usr/local/mariadb/columnstore/data3
5.9G /usr/local/mariadb/columnstore/data3

[root@localhost ~]# du -sh /usr/local/mariadb/columnstore/data4
5.9G /usr/local/mariadb/columnstore/data4
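To put a single number on "evenly", the per-dbroot sizes can be reduced to a max/min ratio. The following is a minimal sketch, assuming one data directory per dbroot under the default install path used in the test above; the function name `dbroot_spread` is hypothetical and not part of ColumnStore.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a ColumnStore tool): given per-dbroot sizes in any
# consistent unit (e.g. KB from `du -sk`), print the ratio of the largest to
# the smallest. A value near 1.00 means the data is spread evenly.
dbroot_spread() {
  if [ "$#" -lt 1 ]; then
    echo "usage: dbroot_spread SIZE..." >&2
    return 1
  fi
  printf '%s\n' "$@" | awk '
    { v = $1 + 0 }                      # force numeric comparison
    NR == 1 { min = v; max = v }
    { if (v < min) min = v; if (v > max) max = v }
    END {
      if (min <= 0) { print "n/a"; exit 1 }  # avoid division by zero
      printf "%.2f\n", max / min
    }'
}
```

On a live system it could be fed from `du`, e.g. `dbroot_spread $(du -sk /usr/local/mariadb/columnstore/data{1..4} | awk '{print $1}')`. The test stack above would come out at roughly 1.1, whereas the customer's 80% versus 39% utilization of equally sized dbroots corresponds to a ratio of about 2.05.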

Comment by David Thompson (Inactive) [ 2017-06-07 ]

Not able to reproduce; possibly an issue only in the alpha release.
