[MCOL-776] Not able to import CSV file in multi server architecture. Created: 2017-06-17  Updated: 2020-08-27  Resolved: 2017-07-12

Status: Closed
Project: MariaDB ColumnStore
Component/s: writeengine
Affects Version/s: 1.0.9
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Prakash Padhiyar Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

AWS EC2 Instances - Ubuntu 16.04 - 64 Bit


Attachments: Microsoft Word Mariadb_Columnstore_Jira.docx     File columnstoreSupportReport.mycolumnstore1_20_6.tar.gz     File columnstoreSupportReport.mycs1.tar.gz     PNG File mariadb.PNG    

 Description   

Dear Support Team,

We have configured mariadb columnstore with multi server architecture in AWS with below configuration and server status shows everything is in running/active mode.
UM1 - 172.31.7.171 ( Ubuntu 16.04 64 Bit )
PM1 - 172.31.0.155 ( Ubuntu 16.04 64 Bit )
PM2 - 172.31.7.211 ( Ubuntu 16.04 64 Bit )
DB Root : Mount AWS EFS with specific dbroot along with /etc/fstab entries.

Below is the Error which we receive while import .CSV files using below command on UM1 :-
LOAD DATA INFILE '/home/ubuntu/ferrero_mults/fgroup.csv' INTO TABLE fgroup FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n';

++++++++++++++++++++++++++++++++++++++++++
Jun 16 20:11:48 ip-172-31-7-171 writeenginesplit[7451]: 48.355327 |0|0|0| E 33 CAL0000: #033[0;31mReceived a Cpimport Failure from PM2#033[0m
Jun 16 20:11:48 ip-172-31-7-171 writeenginesplit[7451]: 48.355411 |0|0|0| E 33 CAL0087: BulkLoad Error: #033[0;31mReceived a Cpimport Failure from PM2#033[0m
Jun 16 20:13:14 ip-172-31-7-171 writeenginesplit[7515]: 14.580518 |0|0|0| E 33 CAL0000: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Jun 16 20:13:14 ip-172-31-7-171 writeenginesplit[7515]: 14.580587 |0|0|0| E 33 CAL0087: BulkLoad Error: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Jun 16 20:15:00 ip-172-31-7-171 writeenginesplit[7585]: 00.155368 |0|0|0| E 33 CAL0000: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Jun 16 20:15:00 ip-172-31-7-171 writeenginesplit[7585]: 00.155471 |0|0|0| E 33 CAL0087: BulkLoad Error: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
+++++++++++++++++++++++++++++++++++++++++++++++

Here i have also attached "columnstoreSupportReport" which is generated from UM1.
Please help me to resolve this as i could find much more help to resolve this issue.



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-06-19 ]

Hi,

A couple of questions regarding this:

1. Does running regular inserts work against these tables?
2. Do the CSV files contain any UTF-8 (or other multibyte) characters?

Comment by Prakash Padhiyar [ 2017-06-19 ]

Hi Andrew,

Thanks for the update.

We are able to perform normal queries and its a normal .CSV file. But good thing is now we are able to run that i have reported query.

It's my bad and drive mounting not properly configured with AWS EFS so i have done proper remount and reinstall works for me.

Another thing that might i need to know from you is..
We have right now.. 1 UM and 2 PM right now active. Now i have added PM3 but that shows me in COLD STANDBY. Why ? asking just because right now i am not able use the PM3 when my both PM's are fully utilized and no data store to PM3 as well.

See the attached mariadb.png screenshot may help you to get an idea.

Comment by David Hill (Inactive) [ 2017-06-19 ]

On your system, PM1 is considered the Active PM since that is where the ProcessManager is Active. PM2 - ProcessManager is hot-standby meaning PM2 would take over as Active PM if PM1 failed in a High Available configured setup. PM2 - ProcessManager is cold-standby meaning PM3 would take over as Hot-standby if PM1 failed in a High Available configured setup.

You mentioned you added pm3 into the system. If you disk are fully utilized on PM1/PM2 ,then you will need to run the redistrubutedDB command to free up disk space on PM1/PM2 by moving some of the data to PM3.

https://mariadb.com/kb/en/mariadb/columnstore-redistribute-data/

David Hill

Comment by Prakash Padhiyar [ 2017-06-19 ]

Ok Great understanding provided by you. Thanks David.

Just to curious , How can i use PM3 server as load distribution to get faster output of my query. ?
For example... If i have run the query from UM1 with lots of data and now i want that query to be executed simultaneously on all PM1 , PM2 and PM3 with maximum CPU and Memory utilization. Probably i am thinking about to add PM4 as well. Is that workable solution ? Kindly Suggest best way.

Thanks
Prakash

Comment by David Hill (Inactive) [ 2017-06-19 ]

Once you have the data balanced between the 3 pms, then the queries will utilize the 3 pms making it faster. This will all happen automatically..

David Hill

Comment by Prakash Padhiyar [ 2017-06-19 ]

David,

What command do i need to run for data balance between all the 3 PMs ? Because right now i could not see any data is store on PM3.

Please help.

Thanks
Prakash

Comment by David Hill (Inactive) [ 2017-06-19 ]

Its document in the KB doc link I provide, assuming you have access to this document:

https://mariadb.com/kb/en/mariadb/columnstore-redistribute-data/

mcsadmin> redistributeData start

Comment by Prakash Padhiyar [ 2017-06-19 ]

David,

I already tried with that as well. We are executing a query with full load and our both PM1 and PM2 are getting fully utilized But not a single request is coming to my PM3 server.

Is there any configuration missing for PM3 ? How we can use PM3 to execute the query for faster response and store the data.

Please guide.

Comment by Daniel Lee (Inactive) [ 2017-06-19 ]

Prakash;

Please check the following

1) on each PMs, please run "du -sh /usr/local/mariadb/columnstore/dataN, where N is the 1, 2, 3 for PM1, PM2, PM3 respectively.
2) on the mcsadmin console, run "redistributedata start"
3) while redistributedata is running, run "redistributedata status" a few times and capture output
4) when redistributedata is finished, check dbroot sizes again.
5) repeat step 1

Please provided the output for the above steps.

Thanks

Comment by Prakash Padhiyar [ 2017-06-19 ]

Denial,

Please find attached screenshot document which contain all the information which you suggested to provide.

Kindly help me to out of this.

Thanks in Advance.
Prakash

Comment by Daniel Lee (Inactive) [ 2017-06-19 ]

Prakash;

I noticed that you don't have much data in your database. There are files for each column in each table. Most likely, you don't have enough data for the columns to cause redistribution to occur. If this is not a production system and it is for testing only, you can load 20g data into the system and you will see data spread across dbroots.

Daniel

Comment by Daniel Lee (Inactive) [ 2017-06-19 ]

Prakash;

If you want to see your existing data in all three PMs, you can truncate or recreate the tables, then reload them again.

Daniel

Comment by Developer [ 2017-06-20 ]

Hi Deniel,

I am working as a developer for this project. Let me clear you our concern. Erlier when we have 1 UM and 1 PM at that time when I will run any query It was taking 8 Sec to fetch data.

Than after we have added one more PM called PM2 So now we have 1 UM and 2 PM in that case when I have run same query It was taking 4 sec to fetch data as we have added more resource so It has shared resource of PM1 and PM2.

Now we need to improve this result also in future so what we need to do we have to add one more PM with current system So we are considering current system (1 UM and 2 PM) as a our live system and we have to attach one more PM with current setup.

So we have added that as prakash has provided that screenshots but that new PM3 is showing ProcessManager as "Cold Stand BY" thats why it is not sharing load of PM1 and PM2 so can you help us to resolve that problem.

Earlier when we have added 1 UM and 2 PM that we have achieved by doing whole setup from scratch but in this case we dont want to do this as in future lets say our system is live with 30GB data in that case we cant take any risk of our data.

Please provide your input for this.

Thanks.

Comment by Daniel Lee (Inactive) [ 2017-06-20 ]

Shashank;

You can find answers to your questions or concerns in the prior comments that we provided. Let me summarize them.

1) All your nodes, UM and PMs, are active and functioning normally. PM1 is acting as the master controller of the system. PM2 is designated as a hot backup, and PM3 is designated as the backup for PM3.

2) Each PM only process data in dbroots attached to it. Your PM3 does not have (or does not have much) data in dbroot3. Therefore, PM3 is not doing much processing work. After you added PM2, there is data in PM2. Did you load additional data or run redistributedata?

3) The redistributedata command moves data around by relocating files. It would be effective if you have more data in the database. If you want your existing data to be balanced, you can truncate the table and reload them again. The system will try to spread data across all dbroots by batches, possibly resulting in smaller, non-full files. Depending on your dataset, your may not have enough data for even distribution among dbroots.

Comment by Developer [ 2017-06-20 ]

Hi Daniel,

Thank you very much for your explanation. I have fixed issue by cropping database and re creating structure and importing data again. So in future when we need to add new PM than after configuration we need to redistributedata. That will make new PM aligned.

Thank you very much again.

Comment by David Thompson (Inactive) [ 2017-07-12 ]

This was resolved without a code fix.

Generated at Thu Feb 08 02:23:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.