[MCOL-382] Not able to import csv file (LOAD DATA INFILE) Created: 2016-10-30  Updated: 2017-08-09  Resolved: 2016-11-29

Status: Closed
Project: MariaDB ColumnStore
Component/s: MariaDB Server
Affects Version/s: 1.0.4
Fix Version/s: Icebox

Type: Bug Priority: Minor
Reporter: Christian Hotz-Behofsits Assignee: David Thompson (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Ubuntu 16.04.1 LTS (Linux db-imsm 4.4.0-45-generic #66-Ubuntu SMP Wed Oct 19 14:12:37 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux)


Issue Links:
Relates
relates to MCOL-372 dbrm-worker node continuing issues so... Closed

 Description   

I tried to import a CSV file into an existing table, but I get the following error:

ERROR 1030 (HY000) at line 1: Got error -1 "Internal error < 0 (Not system error)" from storage engine Columnstore

My err.log contains the following lines:

Oct 30 11:29:15 db-imsm cpimport.bin[25783]: 15.690561 |0|0|0| E 34 CAL0087: BulkLoad Error: Actual error row count(1) exceeds the max error rows(0) allowed for table youtube.videos
Oct 30 11:29:15 db-imsm cpimport.bin[25783]: 15.700120 |0|0|0| E 34 CAL0087: BulkLoad Error: Bulkload Read (thread 0) Failed for Table youtube.videos.  Terminating this job.
Oct 30 11:29:15 db-imsm writeengineserver[29681]: 15.715875 |0|0|0| E 32 CAL0000: pushing data : PIPE error .........Broken pipe
Oct 30 11:29:16 db-imsm writeengineserver[29681]: 16.698631 |0|0|0| E 32 CAL0000: 8418 : cpimport exit on failure (signal -1)
Oct 30 11:29:16 db-imsm writeenginesplit[25740]: 16.699185 |0|0|0| E 33 CAL0000: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Oct 30 11:29:16 db-imsm writeenginesplit[25740]: 16.699469 |0|0|0| E 33 CAL0087: BulkLoad Error: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Oct 30 11:29:16 db-imsm writeenginesplit[25740]: 16.700762 |0|0|0| E 33 CAL0000: PM1 : Bulkload Read (thread 0) Failed for Table youtube.videos.  Terminating this job.
Oct 30 11:29:16 db-imsm writeenginesplit[25740]: 16.700899 |0|0|0| E 33 CAL0087: BulkLoad Error: PM1 : Bulkload Read (thread 0) Failed for Table youtube.videos.  Terminating this job.
Oct 30 11:29:40 db-imsm cpimport.bin[26010]: 40.493277 |0|0|0| E 34 CAL0087: BulkLoad Error: Actual error row count(1) exceeds the max error rows(0) allowed for table youtube.videos
Oct 30 11:29:40 db-imsm cpimport.bin[26010]: 40.494443 |0|0|0| E 34 CAL0087: BulkLoad Error: Bulkload Read (thread 0) Failed for Table youtube.videos.  Terminating this job.
Oct 30 11:29:40 db-imsm writeengineserver[29681]: 40.508467 |0|0|0| E 32 CAL0000: pushing data : PIPE error .........Broken pipe
Oct 30 11:29:41 db-imsm writeengineserver[29681]: 41.756233 |0|0|0| E 32 CAL0000: 1839 : cpimport exit on failure (signal -1)
Oct 30 11:29:41 db-imsm writeenginesplit[25967]: 41.756643 |0|0|0| E 33 CAL0000: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Oct 30 11:29:41 db-imsm writeenginesplit[25967]: 41.756860 |0|0|0| E 33 CAL0087: BulkLoad Error: #033[0;31mReceived a Cpimport Failure from PM1#033[0m
Oct 30 11:29:41 db-imsm writeenginesplit[25967]: 41.757576 |0|0|0| E 33 CAL0000: PM1 : Bulkload Read (thread 0) Failed for Table youtube.videos.  Terminating this job.
Oct 30 11:29:41 db-imsm writeenginesplit[25967]: 41.757766 |0|0|0| E 33 CAL0087: BulkLoad Error: PM1 : Bulkload Read (thread 0) Failed for Table youtube.videos.  Terminating this job.

And my info.log/debug.log are flooded with messages like the following (up to 5 per second):

Oct 30 11:37:19 db-imsm messagequeue[28558]: 19.616034 |0|0|0| W 31 CAL0071: InetStreamSocket::read: timeout during readToMagic: socket read error: Success; InetStreamSocket: sd: 17 inet: 127.0.0.1 port: 40166; Will retry.

Server version: 10.1.18-MariaDB Columnstore 1.0.4-1
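
For reference, this failure mode is typically triggered by a statement of the following shape; the file path and field options here are placeholders, not details taken from the report, and the FIELDS/ENCLOSED options must match the actual CSV format:

```sql
-- Hypothetical sketch only: path and format options are assumptions.
-- In ColumnStore, LOAD DATA INFILE is serviced by cpimport internally,
-- and by default zero error rows are allowed, so one bad row aborts the load.
LOAD DATA INFILE '/tmp/videos.csv'
INTO TABLE youtube.videos
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```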



 Comments   
Comment by David Thompson (Inactive) [ 2016-10-31 ]

That log message is benign and not caused by this; it is being tracked in MCOL-372, which also includes a config workaround to suppress it.

Comment by David Thompson (Inactive) [ 2016-10-31 ]

Can you add the cpimport command and, if possible, a sample of your data to help reproduce this?

Also check that you are specifying the right arguments for your data format, documented here:
https://mariadb.com/kb/en/mariadb/columnstore-bulk-data-loading/

Especially -s (field separator) and -E (enclosed-by character). For CSV you might need something like -s ',' -E '"', depending on your exact format.

Comment by David Thompson (Inactive) [ 2016-11-29 ]

Closing as cannot reproduce due to lack of activity. Please reopen if you have time to test this and still see issues.

Comment by Damon [ 2017-08-08 ]

Experiencing this issue.
Server version: 10.1.25-MariaDB Columnstore 1.0.10-1
It seems to be data related (a special character?).
It happens while copying data from an existing InnoDB table to a ColumnStore table.
I can get around it by skipping over the bad data with WHERE uid >= 10 AND uid <= 30, then another insert with WHERE uid >= 40.
That gets me around whatever data is corrupting this load. I wish it were as easy as a 10-record chunk, but I have hundreds of millions of records.
Maybe this will ring some bells? A data issue, where the same data exists fine in every other engine except ColumnStore?

Comment by David Thompson (Inactive) [ 2017-08-09 ]

So if you use INSERT ... SELECT with no transaction, we optimize this internally into a cpimport call. There are some limitations to this because it has to effectively convert the data to CSV, so I'd guess you might be hitting a problem with your data containing a delimiter character. There is a system variable to override this:
https://mariadb.com/kb/en/mariadb/columnstore-batch-insert-mode/

Also, if it works inside a transaction, that would likewise point to the same cause, since a transaction just maps to the slower direct-insert path.
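
The transactional behaviour described above can be checked with a sketch like this; table names are placeholders, not taken from the issue:

```sql
-- Inside an explicit transaction the INSERT ... SELECT takes the slower
-- direct-insert path instead of being converted to an internal cpimport
-- call, so the CSV-conversion/delimiter limitation does not apply.
START TRANSACTION;
INSERT INTO cs_table SELECT * FROM innodb_table;
COMMIT;
```

If this succeeds where the non-transactional copy fails, that supports the delimiter-in-data hypothesis.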

I'd also check the ColumnStore and MySQL error logs. When cpimport is used internally you might see some .bad and .err files under mysql/db that point to what is going on.

If you have more details, it would be best to file a new Jira and, if possible, provide the problem data (if you don't want to post that on a public Jira, we can exchange it by some other means).

Comment by Damon [ 2017-08-09 ]

REPLACE(data, '\'', '\\\''),

I didn't even think about the character sets. Who does in the 21st century? But alas, this is it. I am unable to change to a data set that works with single quotes.
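
The REPLACE() fragment above fits into the table-to-table copy roughly like this; the table and column names are placeholders, not from the issue:

```sql
-- Escape embedded single quotes while copying InnoDB -> ColumnStore,
-- so the internally generated CSV is not broken by a quote in the data.
-- Hypothetical schema: (id, data) stand in for the real columns.
INSERT INTO cs_table (id, data)
SELECT id, REPLACE(data, '\'', '\\\'')
FROM innodb_table;
```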

Comment by Damon [ 2017-08-09 ]

Actual error row count(1) exceeds the max error rows
Broken pipe

I have hundreds of millions of rows. Any guidance on how to find whatever is causing this? I am not doing a file import; I am doing a table (InnoDB) to table (ColumnStore) copy.
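
One generic way to narrow this down without a data file is to binary-search the uid range, extending the earlier WHERE uid workaround; all names and numbers here are placeholders:

```sql
-- Halve the uid range and keep whichever half still fails, repeating
-- until only a handful of rows remain to inspect by eye.
INSERT INTO cs_table SELECT * FROM innodb_table WHERE uid BETWEEN 1 AND 500000;
INSERT INTO cs_table SELECT * FROM innodb_table WHERE uid BETWEEN 1 AND 250000;
-- ...repeat on the failing half, then inspect the survivors directly:
SELECT * FROM innodb_table WHERE uid BETWEEN 1230 AND 1240;
```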

Comment by Damon [ 2017-08-09 ]

Figured it out: cpimport with debug level 3 and the max error count set to a high number.
I also changed the field delimiter to a pipe.

cpimport <db> <table> <path_to_datafile> -d 3 -s '|' -E '"' -C '\N' -j 3001 -S -e 50000

Comment by David Thompson (Inactive) [ 2017-08-09 ]

Excellent - I was just about to start looking at this.

Generated at Thu Feb 08 02:20:39 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.