[MCOL-589] Not able to import data from a csv file (LOAD DATA LOCAL INFILE) Created: 2017-02-23  Updated: 2017-06-03  Resolved: 2017-06-03

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.0.7
Fix Version/s: Icebox

Type: Bug Priority: Blocker
Reporter: 胡彬 Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

CentOS 7.2


Attachments: Text File err.log     Text File info.log     Text File warning.log    

 Description   

My CSV file is generated by the TPC-DS tools. The data scale is 20T, partitioned into 10 chunks. When I use 'LOAD DATA LOCAL INFILE' to import the CSV, most loads succeed, but several fail. The error message mcsmysql reports is "ERROR 1030 (HY000) at line 1: Got error -1 "Internal error < 0 (Not system error)" from storage engine Columnstore". No related error log was found on the UM or PMs.

My ColumnStore system is composed of 1 UM and 3 PMs in separate mode, and the CSV file is stored on another separate server, which runs the 'LOAD DATA'.
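For reference, the failing load would have been a statement of this general shape (the file path, table name and '|' delimiter are illustrative placeholders, not the reporter's exact values; TPC-DS dsdgen output is typically '|'-delimited):

```sql
-- Hypothetical reconstruction of the failing statement; the path,
-- table name and delimiter are placeholders for illustration only.
LOAD DATA LOCAL INFILE '/data/tpcds/web_sales_2_10.csv'
INTO TABLE web_sales
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';
```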



 Comments   
Comment by David Thompson (Inactive) [ 2017-02-23 ]

Please make sure you have checked all logs (including the MariaDB server logs) as documented here: https://mariadb.com/kb/en/mariadb/system-troubleshooting-mariadb-columnstore/. In particular, check for evidence of the UM server (or PMs) running out of memory and the processes being restarted.

Second, is there a specific reason you are using LDI? If not, can you try using cpimport directly, as it is also faster? We have seen some evidence that very large data loads going through LDI can sometimes exhaust memory on the UM. Alternatively, try smaller LDI files.
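The "smaller LDI files" suggestion can be done with the standard `split` utility before running one LOAD DATA per piece. A minimal demo (file names are illustrative, and the 4-line chunk size is only so the demo is small; a real TPC-DS chunk would use a much larger `-l`, e.g. 1000000):

```shell
# Demo: split a CSV into fixed-size pieces, then load each piece with
# its own LOAD DATA LOCAL INFILE. Uses GNU coreutils split (-d gives
# numeric suffixes), available on CentOS 7.
cd "$(mktemp -d)"
seq 1 10 > web_sales_2_10.csv              # stand-in for the real CSV
split -l 4 -d --additional-suffix=.csv web_sales_2_10.csv web_sales_2_10.part_
ls web_sales_2_10.part_*.csv               # pieces part_00, part_01, part_02
```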

If you are still stuck it would help if you could provide the output from the columnstoreSupport utility.

Comment by Andrew Hutchings (Inactive) [ 2017-02-23 ]

In addition to David's advice, does your data include CHAR/VARCHAR columns with multibyte text?

Comment by 胡彬 [ 2017-02-23 ]

Thanks for your answer.
When using mcsmysql to load data from a CSV file, I logged the output into a file named xxx.csv.log, so the last modified time of each log file should be the time the error occurred. Here is the 'ls -lrt' output:

-rw-r--r-- 1 root root 115 Feb 22 22:53 catalog_returns_2_10.csv.log
-rw-r--r-- 1 root root 115 Feb 22 23:17 store_returns_2_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 00:05 catalog_returns_3_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 01:32 store_returns_3_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 05:10 web_sales_2_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 09:18 web_sales_3_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 11:35 catalog_sales_3_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 12:56 catalog_sales_2_10.csv.log
-rw-r--r-- 1 root root 115 Feb 23 14:02 store_sales_2_10.csv.log

Following up on your suggestion, I rechecked the log files in /var/log/mariadb/columnstore on um1.
There are some messages in err.log, but their timestamps do not exactly match the log files' modification times.

The log files err.log, info.log and warning.log are attached FYI.

Comment by Andrew Hutchings (Inactive) [ 2017-02-23 ]

The logs appear to indicate that you are trying to do LDI on multiple clients to the same table simultaneously. ColumnStore won't allow this which is probably why you are seeing errors. There are ways of doing this with cpimport, particularly loading into multiple PMs simultaneously.
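A sketch of that approach: cpimport supports a mode 3 (`-m 3`) that loads a file locally on each PM, so each chunk can be loaded in parallel on its own PM. The database, table, host and file names below are assumptions, and the loop is written as a dry run that only prints the commands; drop the leading `echo` to actually execute them:

```shell
# Dry-run sketch: load one TPC-DS chunk per PM with cpimport in mode 3
# (local PM load). DB/table/hosts/paths are hypothetical examples.
# -m 3 selects local-PM mode; -s sets the field delimiter.
DB=tpcds
TABLE=web_sales
for i in 1 2 3; do
  echo ssh pm$i "cpimport -m 3 -s ',' $DB $TABLE /data/web_sales_${i}_10.csv"
done
```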

Comment by 胡彬 [ 2017-02-23 ]

@Andrew, the data is generated by the standard TPC-DS tools and should not contain multibyte text. In particular, the table 'web_sales' does not have any char/varchar columns.

Comment by David Thompson (Inactive) [ 2017-05-08 ]

solargg, do you have a response on whether you were doing LDI from multiple clients at the same time? If so, that is not supported. Also, with very high volumes of flat files, it is better to consider using cpimport.

Comment by David Thompson (Inactive) [ 2017-06-03 ]

Please re-open if you have further information on the question.

Generated at Thu Feb 08 02:22:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.