[MCOL-633] Unable to run LOAD DATA INFILE where cpimport succeeds Created: 2017-03-23  Updated: 2017-06-05  Resolved: 2017-03-24

Status: Closed
Project: MariaDB ColumnStore
Component/s: MDB Plugin
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Dimitris Theodorou Assignee: Andrew Hutchings (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: Microsoft Word 1row.csv     Text File create_table.sql     File game_answer_facts.tbl.Job_4253_30358.bad_1     File game_answer_facts.tbl.Job_4253_30358.err_1     Text File insert_select_debug.log     Text File ldi_debug.log    

 Description   

I'm trying to load a csv into a 10.1.21-MariaDB Columnstore through LOAD DATA INFILE. The reason I use that instead of cpimport is because it allows me to run the command from a remote client, while cpimport is an executable located only in the server.

Here are the results I get when trying to load a single row with double-quoted tab-delimited fields. I have attached the csv file

With cpimport:

sudo /usr/local/mariadb/columnstore/bin/cpimport <db> <table> <csv file> -s '\t' -E '"' -C '\\' -S

Successful result:

[vagrant@columnstore vagrant]$ sudo /usr/local/mariadb/columnstore/bin/cpimport <db> <table> <csv file> -s '\t' -E '"' -C '\\' -S
Locale is : C
Column delimiter : \t
Enclosed by Character : "
Escape Character  : \
 
Using table OID 4253 as the default JOB ID
Input file(s) will be read from : /vagrant
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/4253_D20170323_T160949_S888528_Job_4253.xml
Log file for this job: /usr/local/mariadb/columnstore/data/bulk/log/Job_4253.log
2017-03-23 16:09:49 (30539) INFO : successfully loaded job file /usr/local/mariadb/columnstore/data/bulk/tmpjob/4253_D20170323_T160949_S888528_Job_4253.xml
2017-03-23 16:09:49 (30539) INFO : Job file loaded, run time for this step : 0.0393081 seconds
2017-03-23 16:09:49 (30539) INFO : PreProcessing check starts
2017-03-23 16:09:49 (30539) INFO : input data file <csv file>
2017-03-23 16:09:50 (30539) INFO : PreProcessing check completed
2017-03-23 16:09:50 (30539) INFO : preProcess completed, run time for this step : 0.242127 seconds
2017-03-23 16:09:50 (30539) INFO : No of Read Threads Spawned = 1
2017-03-23 16:09:50 (30539) INFO : No of Parse Threads Spawned = 3
2017-03-23 16:09:50 (30539) INFO : For table goingup.game_answer_facts: 1 rows processed and 1 rows inserted.
2017-03-23 16:09:51 (30539) INFO : Bulk load completed, total run time : 1.29235 seconds

With LOAD DATA INFILE (run with mcsmysql --local-infile):

LOAD DATA LOCAL INFILE '<csv file>'
INTO TABLE <table>
FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n';

Unsuccessful result:

MariaDB [db]>
LOAD DATA LOCAL INFILE '<csv file>' INTO TABLE <table> FIELDS TERMINATED BY '\t' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';
ERROR 1815 (HY000): Internal error: PM1 : Bulkload Read (thread 0) Failed for Table goingup.game_answer_facts.  Terminating this job.

2 files related to the error are ouput at the /usr/local/mariadb/columnstore/mysql/db/ directory, I have attached those as well. The err file says "Line number 1; Error: Data violates NOT NULL constraint with no default; field 2" which doesn't make sense. The bad file looks like it contains bad data, like some data truncation is taking place but I can't say exactly what's happening since it looks like it contains some binary data.

I tried the exact same LOAD DATA INFILE statement to an identical table on InnoDB, and it is successful with 0 warnings. Curiously, I tried INSERT ... SELECT to insert data from InnoDB to Columnstore and it fails with exactly the same error.



 Comments   
Comment by David Thompson (Inactive) [ 2017-03-24 ]

Could you share the create table statement so i can see if we can repro?

When you run load data infile with autocommit on, internally we map it to cpimport so this is why you see the bad and err files. There may also be something helpful in the columnstore logs under /var/log/mariadb/columnstore that may pinpoint the issue.

Comment by Dimitris Theodorou [ 2017-03-24 ]

I attached the create statement, and the debug logs from the LOAD DATA INFILE as well as the INSERT .. SELECT commands. The logs don't say anything more helpful than there's 1 row failing with errors

Comment by Andrew Hutchings (Inactive) [ 2017-03-24 ]

TL;DR: utf8mb4 isn't yet supported with ColumnStore.

OK, I did some debugging and the reason for this is a little complex...

LOAD DATA INFILE works as follows:

  • MariaDB starts parsing the CSV file
  • MariaDB tells ColumnStore to expect a bulk insert
  • ColumnStore spawns a cpimport instance
  • MariaDB sends ColumnStore the data in binary row format
  • ColumnStore converts this back into CSV format and pipes it into the cpimport

Now, there is code in that final conversion step that pretty much says 'if utf8 read the character length this way' which is based on the regular utf8 charset. It is not designed to work with utf8mb4 and we do not yet support this.

Comment by Andrew Hutchings (Inactive) [ 2017-03-24 ]

Setting as 'Not a Bug' as this is not a supported action. If you create the table as 'utf8' instead of 'utf8m4' this will work.

Only regular UTF-8 is supported as documented here: https://mariadb.com/kb/en/mariadb/mariadb-columnstore-system-usage/

We plan in a future version to support a wider range of character sets along with proper collation support.

Comment by Dimitris Theodorou [ 2017-03-24 ]

Thanks for the help.

I think you should update the documentation then to point out which character sets are supported. From the page you linked, I read

Please see the following MariaDB reference of valid values: Supported Character Sets and Collations.

And that page contains all supported MariaDB charsets including utf8mb4. The columnstore docs should be updated with the charsets it actually supports.

Comment by David Thompson (Inactive) [ 2017-03-27 ]

Good point, i have removed that link from the section for now as it's better that we keep people on utf-8 for now. https://mariadb.com/kb/en/mariadb/mariadb-columnstore-system-usage/

Comment by Dimitris Theodorou [ 2017-04-09 ]

Just had the same issue with utf8_unicode_ci collation. utf8_general_ci worked

Comment by David Thompson (Inactive) [ 2017-04-10 ]

Thanks, I would stick with the documented setup for now and we'll come back to greater character set support as an improvement.

Generated at Thu Feb 08 02:22:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.