[MCOL-1252] CSV load into columnstore table using Pentaho adapter with CS Bulk Import plugin doesn't succeed Created: 2018-03-08  Updated: 2023-10-26  Resolved: 2018-03-14

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 1.1.4

Type: Bug Priority: Major
Reporter: Elena Kotsinova (Inactive) Assignee: Elena Kotsinova (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS7 7,
PDI 7.1


Attachments: File csv-import.kjb     File csv_bulk_cs_load.ktr     Microsoft Word query-validation-results-1-0.csv    
Sprint: 2018-05, 2018-06

 Description   

Use case:
User has CSV file with data. CSV file must be loaded in ColumnStore using CS Bulk Loader.
User creates a job flow with respective transformation in PDI.

Result:
Error in CS Bulk Loader execution:

2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - Starting to run...
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - Input field names and types
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 0 : Field_000          (Integer)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 1 : Field_001          (Integer)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 2 : Field_002          (Integer)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 3 : Field_003          (Integer)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 4 : Field_004          (String)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 5 : Field_005          (String)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 6 : Field_006          (String)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 7 : Field_007          (Integer)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 8 : Field_008          (Integer)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - ColumnStore rows and types
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 0 : idqsvr : DATA_TYPE_INT
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 1 : idquerytemplate : DATA_TYPE_INT
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 2 : syntaxvalid : DATA_TYPE_TINYINT
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 3 : resultvalid : DATA_TYPE_TINYINT
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 4 : reasonsyntaxfailure : DATA_TYPE_VARCHAR
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 5 : reasonresultfailure : DATA_TYPE_VARCHAR
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 6 : notes : DATA_TYPE_VARCHAR
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 7 : iderror : DATA_TYPE_INT
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 8 : idrun : DATA_TYPE_INT
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - Iterating through the ColumnStore table to set the row object
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - Column 0 - idqsvr - trying to insert item: 0, value to String: [B@1ec63060
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - Try to insert item 0 as Long
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Unexpected error
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : java.lang.ClassCastException: [B cannot be cast to java.lang.Long
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 	at com.mariadb.columnstore.api.kettle.KettleColumnStoreBulkExporterStep.processRow(KettleColumnStoreBulkExporterStep.java:207)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 	at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2018/03/08 17:17:23 - MariaDB ColumnStore Bulk Loader.0 - 	at java.lang.Thread.run(Thread.java:748)
2018/03/08 17:17:23 - csv_bulk_cs_load - Transformation has allocated 3 threads and 2 rowsets.

Steps to reproduce:
1. Download the attached files
2. Open PDI Spoon tool and open cvs-import.kjb
3. Navigate to Database connections and update MariaDB Connection with correct server credentials. The connection is set to be shared so it is expected to be visible to all components in the job. Because of this it is enough to change connection setting only here.
4. Open csv_bulk_cs_load transformation object
5. Update path to the csv file query-validation-results-1-0.csv. Used path in testing environment is: /home/pentaho/Downloads/CSV/query-validation-results-1-0.csv
6. Run the job.

Expected:
Job must create 2 tables in database "test".
One InnoDB table and one ColumnStore table.
Both tables must have 2 records in them as per used CSV file.

Actual Result:
Error on loading of the ColumnStore table. ColumnStore table is empty.



 Comments   
Comment by Jens Röwekamp (Inactive) [ 2018-03-09 ]

turns out to be related to lazy conversion, if set to false (unchecked), the job will succeed. Found a different error though regarding null values, which is fixed now.

Will investigate further regarding, how to support lazy conversion.

Comment by Jens Röwekamp (Inactive) [ 2018-03-09 ]

Support for lazy conversion and handling of null data added.

Comment by Elena Kotsinova (Inactive) [ 2018-03-14 ]

Verified as fixed for build 1.1.4 from 14 March 2018.

Lazy convention tested.
Empty fields in input CSV file tested.

Generated at Thu Feb 08 02:27:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.