[MCOL-768] Mariadb columnstore problem with BATCH INSERT Created: 2017-06-10  Updated: 2017-08-09  Resolved: 2017-08-09

Status: Closed
Project: MariaDB ColumnStore
Component/s: DMLProc
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Kleyson Rios Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

github.com/mariadb-corporation/mariadb-columnstore-docker.git
mariadb-columnstore-1.0.9-1-centos7.x86_64.rpm.tar.gz


Attachments: File columnstoreSupportReport.columnstore-1.tar.gz     File kettle.log     PNG File pdi-batch.png     PNG File pdi-non-batch.png     PNG File slow-load.png    

 Description   

I'm using Pentaho Kettle to load data into the DW.

I'm trying to store data using the step 'Table Output' in 'batch mode' with batch size 1000.

After running the transformation, Kettle throws an exception (see the attached kettle.log file).

If I change the connection string to:

jdbc:mysql://localhost:3306/dbtest?sessionVariables=sql_mode='MSSQL'

I'm able to run the transformation without errors, but no records are stored in the database.

If 'batch mode' is disabled, the records are stored successfully one by one.

Attached are the Kettle screenshots showing the process in batch and non-batch mode.

I attached the columnstoreSupportReport.columnstore-1.tar.gz file as well.

Best Regards.



 Comments   
Comment by David Thompson (Inactive) [ 2017-06-10 ]

The kettle log has:
2017/06/10 06:42:08 - Table output.0 - Caused by: java.sql.BatchUpdateException: (conn:109) Out of range value for column 'day_in_year' at row 1

Looking at the schema report in pm1_dbmsReport.txt, I can see that this column is a TINYINT; you should make it a SMALLINT to cover the possible values (day of year can reach 366, which overflows a signed TINYINT's maximum of 127). Can you check whether that fixes the problem?

Details on datatype ranges: https://mariadb.com/kb/en/mariadb/columnstore-data-types/
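For illustration, a widened column definition could look like the sketch below. The table and column names other than day_in_year are hypothetical, and SMALLINT's range is enough for any day-of-year value:

```sql
-- Illustrative dimension table; names other than day_in_year are assumptions.
-- SMALLINT covers -32768..32767, so day_in_year values 1..366 fit safely,
-- whereas a signed TINYINT tops out at 127 and raises "Out of range value".
CREATE TABLE dim_date (
    date_key    INT,
    day_in_year SMALLINT
) ENGINE=ColumnStore;
```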

Comment by Kleyson Rios [ 2017-06-11 ]

After fixing the data types/sizes, the load works, but at a very slow rate of about 4 rows/s. Basically the same rate as in non-batch mode.

Comment by David Thompson (Inactive) [ 2017-06-11 ]

That's probably to be expected with direct transactional DML depending on your table width and storage speed.

If you need table-based ETL, could you do:
1. current logic to an innodb staging table
2. invoke custom script to run insert into <columnstore-table> select * from <innodb-staging-table> with autocommit on.
3. truncate staging table

Insert select with autocommit on will be internally optimized into a cpimport call.
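The staging pattern above can be sketched as follows; table names are illustrative, not from the ticket:

```sql
-- Sketch of the suggested staging pattern (hypothetical table names).
-- Step 1: Kettle's Table Output lands rows in an InnoDB staging table.
-- Step 2: move them into the ColumnStore table in one statement; with
--         autocommit on, INSERT ... SELECT is internally optimized into
--         a cpimport bulk load.
SET autocommit = 1;
INSERT INTO fact_columnstore SELECT * FROM staging_innodb;
-- Step 3: clear the staging table for the next batch.
TRUNCATE TABLE staging_innodb;
```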

For 1.1 we are working on a bulk write API that can enable creation of ETL tool connectors to write directly into our internal bulk write logic. But for 1.0 the above is probably the best option for you.

Generated at Thu Feb 08 02:23:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.