[MCOL-4000] Add an option to allow using cpimport for LDI and INSERT..SELECT in a transaction Created: 2020-05-12  Updated: 2020-05-19  Resolved: 2020-05-19

Status: Closed
Project: MariaDB ColumnStore
Component/s: MDB Plugin
Affects Version/s: 1.4.3
Fix Version/s: 1.4.4

Type: Task Priority: Major
Reporter: Gagan Goel (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-4007 Add an option to allow using cpimport... Closed
Sprint: 2020-7

 Description   

CS has a session variable, columnstore_use_import_for_batchinsert, that has 2 values, ON (default) and OFF. This allows a user to set the option of using cpimport (ON) or the regular DML (OFF) for LDI and INSERT SELECT statements.

If a user performs these queries in a transaction (such as by setting autocommit=0 or using the START TRANSACTION statement), however, CS fallsback to using DML for the operation. This behaviour is because of the current limitation of cpimport to handle rollbacks.

We want to add a third option to columnstore_use_import_for_batchinsert: ALWAYS. This value will always use cpimport for LDI and INSERT SELECT queries, irrespective of whether the query runs in a transaction or not. This gives the user the ability to use a faster import method, with the caveat that if a user issues a rollback of the transaction, it will have no effect as the data would have already been committed to actual database files by cpimport.



 Comments   
Comment by Gagan Goel (Inactive) [ 2020-05-13 ]

Following values can be used for columnstore_use_import_for_batchinsert:

OFF (0): Do not use cpimport for LDI or INSERT..SELECT
ON (1, default): Use cpimport for LDI or INSERT..SELECT only for non-transactional queries
ALWAYS (2): Use cpimport for LDI or INSERT..SELECT from within a transaction as well as for non-transactional queries

The session variable can be changed for either the current session using the 'set' command,
or for all future sessions using the 'set global' command.

For QA:
Create a table with a few columns and a large data file (e.g. 100k or 200k records) to be imported using LDI.

1. First test the current default behaviour, i.e. columnstore_use_import_for_batchinsert=ON:

  • Perform LDI inside a transaction (e.g. by using the START TRANSACTION command, or SET AUTOCOMMIT=0): this should use the regular DML operation. Doing a rollback will undo the load from the table.
  • Perform LDI outside the transaction: this will use cpimport for the load. To check the operation uses cpimport, use the linux top command. This operation will also run magnitudes of order faster than the previous one from within a transaction that used DML. Performing a rollback with not undo the load.

2. Test the new behaviour by setting columnstore_use_import_for_batchinsert=ALWAYS:

  • Perform LDI inside a transaction: this time, it should use cpimport for the operation. Doing a rollback will not undo the load from the table.
  • Perform LDI outside the transaction: this should again use cpimport for the load. Performing a rollback with not undo the load.

Same tests can be repeated for INSERT ... SELECT.

Comment by Gagan Goel (Inactive) [ 2020-05-15 ]

For QA: I have added instructions above on how to test this feature. In addition, we have added test cases to the regression suite for this: mysql/queries/working_dml/misc/MCOL-4000.sql

Comment by Daniel Lee (Inactive) [ 2020-05-19 ]

Created MCOL-4007 for tracking this issue in 1.5. Removing fixed version of 1.5 in this ticket.

Comment by Daniel Lee (Inactive) [ 2020-05-19 ]

Build verified: 1.4.4-1 (Jenkins 20200518)

Tested all these modes (On, Off, Always) for the columnstore_use_import_for_batchinsert variable, with and without transaction.

Tested for commit, auto commit, and rollback, for both LDI and insert..select.

Generated at Thu Feb 08 02:47:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.