[MCOL-608] INSERT converted to bulk load Created: 2017-03-07  Updated: 2022-02-25  Resolved: 2017-03-07

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: None
Fix Version/s: Icebox

Type: New Feature Priority: Minor
Reporter: Anders Karlsson Assignee: Todd Stoffel (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None


 Description   

One advantage of ColumnStore is that standard DML is allowed in addition to bulk loading. The issue is that DML is still very slow. Many application developers would probably appreciate the ability to run DML instead of using the bulk loading facilities; data might be streamed, or there might be some other reason that makes classic bulk loading cumbersome. I thus came up with the idea of using MaxScale to convert INSERTs into bulk loads. The way I see it working is this:
1. All SQL to ColumnStore passes through MaxScale.
2. Any SELECT, UPDATE or DELETE is passed directly to ColumnStore.
3. INSERTs are passed to a module that writes the inserted data to a file named after the target table, in a cpimport-friendly format, so each table has a separate file.
4. At regular intervals, the file is renamed using a sequence number and a new file is created. Rotation is triggered by the number of records in the file, the time since the first record was written, or the size of the file, whichever threshold is reached first.
5. After the switch has taken place, an external script is run asynchronously with the filename as an argument.
6. The script then runs cpimport (or anything else, really) to import the data into ColumnStore.
7. The script should also support doing a distributed load somehow, assuming the file that MaxScale creates is shared across the nodes.

This would allow a user to run standard INSERTs at bulk-load speed without code changes. Admittedly, the load would be asynchronous, but the performance benefits should outweigh this many times over.
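The rotate-and-load loop described in the steps above could be sketched as follows. Everything here is an illustrative assumption (file naming, thresholds, and the exact cpimport arguments), not part of any shipped MaxScale or ColumnStore tool:

```python
import os
import time

# Sketch of the rotation/load steps: rotate a per-table spool file once any
# threshold is hit, then hand the rotated file to a loader that runs cpimport.
# Names, thresholds, and flags are assumptions, not an existing API.

def should_rotate(path, max_rows=100_000, max_bytes=64 * 2**20, max_age_s=30.0):
    """True once the spool file reaches any of the three thresholds."""
    if not os.path.exists(path):
        return False
    st = os.stat(path)
    if st.st_size >= max_bytes:
        return True
    # mtime is the last append, so this only approximates "time since the
    # first record"; a real module would track the first-write time itself.
    if time.time() - st.st_mtime >= max_age_s:
        return True
    with open(path, "rb") as f:
        return sum(1 for _ in f) >= max_rows

def rotate(path, seq):
    """Rename the active spool file so a fresh one can be started."""
    rotated = "%s.%06d" % (path, seq)
    os.rename(path, rotated)
    return rotated

def build_cpimport_cmd(schema, table, rotated_path):
    """Command the asynchronous loader script would run, e.g. handed to
    subprocess.Popen so the SQL path is never blocked while loading."""
    return ["cpimport", schema, table, rotated_path]
```

The loader would poll `should_rotate` per table, call `rotate`, and fire off the cpimport command in the background, which keeps the INSERT path non-blocking at the cost of the loaded rows being visible only after the next import cycle.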



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2017-03-07 ]

Such a feature would require changes to MaxScale, not ColumnStore.

There are also already plans to produce a toolset which would provide a solution to your use case.

Comment by Anders Karlsson [ 2017-03-07 ]

Sure, I was unsure whether this was a MaxScale or ColumnStore feature. I also thought I flagged it as a New Feature and not a Bug; sorry about that. In any case, I see some advantages to doing this through MaxScale rather than through a "toolset", in particular as MaxScale already has read/write split capabilities. I can report it as a ColumnStore feature.

Comment by Dipti Joshi (Inactive) [ 2017-03-07 ]

The Insert Stream plugin in MaxScale 2.1.0 does what you are requesting from MaxScale here.
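For reference, wiring the insertstream filter into a MaxScale service is a short configuration change. The section names, server name, and credentials below are illustrative placeholders; the exact options should be checked against the MaxScale 2.1 documentation:

```ini
# Illustrative MaxScale configuration fragment (names are placeholders)
[InsertStream]
type=filter
module=insertstream

[ColumnStore-Service]
type=service
router=readconnroute
servers=columnstore1
user=maxuser
password=maxpwd
filters=InsertStream
```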

Generated at Thu Feb 08 02:22:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.