Details
Type: New Feature
Status: Closed
Priority: Minor
Resolution: Won't Fix
Description
One advantage of ColumnStore is that standard DML is allowed in addition to bulk loading. The problem is that DML is still very slow. Many application developers would probably appreciate being able to run plain INSERTs instead of using the bulk loading facilities, for example when data is streamed or when classic bulk loading is otherwise cumbersome. I therefore came up with the idea of using MaxScale as a converter from INSERTs to bulk loads. The way I see it working is this:
1. All SQL to CS pass though MaxScale.
2. Any SELECT, UPDATE or DELETE is passed directly to ColumnStore.
3. INSERTs are passed to a module that writes the data from the INSERT to a file named after the table being inserted into, in a cpimport-friendly format, so each table has a separate file.
4. At regular intervals, the file is renamed using some sequence number and a new file is created. The switch is triggered by the number of records in the file, the time since the first record was placed in the file, or the size of the file, whichever limit is reached first (see the spooler sketch after this list).
5. After the switch has taken place, an external script is run asynchronously with the filename as an argument.
6. The script then runs cpimport, or whatever else is appropriate, to import the data into ColumnStore (see the import-script sketch at the end of this description).
7. The script should also support doing a distributed load somehow, assuming the file that MaxScale creates is shared across the nodes.
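To make steps 2 through 5 more concrete, here is a rough, hypothetical sketch in Python of what the spooling side could look like. None of the names (the spool directory, the rotation thresholds, the import_table.sh script, the naive regex parser) come from MaxScale or ColumnStore; they are placeholders, and a real implementation would of course be a MaxScale filter module in C/C++.
{code:python}
# Sketch of per-table spooling and rotation (steps 3-5). All names are illustrative.
import os
import re
import subprocess
import time

ROTATE_ROWS = 100_000          # rotate after this many buffered rows
ROTATE_SECONDS = 60            # ... or this long after the first row arrived
ROTATE_BYTES = 256 * 1024**2   # ... or when the file reaches this size
SPOOL_DIR = "/var/spool/cs_insert"                  # hypothetical spool location
IMPORT_SCRIPT = "/usr/local/bin/import_table.sh"    # hypothetical external script (step 5)

# Very naive INSERT parser, enough for single-row "INSERT INTO t VALUES (...)".
INSERT_RE = re.compile(r"INSERT\s+INTO\s+(\w+)\s+VALUES\s*\((.*)\)",
                       re.IGNORECASE | re.DOTALL)

class TableSpool:
    """One open spool file per target table, rotated by rows, age or size."""

    def __init__(self, table):
        self.table = table
        self.seq = 0
        os.makedirs(SPOOL_DIR, exist_ok=True)
        self._open()

    def _open(self):
        self.path = os.path.join(SPOOL_DIR, f"{self.table}.current")
        self.fh = open(self.path, "a", encoding="utf-8")
        self.rows = 0
        self.first_row_at = None

    def append(self, values_csv):
        # cpimport's default column delimiter is '|', so write pipe-separated rows.
        fields = [v.strip().strip("'") for v in values_csv.split(",")]
        self.fh.write("|".join(fields) + "\n")
        self.rows += 1
        if self.first_row_at is None:
            self.first_row_at = time.time()
        self._maybe_rotate()

    def _maybe_rotate(self):
        too_many = self.rows >= ROTATE_ROWS
        too_old = self.first_row_at and time.time() - self.first_row_at >= ROTATE_SECONDS
        too_big = self.fh.tell() >= ROTATE_BYTES
        if too_many or too_old or too_big:
            self.rotate()

    def rotate(self):
        """Steps 4-5: rename the file with a sequence number, hand it to the script."""
        self.fh.close()
        self.seq += 1
        ready = os.path.join(SPOOL_DIR, f"{self.table}.{self.seq:08d}.ready")
        os.rename(self.path, ready)
        # Fire and forget: the import script runs asynchronously (step 5).
        subprocess.Popen([IMPORT_SCRIPT, self.table, ready])
        self._open()

spools = {}

def handle_statement(sql):
    """Steps 2-3: non-INSERT statements pass through, INSERTs are spooled."""
    m = INSERT_RE.match(sql.strip())
    if m is None:
        return "route_to_columnstore"   # SELECT/UPDATE/DELETE go straight through
    table, values = m.group(1), m.group(2)
    if table not in spools:
        spools[table] = TableSpool(table)
    spools[table].append(values)
    return "ok"                          # reply OK to the client immediately
{code}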
This would allow a user to run standard INSERTs at bulk-load speed without code changes. Admittedly, the load would be asynchronous, but the performance benefit should outweigh this many times over.
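And a similarly hypothetical sketch of the external script from steps 6 and 7. The database name "mydb", the argument order and the error handling are assumptions for illustration; only the basic "cpimport <db> <table> <file>" invocation is taken from ColumnStore itself.
{code:python}
# Sketch of the external import script (steps 6-7), invoked as:
#   import_table.py <table> <ready_file>
import os
import subprocess
import sys

def main():
    table, ready_file = sys.argv[1], sys.argv[2]
    # cpimport <db> <table> <file> bulk-loads the pipe-delimited spool file.
    # For a distributed load (step 7), a mode option such as cpimport's -m flag
    # could be added here, assuming the .ready file is visible on all nodes.
    result = subprocess.run(["cpimport", "mydb", table, ready_file],
                            capture_output=True, text=True)
    if result.returncode != 0:
        # Keep the file around for retry/inspection if the load fails.
        sys.stderr.write(result.stderr)
        sys.exit(result.returncode)
    os.remove(ready_file)   # loaded successfully, the spool file is no longer needed

if __name__ == "__main__":
    main()
{code}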