|
Internally this does the following:
- MariaDB tells ColumnStore to start a bulk write operation (INSERT_SELECT)
- The ColumnStore engine spins up an instance of cpimport in piped mode (mode 1)
- MariaDB sends the row data to the storage engine one row at a time (blocking calls)
- The ColumnStore engine converts each row from this binary format into a text CSV row
- The CSV row is piped into cpimport
Now, the problem is that converting each row from binary to CSV blocks the engine from fetching the next row from MariaDB, and this is what causes the performance difference. It could be solved with a FIFO buffer: the write_row call would store the binary row data into the buffer, and a separate thread would do the CSV conversion and pipe the result into cpimport.
A change such as this should probably wait until we have a cpimport API so we can clean up this code. Ideally the API would support a binary format as well as CSV so that the double conversion (binary->CSV->binary) isn't required.
In the meantime there is a mode which uses direct bulk insert instead of cpimport. I've not tried this; it might be slightly faster (probably not), but there might be dragons. This is the system variable to toggle it: infinidb_use_import_for_batchinsert
|