[MCOL-1289] Python bulk load is slower than expected Created: 2018-03-21 Updated: 2023-10-26 Resolved: 2021-07-08 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | N/A, 1.4.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Sprint: | 2018-07, 2018-08, 2018-09, 2018-10, 2018-11, 2018-12, 2018-13, 2018-14, 2018-15, 2018-16, 2018-17, 2018-18, 2018-19, 2018-20, 2018-21, 2019-01, 2019-02, 2019-03, 2019-04 | ||||||||
| Description |
|
A user has reported that bulk loading data with the Python API is slower than expected. He said that neither the network nor the WriteEngine seems to be the bottleneck, so he suspects that performance can be improved. One suggestion was to add functions to the Python API that accept data already formatted for the table, so that all the setColumn calls and data casting are performed in the underlying C API instead of in Python. He suspects that this would speed up the load process. |
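The suggestion above amounts to crossing the Python/C boundary once per batch rather than once per cell. The following is a minimal sketch of the difference in call counts, using a pure-Python stand-in rather than pymcsapi itself. The class `CountingBulkInsert` and its `writeRows` method are hypothetical and exist only for illustration; they are not part of the real API.

```python
class CountingBulkInsert:
    """Pure-Python stand-in that counts calls; not the pymcsapi class."""

    def __init__(self):
        self.calls = 0      # number of calls made from user code
        self.rows = []      # rows "written" so far
        self._current = {}  # cells of the row currently being built

    def setColumn(self, index, value):
        # One call per cell, as in the per-cell loading pattern.
        self.calls += 1
        self._current[index] = value
        return self

    def writeRow(self):
        # One additional call per row to finish it.
        self.calls += 1
        self.rows.append(tuple(self._current[i] for i in sorted(self._current)))
        self._current = {}
        return self

    def writeRows(self, rows):
        # Hypothetical batch entry point: one call for the whole batch.
        self.calls += 1
        self.rows.extend(tuple(r) for r in rows)
        return self


data = [(i, i * 2, str(i)) for i in range(1000)]

# Per-cell pattern: rows * columns setColumn calls plus one writeRow per row.
per_cell = CountingBulkInsert()
for row in data:
    for index, value in enumerate(row):
        per_cell.setColumn(index, value)
    per_cell.writeRow()

# Batch pattern: a single call hands over all rows.
batched = CountingBulkInsert()
batched.writeRows(data)

print(per_cell.calls, batched.calls)
```

Both variants store the same rows; only the number of calls made from Python differs. Whether that accounts for the slowdown reported here would still need to be measured.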
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2018-03-21 ] | ||||||||
|
| ||||||||
| Comment by Andrew Hutchings (Inactive) [ 2018-03-22 ] | ||||||||
|
This has been assigned to a sprint so that we can investigate where the bottleneck is before looking into where improvements can be made. | ||||||||
| Comment by Jens Röwekamp (Inactive) [ 2018-03-24 ] | ||||||||
|
Benchmarked with my laptop as the local machine (CentOS 7, CS 1.1.3-2) and a remote machine (Debian 9, CS 1.1.3-1). | ||||||||
| Comment by Jens Röwekamp (Inactive) [ 2018-04-03 ] | ||||||||
|
One reason why Python 3 is slower than Python 2 is that _pymcsapi.ColumnStoreBulkInsert_setColumn calls pymcsapi.ColumnStoreBulkInsert.__setattr__, which triggers _swig_setattr(). _pymcsapi compiled for Python 2 doesn't show this behaviour. The optional SWIG flag -py3 didn't help either. Callgrind logs have been added to the log files.

-------------------------------------------------------------------------------------------

A general increase in injection speed of about 34% for Python 2 and 29% for Python 3 was observed when renaming the overloaded ColumnStoreBulkInsert.setColumn() functions into unique functions (e.g. ColumnStoreBulkInsert.setColumn_int32()). This was verified using setColumn_int32() as the example. |
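To illustrate why uniquely named functions can be cheaper than an overloaded one, here is a rough pure-Python analogy. It is not the generated SWIG wrapper, where dispatch happens in C. The function names `set_column_overloaded`, `set_column_int32` and `time_calls` are hypothetical stand-ins. The overloaded variant has to probe the argument type on every call before it can pick a handler, while the typed variant skips that step.

```python
import timeit


def set_column_overloaded(value):
    """Mimics an overload dispatcher: probe candidate types in turn."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return ("bool", value)
    if isinstance(value, str):
        return ("str", value)
    if isinstance(value, float):
        return ("double", value)
    if isinstance(value, int):
        return ("int32", value)
    raise TypeError("unsupported type: %r" % type(value))


def set_column_int32(value):
    """Uniquely named variant: the caller already states the type."""
    return ("int32", value)


def time_calls(func, repeat=200_000):
    # Wall-clock seconds for `repeat` calls; results vary by machine.
    return timeit.timeit(lambda: func(42), number=repeat)


if __name__ == "__main__":
    print("overloaded:", time_calls(set_column_overloaded))
    print("int32:     ", time_calls(set_column_int32))
```

The absolute timings here say nothing about pymcsapi itself; the sketch only shows the kind of per-call work that a uniquely named function avoids.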