[MCOL-808] mcsapi needs to use async write calls. Created: 2017-07-10  Updated: 2021-01-17  Resolved: 2021-01-17

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: None
Fix Version/s: N/A

Type: New Feature
Priority: Major
Reporter: Andrew Hutchings (Inactive)
Assignee: Unassigned
Resolution: Won't Do
Votes: 2
Labels: None

Issue Links:
Blocks
blocks MCOL-1287 Small batches of loads are not distri... Closed
Relates
relates to MCOL-1289 Python bulk load is slower than expected Closed
Epic Link: ColumnStore Performance Improvements
Sprint: 2018-13, 2018-14, 2018-15, 2018-16, 2018-17, 2018-18, 2018-19

 Description   

mcsapi has an underlying async IO library, but we currently block when sending and receiving packets of data. We should add an internal option that controls whether to wait for the response or not, and use it for bulk write events. This would let us buffer write events and significantly increase the performance of the library.
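
As a rough illustration of the idea (not mcsapi code), the sketch below contrasts a sender that waits for every batch to be acknowledged with one that queues batches for a background thread and only waits at commit time. All names here (slow_write, send_blocking, AsyncWriter, max_pending) are hypothetical, and slow_write merely simulates the network send plus the server-side write with a sleep. The gain in this sketch comes from letting the caller keep building rows while earlier batches are still in flight; it does not model writing to several PMs in parallel.

```python
import queue
import threading
import time


def slow_write(batch, delay=0.01):
    """Hypothetical stand-in for sending a batch and waiting for the write to finish."""
    time.sleep(delay)
    return len(batch)


def send_blocking(batches):
    """Send each batch and wait for it to complete before sending the next."""
    acked = 0
    for batch in batches:
        acked += slow_write(batch)
    return acked


class AsyncWriter:
    """Queue batches for a background thread; the caller only waits at commit()."""

    def __init__(self, max_pending=4):
        # Bounded queue: write() blocks only when max_pending batches are waiting.
        self._pending = queue.Queue(maxsize=max_pending)
        self._acked = 0
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        # Runs in the background thread until the None sentinel arrives.
        while True:
            batch = self._pending.get()
            if batch is None:
                break
            self._acked += slow_write(batch)

    def write(self, batch):
        """Hand a batch to the background thread and return without waiting for it."""
        self._pending.put(batch)

    def commit(self):
        """Wait for every queued batch to be written and return the row count."""
        self._pending.put(None)
        self._worker.join()
        return self._acked


if __name__ == "__main__":
    batches = [[(i, i * 2) for i in range(1000)] for _ in range(5)]
    print(send_blocking(batches))
    writer = AsyncWriter(max_pending=2)
    for batch in batches:
        writer.write(batch)
    print(writer.commit())
```

The bounded queue is a deliberate choice in this sketch: it keeps memory use predictable by applying back-pressure to the caller once too many batches are outstanding.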



 Comments   
Comment by patrice [ 2017-09-28 ]

I did some tests in C++, and it looks like every 200,000 rows (2 PMs) it waits for a little while and then gets back to 100% CPU.

Writing about 3.6 million rows with 200+ columns took 22 minutes, compared to cpimport, which takes only 6 minutes.

Also, I am not sure how to use multiple threads if creating a new instance of BulkInsert is going to put a lock on the table. Can there be multiple locks on the same table?

Thanks.

Comment by Andrew Hutchings (Inactive) [ 2017-09-28 ]

Hi Patrice,

The 200,000-row point is where it flushes the network buffer and waits for WriteEngine to finish writing the data. This is the part that will be significantly improved by this ticket.
In addition, we are looking into a way of writing to multiple PMs from multiple threads (but that is a separate issue).

Comment by Todd Stoffel (Inactive) [ 2021-01-17 ]

Deprecated
