|
This is because the batch size is set to 100,000 rows. What would be the expected behaviour here?
|
|
I was under the impression that cpimport balances the data across the PMs automatically, even when there are multiple small imports, so the mcsapi would have the same behavior. In fact, I understand that if the 8 million row extent is not full it keeps pushing to the same PM, but then it would create the next extent on the next PM, round-robin like. That way, after multiple small batch loads, the data would still be balanced.
|
|
No, the API currently round-robins every 100,000 rows. Until MCOL-808 is implemented, it would be a performance hit to do otherwise. We are also looking into having one bulk-insert instance of the API per PM, which would let you control which data goes into which PM.
I'll change this to a feature request for now and we will review it after MCOL-808.
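To make the current behavior concrete, here is a minimal sketch of how a client-side writer could round-robin fixed-size batches across PM nodes, advancing to the next PM every 100,000 rows. This is illustrative only; the names (`PM_COUNT`, `distribute`, `send order`) are hypothetical and not the actual mcsapi internals.

```python
BATCH_SIZE = 100_000  # rows per batch, as described in the comment above
PM_COUNT = 3          # assumed number of PM nodes for illustration

def distribute(rows, batch_size=BATCH_SIZE, pm_count=PM_COUNT):
    """Yield (pm_index, batch) pairs, moving to the next PM every batch_size rows."""
    batch, pm = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield pm, batch
            # advance round-robin to the next PM for the next batch
            batch, pm = [], (pm + 1) % pm_count
    if batch:  # flush the final partial batch to the current PM
        yield pm, batch
```

With many small loads, each load starts again at the first PM, which is why the data can end up unbalanced compared to cpimport's extent-aware placement described earlier.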
|
|
LinuxJedi While the API round-robins 100,000 rows by default, I thought the batch size could be changed programmatically by the API user. Is that not correct?
|
|
dshjoshi The API method is there to do it, but it is not hooked up yet. It would be trivial to hook up, with the caveat that I'm not sure what would happen if you set it too high or too low (> 8 million will very likely do bad things). Since I have not had time to test it, this has been left disabled in 1.1 so far and documented as such.
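If a settable batch size were hooked up, one way to guard against the "too high or too low" cases mentioned above would be to clamp the requested value against the ~8 million row extent size. A minimal sketch, assuming hypothetical names (`EXTENT_ROWS`, `clamp_batch_size` are illustrative, not part of the real mcsapi):

```python
EXTENT_ROWS = 8_000_000   # rows per ColumnStore extent (figure from this thread)
DEFAULT_BATCH = 100_000   # the current hard-coded round-robin batch size

def clamp_batch_size(requested: int) -> int:
    """Return a safe batch size: positive and no larger than one extent."""
    if requested <= 0:
        # nonsensical request: fall back to the current default
        return DEFAULT_BATCH
    # never exceed one extent, which the thread says would "do bad things"
    return min(requested, EXTENT_ROWS)
```

This is only a defensive-validation sketch; what the server side actually does with an oversized batch is untested, per the comment above.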
|