Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Fix Version/s: 1.1.6
Component/s: None
Labels: None
Sprint: 2018-19, 2018-20
Description
Instead of calling collect() on the DataFrame to export it on the Spark Driver, toLocalIterator() should be used. It loads the DataFrame's partitions into the Spark Driver sequentially and exports them one at a time. Therefore, the Spark Driver only needs as much memory for the export as the size of the largest DataFrame partition.
This is a hotfix until MCOL-1362 solves the problem in a more efficient manner.
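The gist of the change can be sketched in a few lines of PySpark. This is a minimal sketch of the technique described above, not the connector's actual export code; the DataFrame and the export_row callback are illustrative placeholders:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("export-sketch").getOrCreate()
df = spark.range(1000000).toDF("id")  # placeholder DataFrame


def export_row(row):
    # Hypothetical per-row export callback; a real exporter would
    # write the row out to the target system here.
    pass


# Before: collect() materializes every partition on the driver at once,
# so the driver needs enough memory for the entire DataFrame.
# rows = df.collect()

# After: toLocalIterator() pulls partitions into the driver one at a time,
# so peak driver memory is bounded by the largest single partition.
for row in df.toLocalIterator():
    export_row(row)
{code}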
Issue Links

- relates to: MCOL-1362 "Add an export function that utilizes (sequential) write from Spark workers" (Closed)