[MCOL-1852] Spark Exporter uses collect() instead of toLocalIterator() on DataFrames to export and therefore uses too much memory on the Driver
Created: 2018-11-02  Updated: 2023-10-26  Resolved: 2018-11-10

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.6
Fix Version/s: 1.1.7, 1.2.1

Type: Bug
Priority: Major
Reporter: Jens Röwekamp (Inactive)
Assignee: Elena Kotsinova (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-1362 Add a export function that utilizes (... Closed
Sprint: 2018-19, 2018-20

 Description   

When the DataFrame to export is brought to the Spark Driver, toLocalIterator() should be used instead of collect(). collect() loads the entire DataFrame into the Driver at once. toLocalIterator() instead loads the DataFrame's partitions into the Driver one at a time and exports each in turn. As a result, the Spark Driver only needs enough memory for the export to hold the largest DataFrame partition, not the whole DataFrame.

This is a hotfix until MCOL-1362 solves the problem in a more efficient manner.
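The memory difference described above can be illustrated without a Spark cluster. The following is a minimal Python simulation, not the ColumnStore exporter code and not the Spark API: a "DataFrame" is modelled as a plain list of partitions, and the hypothetical functions `collect_all`, `local_iterator`, `export_collect` and `export_iterator` stand in for collect(), toLocalIterator() and the two export strategies. Peak Driver memory is approximated by the number of rows held at one time.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical model (not the Spark API): a row is a tuple and a
# "DataFrame" is a sequence of partitions, each a list of rows.
Row = Tuple[int, str]
Partition = List[Row]


def collect_all(partitions: Sequence[Partition]) -> List[Row]:
    """Like collect(): all rows of all partitions are copied to the
    driver before any of them is exported."""
    rows: List[Row] = []
    for part in partitions:
        rows.extend(part)
    return rows


def local_iterator(partitions: Sequence[Partition]) -> Iterator[Partition]:
    """Like toLocalIterator(): partitions are transferred to the driver
    one at a time, on demand."""
    for part in partitions:
        yield list(part)  # the copy stands in for fetching one partition


# In both exporters, `sink` is a hypothetical stand-in for the export
# target (e.g. a bulk writer), not for driver memory.
def export_collect(partitions: Sequence[Partition], sink: List[Row]) -> int:
    """Exports every row; returns the peak number of rows held at once."""
    rows = collect_all(partitions)
    peak = len(rows)  # the whole DataFrame is resident simultaneously
    sink.extend(rows)
    return peak


def export_iterator(partitions: Sequence[Partition], sink: List[Row]) -> int:
    """Exports every row; returns the peak number of rows held at once."""
    peak = 0
    for part in local_iterator(partitions):
        peak = max(peak, len(part))  # only the current partition is resident
        sink.extend(part)
    return peak


if __name__ == "__main__":
    # Three partitions of 3, 5 and 2 rows (10 rows in total).
    parts = [[(i, f"row{i}") for i in range(start, start + size)]
             for start, size in [(0, 3), (3, 5), (8, 2)]]
    collected, streamed = [], []
    print(export_collect(parts, collected))   # 10: the whole DataFrame
    print(export_iterator(parts, streamed))   # 5: the largest partition
    print(collected == streamed)              # True: same rows, same order
```

Both strategies export the same rows in the same order; only the peak held on the Driver differs. In real Spark, the documentation for toLocalIterator() notes that it results in multiple Spark jobs, so the lower Driver memory is traded for extra jobs.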



 Comments   
Comment by Jens Röwekamp (Inactive) [ 2018-11-02 ]

Switched from collect() to toLocalIterator() to reduce the memory consumption on the Spark Driver.

For QA:

  • Execute the regression test suite on CentOS 7, Windows and at least one Ubuntu/Debian operating system

Generated at Thu Feb 08 02:31:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.