[MCOL-1852] Spark Exporter uses collect() instead of toLocalIterator() on DataFrames to export and therefore uses too much memory on the Driver - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.1.6
Fix Version/s: 1.1.7, 1.2.1
Component/s: None
Labels: None

Sprint: 2018-19, 2018-20

Description

The Spark Exporter should call toLocalIterator() instead of collect() on the DataFrame it exports. collect() loads the entire DataFrame into the Spark driver at once, whereas toLocalIterator() loads the DataFrame's partitions into the driver sequentially, so they can be exported one at a time. The driver therefore needs only about as much memory for the export as the largest DataFrame partition.
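The memory difference can be illustrated with a small model. This is only a sketch in plain Python, not the exporter's code and not Spark itself: the nested lists stand in for DataFrame partitions, and the names `PARTITIONS`, `export_with_collect`, `export_with_local_iterator`, and `write_row` are hypothetical. In PySpark, the corresponding calls would be `df.collect()` and `df.toLocalIterator()`.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]
Partition = List[Row]

# Hypothetical stand-in for a DataFrame with three partitions of different sizes.
PARTITIONS: List[Partition] = [
    [(1, "a"), (2, "b")],
    [(3, "c")],
    [(4, "d"), (5, "e"), (6, "f")],
]


def export_with_collect(partitions: List[Partition],
                        write_row: Callable[[Row], None]) -> int:
    """Model of collect(): all rows are held on the driver before export.

    Returns the peak number of rows held at once.
    """
    rows = [row for part in partitions for row in part]
    for row in rows:
        write_row(row)
    return len(rows)


def export_with_local_iterator(partitions: List[Partition],
                               write_row: Callable[[Row], None]) -> int:
    """Model of toLocalIterator(): one partition is held at a time.

    Returns the peak number of rows held at once.
    """
    peak = 0
    for part in partitions:
        buffered = list(part)  # only the current partition is buffered
        peak = max(peak, len(buffered))
        for row in buffered:
            write_row(row)
    return peak


if __name__ == "__main__":
    exported: List[Row] = []
    print(export_with_collect(PARTITIONS, exported.append))
    print(export_with_local_iterator(PARTITIONS, exported.append))
```

In this model both functions export the same rows in the same order; only the number of rows buffered at once differs, which is bounded by the total row count in the first case and by the largest partition in the second.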

This is a hotfix until MCOL-1362 solves the problem in a more efficient manner.

Issue Links

relates to: MCOL-1362 Add a export function that utilizes (sequential) write from Spark workers (Closed)

People

Assignee: Elena Kotsinova (Inactive)

Reporter: Jens Röwekamp (Inactive)

Votes: 0

Watchers: 1

Dates

Created: 2018-11-02 21:30

Updated: 2023-10-26 13:17

Resolved: 2018-11-10 23:14
