MariaDB ColumnStore / MCOL-1852

Spark Exporter uses collect() instead of toLocalIterator() on DataFrames to export and therefore uses too much memory on the Driver

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.1.6
    • 1.1.7, 1.2.1
    • None
    • None
    • 2018-19, 2018-20

    Description

      The Spark exporter should call toLocalIterator() on the DataFrame instead of collect(). collect() materializes the entire DataFrame on the Spark driver before the export starts, whereas toLocalIterator() loads the DataFrame's partitions into the driver one at a time and exports each in turn. The driver therefore needs only as much memory for the export as the largest DataFrame partition.

      This is a hotfix until MCOL-1362 solves the problem in a more efficient manner.
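
The difference between the two calls can be sketched as follows. This is a minimal illustration, not the exporter's actual code: the function names and the write_row callback are hypothetical, and df is assumed to be a PySpark DataFrame (any object exposing collect() and toLocalIterator() would behave the same way here).

```python
from typing import Any, Callable


def export_with_collect(df: Any, write_row: Callable[[Any], None]) -> int:
    """Behaviour described as the bug: collect() pulls every row to the driver first."""
    rows = df.collect()  # the whole DataFrame is held in driver memory at once
    for row in rows:
        write_row(row)  # hypothetical sink, e.g. a bulk-insert call
    return len(rows)


def export_with_local_iterator(df: Any, write_row: Callable[[Any], None]) -> int:
    """Behaviour proposed in this issue: stream rows partition by partition."""
    count = 0
    # toLocalIterator() fetches partitions sequentially, so the driver
    # holds at most one partition's rows at any time.
    for row in df.toLocalIterator():
        write_row(row)
        count += 1
    return count
```

Both functions pass the same rows to write_row in the same order; only the peak memory use on the driver differs. The trade-off is that partitions are fetched one after another rather than in a single parallel job, which fits the description of this change as a hotfix.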

            People

              elena.kotsinova Elena Kotsinova (Inactive)
              jens.rowekamp Jens Röwekamp (Inactive)
