MariaDB ColumnStore / MCOL-1852

Spark Exporter uses collect() instead of toLocalIterator() to export DataFrames and therefore uses too much memory on the Spark Driver


    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.6
    • Fix Version/s: 1.1.7, 1.2.1
    • Component/s: mcsapi
    • Labels:
      None
    • Sprint:
      2018-19, 2018-20

      Description

      The Spark Driver should export the DataFrame using toLocalIterator() instead of collect(). toLocalIterator() loads the DataFrame's partitions into the Spark Driver one at a time and exports them sequentially, whereas collect() loads the entire DataFrame at once. As a result, the Spark Driver only needs as much memory for the export as the size of the largest DataFrame partition.
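      The memory argument above can be illustrated with a small pure-Python model rather than actual Spark or mcsapi code. It assumes a DataFrame is simply a list of partitions fetched on demand, and every name in it (DriverMemory, collect, local_iterator, make_partitions, the export_with_* functions) is hypothetical. It only counts rows resident on the simulated driver and does not model Spark's scheduling or any partition prefetching.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical model: a row is a tuple, and a partition is fetched from an
# "executor" only when its loader function is called.
Row = Tuple[int, int]
PartitionLoader = Callable[[], List[Row]]


class DriverMemory:
    """Counts rows currently resident on the simulated driver and the peak."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    def load(self, n: int) -> None:
        self.current += n
        self.peak = max(self.peak, self.current)

    def free(self, n: int) -> None:
        self.current -= n


def collect(partitions: Sequence[PartitionLoader], mem: DriverMemory) -> List[Row]:
    # Like collect(): every partition is pulled to the driver up front.
    rows: List[Row] = []
    for load in partitions:
        part = load()
        mem.load(len(part))
        rows.extend(part)
    return rows


def local_iterator(partitions: Sequence[PartitionLoader], mem: DriverMemory) -> Iterator[Row]:
    # Like toLocalIterator(): only one partition is on the driver at a time;
    # it is released once all of its rows have been consumed.
    for load in partitions:
        part = load()
        mem.load(len(part))
        try:
            yield from part
        finally:
            mem.free(len(part))


def export_with_collect(partitions: Sequence[PartitionLoader], sink: List[Row], mem: DriverMemory) -> None:
    rows = collect(partitions, mem)
    sink.extend(rows)  # stand-in for writing the rows to ColumnStore
    mem.free(len(rows))


def export_with_local_iterator(partitions: Sequence[PartitionLoader], sink: List[Row], mem: DriverMemory) -> None:
    sink.extend(local_iterator(partitions, mem))  # stand-in for writing the rows


def make_partitions(sizes: Sequence[int]) -> List[PartitionLoader]:
    # Partition p holds the rows (p, 0), (p, 1), ..., (p, n - 1).
    return [lambda p=p, n=n: [(p, i) for i in range(n)] for p, n in enumerate(sizes)]


if __name__ == "__main__":
    parts = make_partitions([3, 5, 2])
    mem_c, out_c = DriverMemory(), []
    export_with_collect(parts, out_c, mem_c)
    mem_i, out_i = DriverMemory(), []
    export_with_local_iterator(parts, out_i, mem_i)
    print(mem_c.peak, mem_i.peak, out_c == out_i)  # 10 5 True
```

      With partitions of 3, 5 and 2 rows, the collect-based export holds all 10 rows on the driver at its peak, while the iterator-based export peaks at 5, the size of the largest partition. Both write the same rows in the same order.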

      This is a hotfix until MCOL-1362 solves the problem in a more efficient manner.


              People

              Assignee:
              Elena Kotsinova
              Reporter:
              Jens Röwekamp (Inactive)
              Votes:
              0
              Watchers:
              1

                Dates

                Created:
                Updated:
                Resolved:
