[MCOL-1852] Spark Exporter uses collect() instead of toLocalIterator() on DataFrames to export and therefore uses too much memory on the Driver Created: 2018-11-02 Updated: 2023-10-26 Resolved: 2018-11-10 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.1.6 |
| Fix Version/s: | 1.1.7, 1.2.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jens Röwekamp (Inactive) | Assignee: | Elena Kotsinova (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Sprint: | 2018-19, 2018-20 | ||||||||
| Description |
|
The Spark exporter should use toLocalIterator() instead of collect() on the DataFrame being exported through the Spark Driver. toLocalIterator() loads the DataFrame's partitions into the driver sequentially and exports them one at a time, so the driver only needs as much memory for the export as the size of the largest DataFrame partition. This is a hotfix until |
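The memory difference between the two approaches can be sketched in plain Python (hypothetical helper names, not the actual exporter code): collect() materializes every partition on the driver at once, while toLocalIterator() pulls one partition at a time, so peak driver memory is bounded by the largest single partition rather than the whole DataFrame.

```python
def collect(partitions):
    """Analogy of DataFrame.collect(): every row held in driver memory at once."""
    rows = []
    for part in partitions:
        rows.extend(part)  # peak memory ~ sum of all partition sizes
    return rows

def to_local_iterator(partitions):
    """Analogy of DataFrame.toLocalIterator(): one partition in memory at a time."""
    for part in partitions:  # peak memory ~ largest single partition
        for row in part:
            yield row

# Both yield the same rows in the same order; only peak memory differs.
partitions = [[1, 2, 3], [4, 5], [6]]
assert collect(partitions) == list(to_local_iterator(partitions))
```

The trade-off is that toLocalIterator() issues one Spark job per partition, so the export is slower end to end, which is why the description frames this as a hotfix rather than a final design.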
| Comments |
| Comment by Jens Röwekamp (Inactive) [ 2018-11-02 ] |
|
Switched from collect() to toLocalIterator() to reduce memory consumption on the Spark Driver. For QA:
|