[MCOL-1177] SparkConnector runs out of memory for large datasets, JDBC can handle the datasets just fine Created: 2018-01-25 Updated: 2023-10-26 Resolved: 2018-01-26 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.1.2 |
| Fix Version/s: | 1.1.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jens Röwekamp (Inactive) | Assignee: | Andrew Hutchings (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Both the Scala and Python benchmark scripts run out of memory while exporting large datasets to ColumnStore. In contrast, JDBC handles the same datasets well and writes them to ColumnStore without a major increase in memory demand. We need to investigate the source of the SparkConnector's high memory demand and reduce it if possible. |
| Comments |
| Comment by Jens Röwekamp (Inactive) [ 2018-01-25 ] |
|
Fixed 2 bugs in the benchmark's command-line result output and changed the number of rows to write to 7,000,000. The memory issue turned out to be a configuration matter: neither Scala nor PySpark was executed with enough heap allocation. Fixed that by setting a 10 GiB maximum heap in the execution scripts; the benchmark now runs successfully. Before, in the Scala case, only around 2.5 GiB of maximum heap size was set, while the DataFrame to be written to ColumnStore alone occupied around 2 GiB of memory. |
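A fix along these lines can be sketched as a `spark-submit` invocation that raises the driver heap. This is a minimal illustration, not the actual execution script from the ticket; the script name `benchmark.py` is a placeholder, while `--driver-memory` and `spark.driver.memory` are standard Spark options. |

```shell
# Hypothetical launch script: give the driver a 10 GiB heap so the
# DataFrame buffered before the ColumnStore write fits in memory.
spark-submit \
  --driver-memory 10g \
  benchmark.py

# Equivalent form using an explicit configuration property:
spark-submit \
  --conf spark.driver.memory=10g \
  benchmark.py
``` |

The driver heap matters here because the benchmark materializes the DataFrame on the driver side before writing it out; executor memory (`--executor-memory`) would need a similar increase only if the write path distributed the data across executors. |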