[MCOL-1360] Spark connector performs all its work on the Driver Created: 2018-04-23 Updated: 2023-10-26 Resolved: 2018-04-24 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Charles Coleman | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Description |
|
Line 25 of mariadb-columnstore-api/spark-connector/scala/src/main/scala/com/mariadb/columnstore/api/connector/ColumnStoreExporter.scala performs a df.collect(), which pulls all of the data to the Spark Driver node. This runs counter to having a distributed cluster, and it forces the Driver node to have enough RAM to hold the entire DataFrame before any of it is sent to the ColumnStore database. |
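For context, a minimal sketch of the distributed pattern the report is pointing toward: instead of df.collect() materializing every row on the Driver, each executor can process only its own partition via foreachPartition. The helper writePartition below is hypothetical (it is not part of the actual mariadb-columnstore-api) and merely stands in for the connector's bulk-insert logic.

```scala
import org.apache.spark.sql.{DataFrame, Row}

object DistributedExportSketch {
  // Sketch only, under the assumptions above: export partition-by-partition
  // on the executors rather than collecting the whole DataFrame on the Driver.
  def export(df: DataFrame): Unit = {
    df.rdd.foreachPartition { rows: Iterator[Row] =>
      // Runs on the executor holding this partition; only this partition's
      // rows are in memory here, never the full dataset.
      writePartition(rows)
    }
  }

  // Hypothetical stand-in for opening a ColumnStore bulk-insert session
  // and streaming this partition's rows into it.
  def writePartition(rows: Iterator[Row]): Unit =
    rows.foreach(row => ()) // placeholder for real insert logic
}
```

With this shape, Driver memory requirements no longer scale with the size of the DataFrame; only the partition being written is held in any single JVM at a time. |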
| Comments |
| Comment by Jens Röwekamp (Inactive) [ 2018-04-24 ] |
|
Hello Charles, I saw this issue too late and created a new one. Therefore, I'll close this one as a duplicate. |