[MCOL-1360] Spark connector performs all its work on the Driver Created: 2018-04-23  Updated: 2023-10-26  Resolved: 2018-04-24

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Charles Coleman Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None


 Description   

Line 25 of mariadb-columnstore-api/spark-connector/scala/src/main/scala/com/mariadb/columnstore/api/connector/ColumnStoreExporter.scala performs a df.collect(), which pulls all the data to the Spark Driver node. This runs counter to having a distributed cluster, and it forces the Driver node to have enough RAM to hold the entire DataFrame before any of it is sent to the ColumnStore database.
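For illustration, one common way to avoid df.collect() is to write from each partition on the executors. The sketch below is not the connector's code: RowSink, PartitionedExport and openSink are hypothetical names standing in for whatever per-executor bulk-insert handle the connector would open, and the sketch assumes that such a handle can be created on an executor at all.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical stand-in for a bulk-insert handle; NOT part of the ColumnStore API.
trait RowSink {
  def write(row: Row): Unit
  def commit(): Unit
}

object PartitionedExport {
  // Each partition opens its own sink on the executor that holds it,
  // so rows are never gathered on the driver as they are with df.collect().
  def export(df: DataFrame, openSink: () => RowSink): Unit =
    df.rdd.foreachPartition { rows =>
      val sink = openSink()
      rows.foreach(sink.write)
      sink.commit()
    }
}

object PartitionedExportDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partitioned-export-sketch")
      .getOrCreate()

    // Accumulator used only to count the rows the demo sink receives.
    val written = spark.sparkContext.longAccumulator("rowsWritten")
    val df = spark.range(0, 1000).toDF("id").repartition(4)

    PartitionedExport.export(df, () => new RowSink {
      def write(row: Row): Unit = written.add(1)
      def commit(): Unit = ()
    })

    assert(written.sum == 1000L)
    spark.stop()
  }
}
```

With this shape, driver memory no longer scales with the size of the DataFrame. The trade-off is that each partition commits independently, so a failure partway through can leave some partitions written and others not, which a real implementation would have to handle.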



 Comments   
Comment by Jens Röwekamp (Inactive) [ 2018-04-24 ]

Hello Charles, I saw this issue too late and created a new one (MCOL-1362).

Therefore, I'll close this one as a duplicate.

Generated at Thu Feb 08 02:28:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.