Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Description
Line 25 of mariadb-columnstore-api/spark-connector/scala/src/main/scala/com/mariadb/columnstore/api/connector/ColumnStoreExporter.scala calls df.collect(), which pulls the entire DataFrame to the Spark driver node. This runs counter to the purpose of a distributed cluster and forces the driver to have enough RAM to hold all of the data before any of it is sent to the ColumnStore database.
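For illustration, here is a minimal sketch of two ways to avoid df.collect(). It is not the connector's actual code. RowSink, StreamingExportSketch, exportViaDriver and exportFromExecutors are hypothetical names introduced here, and RowSink merely stands in for whatever ColumnStore bulk-insert handle the exporter uses. Whether ColumnStore accepts concurrent bulk writes to one table from several executors is an assumption that would need to be verified before using the second variant.

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical stand-in for a ColumnStore bulk-insert handle (not part of mcsapi).
trait RowSink extends Serializable {
  def open(): Unit
  def write(row: Row): Unit
  def close(): Unit
}

object StreamingExportSketch {

  // Variant 1: still writes from the driver, but streams one partition at a time.
  // toLocalIterator() needs driver memory for the largest partition only,
  // instead of the whole DataFrame as collect() does.
  def exportViaDriver(df: DataFrame, sink: RowSink): Unit = {
    sink.open()
    try {
      val rows = df.toLocalIterator() // java.util.Iterator[Row]
      while (rows.hasNext) sink.write(rows.next())
    } finally {
      sink.close()
    }
  }

  // Variant 2: each executor writes its own partitions; no rows reach the driver.
  // Assumes the target tolerates one writer per partition (unverified for ColumnStore).
  def exportFromExecutors(df: DataFrame, newSink: () => RowSink): Unit = {
    df.rdd.foreachPartition { rows =>
      val sink = newSink() // created on the executor, one per partition
      sink.open()
      try rows.foreach(sink.write) finally sink.close()
    }
  }
}
```

Variant 1 is the smaller change, since the write path stays on the driver and only the memory requirement drops. Variant 2 removes the driver bottleneck entirely but depends on the database side supporting parallel writers.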