[MCOL-1119] spark connector for publishing dataframe results using mcsapi to columnstore. Created: 2017-12-18  Updated: 2023-10-26  Resolved: 2018-04-02

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.2
Fix Version/s: 1.1.3

Type: New Feature Priority: Major
Reporter: David Thompson (Inactive) Assignee: David Thompson (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: spark-dev-build.zip, spark-dev.zip
Sprint: 2017-25, 2018-01, 2018-02, 2018-03, 2018-04, 2018-05, 2018-06, 2018-07

 Description   

We should support a data adapter that bridges Spark (both Scala and PySpark) to ColumnStore. The intended use case is publishing ML results to ColumnStore, both to make it the system of record for those results and to make the data easier to consume with SQL alongside other data already stored in MariaDB.

Broadly speaking, the goal is to take a DataFrame object and serialize it to a ColumnStore table using mcsapi. This requires new code to bridge Spark and mcsapi. The first implementation can assume that an appropriate table already exists, but it would be valuable to create or adapt code that generates appropriate ColumnStore CREATE TABLE statements, which could be run as a first stage before writing the data.
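
As a rough illustration of the serialization step, the sketch below pushes rows through a bulk insert handle one column at a time and commits at the end. It is not the connector's actual code. The helper name write_rows is hypothetical, and the handle is only assumed to expose setColumn, setNull, writeRow, commit and rollback in the way an mcsapi bulk insert object does.

```python
from typing import Any, Iterable, Sequence


def write_rows(bulk: Any, rows: Iterable[Sequence[Any]]) -> int:
    """Write rows through a bulk insert handle and commit; return the row count.

    Assumption: `bulk` exposes setColumn(index, value), setNull(index),
    writeRow(), commit() and rollback(), as an mcsapi bulk insert does.
    """
    count = 0
    try:
        for row in rows:
            for index, value in enumerate(row):
                if value is None:
                    bulk.setNull(index)  # SQL NULL for missing values
                else:
                    bulk.setColumn(index, value)
            bulk.writeRow()
            count += 1
        bulk.commit()
    except Exception:
        bulk.rollback()  # discard the partial batch on any failure
        raise
    return count
```

With pymcsapi installed, the handle would presumably come from something like pymcsapi.ColumnStoreDriver().createBulkInsert("test", "results", 0, 0), where the database and table names are placeholders, and the rows from df.collect() or df.toLocalIterator(). Collecting to the driver is a simplification that only suits result sets small enough to fit in driver memory.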

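The "stage 1" idea of generating a CREATE TABLE statement could look roughly like the sketch below. It takes the (column name, type string) pairs that PySpark's DataFrame.dtypes returns and emits a ColumnStore DDL string. The type mapping, the VARCHAR(255) default and the function names are assumptions made for illustration and are not taken from the connector.

```python
import re
from typing import List, Tuple

# Assumed mapping from Spark SQL type names (as reported by DataFrame.dtypes)
# to ColumnStore column types; widths should be adjusted to the actual data.
_TYPE_MAP = {
    "boolean": "TINYINT",
    "tinyint": "TINYINT",
    "smallint": "SMALLINT",
    "int": "INT",
    "bigint": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "string": "VARCHAR(255)",
    "date": "DATE",
    "timestamp": "DATETIME",
}

_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def columnstore_type(spark_type: str) -> str:
    """Translate one Spark type string into an assumed ColumnStore type."""
    match = _DECIMAL.fullmatch(spark_type)
    if match:
        return "DECIMAL({},{})".format(*match.groups())
    if spark_type not in _TYPE_MAP:
        raise ValueError("no ColumnStore mapping assumed for " + spark_type)
    return _TYPE_MAP[spark_type]


def create_table_statement(
    database: str, table: str, dtypes: List[Tuple[str, str]]
) -> str:
    """Build a CREATE TABLE statement from (name, spark_type) pairs."""
    columns = ",\n".join(
        "  `{}` {}".format(name, columnstore_type(spark_type))
        for name, spark_type in dtypes
    )
    return "CREATE TABLE `{}`.`{}` (\n{}\n) ENGINE=columnstore".format(
        database, table, columns
    )
```

For a DataFrame df, a call such as create_table_statement("test", "results", df.dtypes) would produce the statement to run before the bulk write. Complex Spark types such as arrays or structs are deliberately rejected here rather than guessed at.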


 Comments   
Comment by Jens Röwekamp (Inactive) [ 2018-01-17 ]

Added a spark-connector that uses mcsapi to export a DataFrame to ColumnStore.

Python 2/3 and Scala are supported.

An automatic build and basic tests have been added to CMakeLists.txt and run successfully on Ubuntu 16.04.

Comment by Jens Röwekamp (Inactive) [ 2018-01-17 ]

Attached my Docker test environment build file.

cd spark-dev
docker-compose up -d

One can access Jupyter on http://localhost:8888
Login password: mariadb

Comment by Andrew Hutchings (Inactive) [ 2018-01-18 ]

Looks great!

Moved to DT to test as he understands the requirements for this better than me.
