[MDEV-16418] CONNECT Engine - Feature Request: Remote Script Execution Created: 2018-06-07  Updated: 2018-06-09  Resolved: 2018-06-09

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Fix Version/s: N/A

Type: Task Priority: Minor
Reporter: Juan Telleria Assignee: Olivier Bertrand
Resolution: Won't Do Votes: 0
Labels: 10.4.1

Issue Links:
Problem/Incident
causes MDEV-16453 TRANSACT-SQL Port: Call External R & ... Open

 Description   

1. Objective

To give the possibility to call the execution of external:

Scripts directly from MariaDB, just as SQL Server, from 2017 Edition onward, allows to do so:

And PostgreSQL so does for Python:

The ultimate objetive is, for example, to trigger the execution of a remote Python/R Code for updating a predictive model in keras/h2o/tensorflow/... when new data is introduced into the database, or as part of a SQL procedure.

2. OPTION A: Through a Table

2.1 Implementation

One possible implementation could be as follows:

2.1.1. Enable External Script Execution: my.ini

[mysqld]
connect_enable_execute_external_script = {0|1}
connect_python_exe_dir = "C:/Users/J-tel/AppData/Local/Programs/Python/Python36-32/python.exe"
connect_r_exe_dir = "C:/Program Files/Microsoft/R Open/R-3.4.3/bin/Rscript.exe"
connect_max_exe_parallel = {1(Default)|Number}

2.1.2. Script Repository Creation: CREATE TABLE

For using the "CONNECT REMOTE_TABLE" Engine, the user will have to create an empty table as follows:

CREATE [OR REPLACE] [IF NOT EXISTS] TABLE tbl_name_connect
  ENGINE = CONNECT
  TABLE_TYPE = REMOTE_TABLE
  LANGUAGE = {R|Python}
  [PARALLEL={0|1}]
  [WITH VERSION CONTROL];

2.1.3. Call a Script: SELECT

There will be 3 main ways of calling a R / Python Script from MariaDB:

A. To Import an R / Python Script on the fly an call it from MariaDB Server:

SELECT
  LOAD_FILE('/path/.../script_file_name.txt'))
FROM
  tbl_name_connect;

B. To write a Script directly in the IDE and call it from MariaDB Server:

SELECT
  QUOTE("... [R or Python Script] ...")
FROM
  tbl_name_connect;

C. To call a previously saved R / Python Script into tbl_name:

SELECT
  (SELECT Col_Script FROM tbl_name_script_repository LIMIT 1;)
FROM
  tbl_name_connect;

2.2 Internals

Connect will:

  1. Send the Script Text to the .exe file.
  2. Retrieve a data.frame Table (Last Evaluated Object or return() Object), or retrieve any error / warnings and if the execution was complete.

An example R Script to execute in the RScript.exe, which makes use of an ODBC Connection, would be:

library(tidyverse)
library(DBI)
library(RMariaDB)
 
con <- dbConnect(RMariaDB::MariaDB(), group = "my-db")
 
dbWriteTable(con, "mtcars", mtcars)
dbReadTable(con, "mtcars")
dbDisconnect(con)

2.3. Security Issues

This new feature could imply security issues, so:

  • It shall be disabled by default.
  • Have its own USER security permission.
  • MariaDB to include its own trusted distributions of R and Python.

Also another solution would be to have to force the users to store the Python and R .exe directly in the MariaDB installation folders, so that greater care of such executable files can be taken in the system.

3. OPTION B: Through a Stored Procedure

This new feature could imply security issues, so:

  • It shall be disabled by default.
  • Have its own USER security permission.
  • MariaDB to include its own trusted distributions of R and Python.

Also another solution would be to have to force the users to store the Python and R .exe directly in the MariaDB installation folders, so that greater care of such executable files can be taken in the system.

4. Collaboration

I could ask the maintainer of the R "RMariaDB" package (Developed in C++ and R) in case he would be interested in collaborate in this project.



 Comments   
Comment by Olivier Bertrand [ 2018-06-07 ]

To: Juan
Can you elaborate by giving examples. For instance what it the script name for as its text is given? Could it be also a script file whose path would be given. SQL server requires the language to be specified, how about CONNECT?

Actually, I am not sure I understand what you're doing. When creating a table, in all DBMSs, the column specifications describe the returned result. This is what SQL Server does. So would all specifications of the script to execute be contained in options, not in columns?

Comment by Olivier Bertrand [ 2018-06-08 ]

As I previously said, in a create table, columns definition is meant to describe the returned result and cannot be used for other purpose.
Another problem is that all this is based on CONNECT executing an exe program. I don't know if it is possible and, even it is, it probably cannot be permitted for security reasons.
Perhaps doing this can be achieve using a procedure, as SQL server does, but I don't see how CONNECT could do it, for instance how to retrieve data from the script?

Comment by Juan Telleria [ 2018-06-09 ]

Can you elaborate by giving examples.

Done.

SQL server requires the language to be specified, how about CONNECT?

CONNECT would also need the language to be specified: 'R' or 'Python'

So would all specifications of the script to execute be contained in options, not in columns?

In my example, I've moved to my.ini options some configuration settings for greater sequrity.

As I previously said, in a create table, columns definition is meant to describe the returned result and cannot be used for other purpose.

In SQL Server, it is possible to return a Pandas DataFrame, and in R a data.frame, a tibble, and a data.table I guess. Also, it would be nice to return an empty table only with warnings / errors / etc. for remote execution without retrieving any results.

Another problem is that all this is based on CONNECT executing an exe program. I don't know if it is possible and, even it is, it probably cannot be permitted for security reasons.

It is also a concern for me, how does SQL Server solve it?

for instance how to retrieve data from the script?

We could request collaboration for R maybe to Kirill Müller, creator of package RMariaDB (https://cran.r-project.org/web/packages/RMariaDB/index.html), maybe he will be open for collaboration, or at least, give some advice.

Thank you

Comment by Olivier Bertrand [ 2018-06-09 ]

From your description, this looks like a new feature of MariaDB, as it is for SQL Server. However, why this would be implemented by a storage engine? This request should be directly addressed to MariaDB development. Unless there is something proving that only CONNECT can be used to implement it, I won't study how to add it to CONNECT, because it seems to be too a complicate thing to do, and mainly because CONNECT being now a GA product, adding this would regress it to a beta availability product.

Comment by Juan Telleria [ 2018-06-09 ]

Maybe it shall have its own storage engine (Alpha), and for greater sequrity, this storage engine shall come with its own distribution of R / Python for greater safety.

Mmmm... Thank you however, we at least tried

Comment by Juan Telleria [ 2018-06-09 ]

However, why this would be implemented by a storage engine? This request should be directly addressed to MariaDB development.

Have opened a new issue: MDEV-16453

Comment by Juan Telleria [ 2018-06-09 ]

Do you know who in the MariaDB Development Team could reconsider this implementation with another focus?

Could you re-assign the issue (if appropriate)?

Thank you.
Juan

Comment by Olivier Bertrand [ 2018-06-09 ]

Perhaps serg

Generated at Thu Feb 08 08:28:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.