Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16418

CONNECT Engine - Feature Request: Remote Script Execution

Details

    Description

      1. Objective

      To give the possibility to call the execution of external:

      Scripts directly from MariaDB, just as SQL Server, from 2017 Edition onward, allows to do so:

      And PostgreSQL so does for Python:

      The ultimate objetive is, for example, to trigger the execution of a remote Python/R Code for updating a predictive model in keras/h2o/tensorflow/... when new data is introduced into the database, or as part of a SQL procedure.

      2. OPTION A: Through a Table

      2.1 Implementation

      One possible implementation could be as follows:

      2.1.1. Enable External Script Execution: my.ini

      [mysqld]
      connect_enable_execute_external_script = {0|1}
      connect_python_exe_dir = "C:/Users/J-tel/AppData/Local/Programs/Python/Python36-32/python.exe"
      connect_r_exe_dir = "C:/Program Files/Microsoft/R Open/R-3.4.3/bin/Rscript.exe"
      connect_max_exe_parallel = {1(Default)|Number}
      

      2.1.2. Script Repository Creation: CREATE TABLE

      For using the "CONNECT REMOTE_TABLE" Engine, the user will have to create an empty table as follows:

      CREATE [OR REPLACE] [IF NOT EXISTS] TABLE tbl_name_connect
        ENGINE = CONNECT
        TABLE_TYPE = REMOTE_TABLE
        LANGUAGE = {R|Python}
        [PARALLEL={0|1}]
        [WITH VERSION CONTROL];
      

      2.1.3. Call a Script: SELECT

      There will be 3 main ways of calling a R / Python Script from MariaDB:

      A. To Import an R / Python Script on the fly an call it from MariaDB Server:

      SELECT
        LOAD_FILE('/path/.../script_file_name.txt'))
      FROM
        tbl_name_connect;
      

      B. To write a Script directly in the IDE and call it from MariaDB Server:

      SELECT
        QUOTE("... [R or Python Script] ...")
      FROM
        tbl_name_connect;
      

      C. To call a previously saved R / Python Script into tbl_name:

      SELECT
        (SELECT Col_Script FROM tbl_name_script_repository LIMIT 1;)
      FROM
        tbl_name_connect;
      

      2.2 Internals

      Connect will:

      1. Send the Script Text to the .exe file.
      2. Retrieve a data.frame Table (Last Evaluated Object or return() Object), or retrieve any error / warnings and if the execution was complete.

      An example R Script to execute in the RScript.exe, which makes use of an ODBC Connection, would be:

      library(tidyverse)
      library(DBI)
      library(RMariaDB)
       
      con <- dbConnect(RMariaDB::MariaDB(), group = "my-db")
       
      dbWriteTable(con, "mtcars", mtcars)
      dbReadTable(con, "mtcars")
      dbDisconnect(con)
      

      2.3. Security Issues

      This new feature could imply security issues, so:

      • It shall be disabled by default.
      • Have its own USER security permission.
      • MariaDB to include its own trusted distributions of R and Python.

      Also another solution would be to have to force the users to store the Python and R .exe directly in the MariaDB installation folders, so that greater care of such executable files can be taken in the system.

      3. OPTION B: Through a Stored Procedure

      This new feature could imply security issues, so:

      • It shall be disabled by default.
      • Have its own USER security permission.
      • MariaDB to include its own trusted distributions of R and Python.

      Also another solution would be to have to force the users to store the Python and R .exe directly in the MariaDB installation folders, so that greater care of such executable files can be taken in the system.

      4. Collaboration

      I could ask the maintainer of the R "RMariaDB" package (Developed in C++ and R) in case he would be interested in collaborate in this project.

      Attachments

        Issue Links

          Activity

            To: Juan
            Can you elaborate by giving examples. For instance what it the script name for as its text is given? Could it be also a script file whose path would be given. SQL server requires the language to be specified, how about CONNECT?

            Actually, I am not sure I understand what you're doing. When creating a table, in all DBMSs, the column specifications describe the returned result. This is what SQL Server does. So would all specifications of the script to execute be contained in options, not in columns?

            bertrandop Olivier Bertrand added a comment - To: Juan Can you elaborate by giving examples. For instance what it the script name for as its text is given? Could it be also a script file whose path would be given. SQL server requires the language to be specified, how about CONNECT? Actually, I am not sure I understand what you're doing. When creating a table, in all DBMSs, the column specifications describe the returned result. This is what SQL Server does. So would all specifications of the script to execute be contained in options, not in columns?

            As I previously said, in a create table, columns definition is meant to describe the returned result and cannot be used for other purpose.
            Another problem is that all this is based on CONNECT executing an exe program. I don't know if it is possible and, even it is, it probably cannot be permitted for security reasons.
            Perhaps doing this can be achieve using a procedure, as SQL server does, but I don't see how CONNECT could do it, for instance how to retrieve data from the script?

            bertrandop Olivier Bertrand added a comment - As I previously said, in a create table, columns definition is meant to describe the returned result and cannot be used for other purpose. Another problem is that all this is based on CONNECT executing an exe program. I don't know if it is possible and, even it is, it probably cannot be permitted for security reasons. Perhaps doing this can be achieve using a procedure, as SQL server does, but I don't see how CONNECT could do it, for instance how to retrieve data from the script?
            Juan Juan Telleria added a comment - - edited

            Can you elaborate by giving examples.

            Done.

            SQL server requires the language to be specified, how about CONNECT?

            CONNECT would also need the language to be specified: 'R' or 'Python'

            So would all specifications of the script to execute be contained in options, not in columns?

            In my example, I've moved to my.ini options some configuration settings for greater sequrity.

            As I previously said, in a create table, columns definition is meant to describe the returned result and cannot be used for other purpose.

            In SQL Server, it is possible to return a Pandas DataFrame, and in R a data.frame, a tibble, and a data.table I guess. Also, it would be nice to return an empty table only with warnings / errors / etc. for remote execution without retrieving any results.

            Another problem is that all this is based on CONNECT executing an exe program. I don't know if it is possible and, even it is, it probably cannot be permitted for security reasons.

            It is also a concern for me, how does SQL Server solve it?

            for instance how to retrieve data from the script?

            We could request collaboration for R maybe to Kirill Müller, creator of package RMariaDB (https://cran.r-project.org/web/packages/RMariaDB/index.html), maybe he will be open for collaboration, or at least, give some advice.

            Thank you

            Juan Juan Telleria added a comment - - edited Can you elaborate by giving examples. Done. SQL server requires the language to be specified, how about CONNECT? CONNECT would also need the language to be specified: 'R' or 'Python' So would all specifications of the script to execute be contained in options, not in columns? In my example, I've moved to my.ini options some configuration settings for greater sequrity. As I previously said, in a create table, columns definition is meant to describe the returned result and cannot be used for other purpose. In SQL Server, it is possible to return a Pandas DataFrame, and in R a data.frame, a tibble, and a data.table I guess. Also, it would be nice to return an empty table only with warnings / errors / etc. for remote execution without retrieving any results. Another problem is that all this is based on CONNECT executing an exe program. I don't know if it is possible and, even it is, it probably cannot be permitted for security reasons. It is also a concern for me, how does SQL Server solve it? for instance how to retrieve data from the script? We could request collaboration for R maybe to Kirill Müller, creator of package RMariaDB ( https://cran.r-project.org/web/packages/RMariaDB/index.html ), maybe he will be open for collaboration, or at least, give some advice. Thank you

            From your description, this looks like a new feature of MariaDB, as it is for SQL Server. However, why this would be implemented by a storage engine? This request should be directly addressed to MariaDB development. Unless there is something proving that only CONNECT can be used to implement it, I won't study how to add it to CONNECT, because it seems to be too a complicate thing to do, and mainly because CONNECT being now a GA product, adding this would regress it to a beta availability product.

            bertrandop Olivier Bertrand added a comment - From your description, this looks like a new feature of MariaDB, as it is for SQL Server. However, why this would be implemented by a storage engine? This request should be directly addressed to MariaDB development. Unless there is something proving that only CONNECT can be used to implement it, I won't study how to add it to CONNECT, because it seems to be too a complicate thing to do, and mainly because CONNECT being now a GA product, adding this would regress it to a beta availability product.
            Juan Juan Telleria added a comment -

            Maybe it shall have its own storage engine (Alpha), and for greater sequrity, this storage engine shall come with its own distribution of R / Python for greater safety.

            Mmmm... Thank you however, we at least tried

            Juan Juan Telleria added a comment - Maybe it shall have its own storage engine (Alpha), and for greater sequrity, this storage engine shall come with its own distribution of R / Python for greater safety. Mmmm... Thank you however, we at least tried
            Juan Juan Telleria added a comment - - edited

            However, why this would be implemented by a storage engine? This request should be directly addressed to MariaDB development.

            Have opened a new issue: MDEV-16453

            Juan Juan Telleria added a comment - - edited However, why this would be implemented by a storage engine? This request should be directly addressed to MariaDB development. Have opened a new issue: MDEV-16453
            Juan Juan Telleria added a comment -

            Do you know who in the MariaDB Development Team could reconsider this implementation with another focus?

            Could you re-assign the issue (if appropriate)?

            Thank you.
            Juan

            Juan Juan Telleria added a comment - Do you know who in the MariaDB Development Team could reconsider this implementation with another focus? Could you re-assign the issue (if appropriate)? Thank you. Juan

            Perhaps serg

            bertrandop Olivier Bertrand added a comment - Perhaps serg

            People

              bertrandop Olivier Bertrand
              Juan Juan Telleria
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.