[MDEV-22306] Support for LOAD DATA FROM/INTO STREAM Created: 2020-04-19  Updated: 2023-12-22  Resolved: 2022-04-27

Status: Closed
Project: MariaDB Server
Component/s: Parser, Protocol
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Georg Richter Assignee: Sergei Golubchik
Resolution: Won't Fix Votes: 0
Labels: None


 Description   

Since enabling local_imfile is considered to be unsecure (and disabled in recent versions), implement a new command
LOAD DATA FROM/TO STREAM.

Syntax:

1. FROM STREAM

LOAD DATA [LOW_PRIORITY | CONCURRENT] FROM STREAM
    [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [CHARACTER SET charset_name]
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]
    [IGNORE number LINES]
    [(col_name_or_user_var,...)]
    [SET col_name = expr,...]

2. TO STREAM

LOAD DATA INTO STREAM
        [CHARACTER SET charset_name]
        [export_options]
SELECT ....
 
export_options:
    [{FIELDS | COLUMNS}
        [TERMINATED BY 'string']
        [[OPTIONALLY] ENCLOSED BY 'char']
        [ESCAPED BY 'char']
    ]
    [LINES
        [STARTING BY 'string']
        [TERMINATED BY 'string']
    ]

3. Client implementation

New options:

MARIADB_OPT_STREAM_IN
MARIADB_OPT_STREAM_OUT

Example for registering a stream:

FILE *fp_in= fopen("./foo.csv", "r");
 
rc= mysql_options(mysql, MARIADB_OPT_STREAM_IN, fp_in);
check_mysql_rc(rc, mysql);
 
rc= mysql_query("LOAD DATA FROM STREAM INTO TABLE t1 FIELDS TERMINATED BY ',', 
check_mysql_rc(rc, mysql);
 
/* tbd: should Connector/C close the stream or should it be handled by application?! */

4. Protocol implementation (tbd)

Server response package:
0xFB 0x00 0x01 for FROM STREAM
0xFB 0x00 0x02 for INTO_STREAM



 Comments   
Comment by Sergei Golubchik [ 2020-04-21 ]

a couple of thoughts:

  • should be SELECT ... INTO LOCAL OUTFILE not LOAD DATA INTO STREAM
  • LOAD DATA ... LOCAL INFILE can be used instead of LOAD DATA FROM STREAM, like this:

    FILE *fp_in= fopen("./foo.csv", "r");
    rc= mysql_options(mysql, MARIADB_OPT_STREAM_IN, fp_in);
    check_mysql_rc(rc, mysql);
    rc= mysql_query("LOAD DATA LOCAL INFILE INTO TABLE t1 FIELDS TERMINATED BY ',',  ...);
    check_mysql_rc(rc, mysql);
    

    and the connector will feed the data from the "stream" if set and otherwise will flatly reject all file requests from the server, for security reasons.

Comment by Diego Dupin [ 2020-04-24 ]

wlad point out a problem here: in term of protocol, load local data infile is the only command that need more than one 'ping-pong' after connection.
For asynchronous driver, this is bad, because driver must ensure state.

Example:

    conn.query("SELECT 1").then(res => { }).catch(err => { });
    conn.query("SELECT 2").then(res => { }).catch(err => { });

if load local infile is not permit, driver will send 2 COM_QUERY, then wait for the 2 results.
if load local infile is permitted, driver send a first COM_QUERY, wait for result (can be sure it is not a LOAD LOCAL INFILE command) and only then send second COM_QUERY

This feature is not possible when there is command that must have many exchanges.

It would be better to send data with one dedicated command COM_STREAM, and have an OK_Packet or ERR_packet as a result.

Comment by Sergei Golubchik [ 2020-08-16 ]

This "load from stream" solves it, because it forces the user to provide the data.

This could be a client-side setting, like conn.load_from_stream=1;

After that you can always send two COM_QUERY from your example together. If the first will happen to be LOAD DATA LOCAL, the user will get packet out of order no matter whether you wait or not. The correct code could look something like

    conn.query("LOAD DATA LOCAL INFILE ...").then(res => { }, data => { ..} ).catch(err => { });
    conn.query("SELECT 2").then(res => { }).catch(err => { });

that is, if the user has provided the data to insert, then you should expect the server asking for it. If the user haven't — you don't.

For this to work, as I mentioned above, you should never just send the filename that server asked for. Which is a good security practice too.

Comment by Georg Richter [ 2020-09-18 ]

Another idea:

COM_PREPARE: “LOAD DATA FROM STREAM ? INTO ……”
COM_STMT_LONG_DATA: stream
COM_STMT_EXECUTE

Generated at Thu Feb 08 09:13:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.