[MDEV-9114] Bulk operations (Array binding) Created: 2015-11-10  Updated: 2018-02-14  Resolved: 2016-11-02

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Fix Version/s: 10.2.3

Type: Task Priority: Blocker
Reporter: Oleksandr Byelkin Assignee: Michael Widenius
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
Relates
relates to CONJ-389 implement bulk operation 10.2 Closed
relates to MDEV-5422 binary protocol: default values not s... Closed
relates to MDEV-11359 Implement IGNORE for bulk operation Closed
relates to MDEV-12471 BULK Command Closed
Sprint: 10.2.2-1, 10.2.2-2, 10.2.2-3, 5.5.51 & 10.2.2, 10.2.2-3, 10.2.2-4, 10.2.3-1, 10.2.3-2, 10.0.28

 Description   

Why?

To achieve better performance we need to support bulk operations: Instead of executing a prepared statement n times it shold be possible to bind an array of size N and execute the statement once.

*What needs to be changed ?

*Indicator variables

Since the length of an array element might change, an array element could be a NULL value or the column default should be used we need to introduce indicator variables. An Indicator can have the following values:

 STMT_INDICATOR_NONE
 STMT_INDICATOR_NULL     Null value
 STMT_INDICATOR_DEFAULT  use column default value

Indicator variable arrays will be stored in the MYSQL_BIND structure.

Additional definitions

#define STMT_BUFFER_UNSIGNED  32768  
#define STMT:BUFFER_INDICATOR 16384 (specifies if an indicator variable is used)

Protocol implementation

1   MARIADB_COM_STMT_BULK_EXECUTE
4   Statement id
1   flags (cursor)
4   Iteration count (=n)
 
1   Send types to server

Parameter:

if send_types_to_server && num_params > 0

 2 bytes for each parameter
    1st byte: parameter type
    2nd byte: unsigned flag = 32768 (is part of type)
              indicator variable set= 16384

Data-Array:

if (indicator variable set)

 1   indicator value:
               STMT_INDICATOR_NONE       no indicator
               STMT_INDICATOR_NULL       null value
               STMT_INDICATOR_DEFAULT    use default value
               
           n   value of each parameter

API changes

The function mysql_stmt_attr_set should support the following attributes:
STMT_ATTR_ARRAY_SIZE (specifies the size of bound array)
STMT_ATTR_BIND_TYPE (specifies the type of binding: column or row wise binding)
The API will now also support row wise binding, where the application defines the structure containing elements for each column. The application declares the size of the structure to the driver with the SQL_ATTR_BIND_TYPE statement attribute and binds the address of each member Thus, the connector can calculate the address of the data for a particular row and column as

Address = Bound Address + (Row Number * Structure Size)

In case the structure contains non fixed length members like char * (which are implicitly allocated) the address of this member has to be specified. BIND_IS_PTR flag in bind.flags needs to be set to indicate this special case.

Example:

 struct st_my_data {
   int id;
   char *buffer;
  
   ....
 };
 
 struct st_my_data data[20];
 
 bind[0].buffer= &data[0].id;
 bind[0].buffer_type= MYSQL_TYPE_LONG;
 
 bind[1].buffer= &data[0].buffer;
 bind[1].flags|= SIND_IS_PTR;
 bind[1].lengths= length_array;
[edit] MYSQL_BIND structure:
 
offset member of MYSQL_BIND structure is used for fetch only, so we build an union around:
 
 union {
   unsigned long offset;           /* offset position for char/binary fetch */
   struct {
     uchar *indicator;            /* array of indicator flags */
   };
 }
 ...

Indicator variable flags:

#define INDICATOR_NONE      0
#define INDICATOR_NULL      1    /* array element is a null value */
#define INDICATOR_DEFAULT   2    /* column default value should be used */
#define INDICATOR_NTS       4    /* Null terminated string: this will be used by client only
                                    to determine correct data length */

Checking server capabilities

We need to check if the server supports bulk operations, so we need an additional define

#define CLIENT_STMT_BULK_OPERATIONS,

Fixed string lengths

char *str_test[]= {"Keremidarski", NULL, "Widenius", NULL};
ulong lengths[]= {-1, -1, -1, -1};
uchar indicators[] = {0, STMT_INDICATOR_NULL, 0, 0, STMT_INDCATOR_DEFAULT};
ulong size= 4;
char *query= "INSERT INTO names VALUES (?)";
 
stmt= mysql_stmt_init(NULL);
 
mysql_stmt_prepare(stmt, query, strlen(query));
 
mysql_stmt_attr_set(stmt, STMT_ATTR_ARRAY_SIZE, &size);
 
/* Initialize bind structure */
bzero(&bind, sizeof(MYSQL_BIND));
 
bind[0].buffer_type= MYSQL_TYPE_STRING;
bind[0].buffer= str_test[0];
bind[0].length= lengths;
 
mysql_stmt_bind_param(stmt, bind);
mysql_stmt_execute(stmt);

Server Side Optimizations

Open tables only in the beginning of the batch (the array) processing. (Limitation: all data should be in one packet i.e. less then max allowed packet).
No need full cleanup for Items between executions for different bindings of the batch (array) (Especially for Item_field: tables are opened and can't change).
Especially popular statement types could be optimized like multi-value INSERT, i.e. short loop over all parameters after all check and preparation/optimization to execute (SELECT-like could use subquery execution mechanism).

Limitations

Bulk operations cannot be used in combination with mysql_stmt_send_long_data. See also Long Data and SQLSetPos and SQLBulkOperations



 Comments   
Comment by Diego Dupin [ 2016-06-29 ]

I know it may be late, but it would be good to reserve now a byte, for a future flag.
This flag would permit to ask for returning only one OKPacket.
Its like insert multiple rows using a single SQL INSERT statement. (INSERT INTO table VALUES ('a'), ('b'), ('c') ) => only one OKPacket is returned to driver.
Parsing those packet take a lot of time for batch.

In JDBC, batches (bulk operation) are done using a PreparedStatment object. You can indicate if resulting insert ids are needed or not.
Parsing all returning OKPacket take a lot of time (nearly half the time when local).

Reserve one byte now will permit to implement this in the future.

Comment by Oleksandr Byelkin [ 2016-07-02 ]

I've checked and found that array binding generate only one OK for whole command (independently of number of parameters) and it is also true for "not so optimized" for bulk execution commands (checked on UPDATE also)

Comment by Oleksandr Byelkin [ 2016-08-14 ]

Probably effort to tie default value during parameter binding was not correct. It is virtually impossible for non optimized command where we first set default, then bind...

Comment by Diego Dupin [ 2016-09-06 ]

Following discussion :
What would be great is that client indicate if he want all autoincrementIds or not.

There is actually one byte flag to indicate to use cursor (description indicate cursor, there won't be cursor here)
This byte can be used to indicate that client want all autoincrementIds, like 0x01 = send all auto increment ids.

Result could be an extended Okpacket replacing :

	int<lenenc>	last_insert_id	last insert-id

by

	if (query is COM_STMT_BULK_EXECUTE response ) {
	  int(lenenc)	number of insert id 
	  n int(lenenc) insert id
	} else {
	  int(lenenc)	last insert id
	}

This would permit to handle for exemple java prepare batch : Connection.prepareStatement(String sql,int autoGeneratedKeys) with autoGeneratedKeys can have Statement.RETURN_GENERATED_KEYS or Statement.NO_GENERATED_KEYS.
the fastest possible way

Generated at Thu Feb 08 07:32:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.