[CONCPP-99] Invalid queries generated when rewriteBatchedStatements is used in batch mode Created: 2022-04-26  Updated: 2022-10-06  Resolved: 2022-07-24

Status: Closed
Project: MariaDB Connector/C++
Component/s: General
Affects Version/s: 1.0.1
Fix Version/s: 1.0.2, 1.1.2

Type: Task Priority: Major
Reporter: M. Tarek Spotts Assignee: Lawrin Novitsky
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mariadb_nodejs_batching_demo.js    
Issue Links:
Relates
relates to CONCPP-106 Add support of MariaDB prepared state... Closed

 Description   

When enabling the `rewriteBatchedStatements` option in the C++ connector, the resulting queries are invalid, even for a trivial table.

Given a simple INSERT operation such as:

INSERT INTO tester (id, someval) VALUES (?, ?)

The resulting batch operation will fail with an error such as this:

You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'INSERT INTO tester (id, someval) VALUES1, 'someval_1'),INSERT INTO tester (id...' at line 1
Query is: INSERT INTO tester (id, someval) VALUES (?, ?)
java thread: 139933589112640

At the very least, the statements are missing parentheses before the first value in the VALUES clause (should read `VALUES (1`, not `VALUES1`.)

Below is a minimal example to reproduce the error.

Sample app code

#include <iostream>
#include <string>
#include <mariadb/conncpp.hpp>
 
const int QUANTITY = 10;
int main(int argc, char *argv[])
{
    try {
        sql::Driver* driver = sql::mariadb::get_driver_instance();
        sql::SQLString url("jdbc:mariadb://some-mariadb/somedb");
        sql::SQLString timeoutLength = "60000";
        sql::Properties props(
            {
                {"user", "example-user"},
                {"password", "my_cool_secret"},
                {"socketTimeout", timeoutLength},
                {"rewriteBatchedStatements", "true"},
                {"useBulkStmts", "true"},
                {"useBatchMultiSend", "true"},
                {"allowMultiQueries", "true"}
            }
        );
        std::unique_ptr<sql::Connection> conn(driver->connect(url, props));
        std::unique_ptr<sql::Statement> s(conn->createStatement());
        s->execute("DROP TABLE IF EXISTS tester");
        s->execute("CREATE TABLE tester (id int not null, someval varchar(255) not null, primary key(id))");   
 
        std::unique_ptr<sql::PreparedStatement> stmnt(
            conn->prepareStatement("INSERT INTO tester (id, someval) VALUES (?, ?)")
        );
 
        for (int i = 1; i <= QUANTITY; ++i)
        {
            std::string someval = "someval_" + std::to_string(i);
            stmnt->setInt(1, i);
            stmnt->setString(2, someval);
            stmnt->addBatch();
        }
 
        stmnt->executeBatch();
        return 0;
    } 
    catch(sql::SQLException& e) {
        std::cerr << "Error Connecting to MariaDB Platform: " 
        << e.what() << std::endl;
        // Exit (Failed)
        return 1;        
    }    
    return 0;
}

The app can be compiled the same way as the sample task app, e.g.,

g++ -o test main.cpp -std=c++11 -lmariadbcpp

App output

Error Connecting to MariaDB Platform: (conn=16) You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'INSERT INTO tester (id, someval) VALUES1, 'someval_1'),INSERT INTO tester (id...' at line 1
Query is: INSERT INTO tester (id, someval) VALUES (?, ?)
java thread: 139933589112640

Expected output
App should run and successfully insert 10 rows to the table without errors.

General query log

220426  9:40:19	    
16 Connect	example-user@172.28.0.2 on somedb using TCP/IP
16 Query	set autocommit=1, session_track_schema=1, session_track_system_variables='auto_increment_increment' , sql_mode = concat(@@sql_mode,',STRICT_TRANS_TABLES')
16 Query	SELECT @@max_allowed_packet,@@system_time_zone,@@time_zone,@@auto_increment_increment
16 Query	DROP TABLE IF EXISTS tester
16 Query	CREATE TABLE tester (id int not null, someval varchar(255) not null, primary key(id))
16 Query	(INSERT INTO tester (id, someval) VALUES1, 'someval_1'),INSERT INTO tester (id, someval) VALUES2, 'someval_2'),INSERT INTO tester (id, someval) VALUES3, 'someval_3'),INSERT INTO tester (id, someval) VALUES4, 'someval_4'),INSERT INTO tester (id, someval) VALUES5, 'someval_5'),INSERT INTO tester (id, someval) VALUES6, 'someval_6'),INSERT INTO tester (id, someval) VALUES7, 'someval_7'),INSERT INTO tester (id, someval) VALUES8, 'someval_8'),INSERT INTO tester (id, someval) VALUES9, 'someval_9'),INSERT INTO tester (id, someval) VALUES10, 'someval_10')
16 Quit	

Workarounds tried
I tried commenting out/removing the rewriteBatchedStatements line and running it in 'normal' batched mode. In that mode, the queries will work, but they do not appear to actually be batched. The general query log in that case just shows regular queries being executed ten times over (notice the repeat 'Query' event in the general log below.)

220426 10:02:18	    24 Connect	example-user@172.28.0.2 on somedb using TCP/IP
		    24 Query	set autocommit=1, session_track_schema=1, sql_mode = concat(@@sql_mode,',STRICT_TRANS_TABLES')
		    24 Query	SELECT @@max_allowed_packet,@@system_time_zone,@@time_zone,@@auto_increment_increment
		    24 Query	DROP TABLE IF EXISTS tester
		    24 Query	CREATE TABLE tester (id int not null, someval varchar(255) not null, primary key(id))
		    24 Query	INSERT INTO tester (id, someval) VALUES (1, 'someval_1')
		    24 Query	INSERT INTO tester (id, someval) VALUES (2, 'someval_2')
		    24 Query	INSERT INTO tester (id, someval) VALUES (3, 'someval_3')
		    24 Query	INSERT INTO tester (id, someval) VALUES (4, 'someval_4')
		    24 Query	INSERT INTO tester (id, someval) VALUES (5, 'someval_5')
		    24 Query	INSERT INTO tester (id, someval) VALUES (6, 'someval_6')
		    24 Query	INSERT INTO tester (id, someval) VALUES (7, 'someval_7')
		    24 Query	INSERT INTO tester (id, someval) VALUES (8, 'someval_8')
		    24 Query	INSERT INTO tester (id, someval) VALUES (9, 'someval_9')
		    24 Query	INSERT INTO tester (id, someval) VALUES (10, 'someval_10')
		    24 Quit	

We then tried enabling useServerPrepStmts (with rewriteBatchedStatements still false). This was to try to force the usage of server side prepared statements to see if this would remove the unnecessary round trips. The queries themselves will succeed in this mode as well, but once again, we see what appears to be continuous round-trips to the DBMS instead of a true batched operation:

		    34 Query	DROP TABLE IF EXISTS tester
		    34 Query	CREATE TABLE tester (id int not null, someval varchar(255) not null, primary key(id))
		    34 Prepare	INSERT INTO tester (id, someval) VALUES (?, ?)
		    34 Execute	INSERT INTO tester (id, someval) VALUES (1, 'someval_1')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (2, 'someval_2')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (3, 'someval_3')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (4, 'someval_4')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (5, 'someval_5')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (6, 'someval_6')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (7, 'someval_7')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (8, 'someval_8')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (9, 'someval_9')
		    34 Execute	INSERT INTO tester (id, someval) VALUES (10, 'someval_10')
		    34 Close stmt	
		    34 Quit

Essentially, with useServerPrepStmts mode the repetitive Query log entries are just replaced with repetitive Execute calls on the statement handler. At the end of the day it still appears to be ten separate round trips to the DBMS to perform 10 trivial row inserts.

Expected server side log pattern
Here is what the comparable event looks like when run from the Node.JS connector.

220426 10:41:02	    36 Connect	example-user@172.28.0.2 on somedb using TCP/IP
		    36 Prepare	DROP TABLE IF EXISTS tester
		    36 Execute	DROP TABLE IF EXISTS tester
		    36 Prepare	CREATE TABLE tester (id int not null, someval varchar(255) not null, primary key(id))
		    36 Execute	CREATE TABLE tester (id int not null, someval varchar(255) not null, primary key(id))
		    36 Prepare	INSERT INTO tester (id, someval) VALUES (?, ?)
		    36 Execute	INSERT INTO tester (id, someval) VALUES (?, ?)

Notice in the NodeJS version that only one Execute operation is done on the INSERT statement. This demonstrates a benefit of batching, e.g., fewer round trips to the server, since the ten trivial inserts are all done in one call.

I am including the sample code used to run the Node.JS version as an attachment, since NodeJS code is not directly germane to this bug report about the C++ connector; but running such code may still be helpful in comparing the General Log output of the two implementations.

Server info
Our DB is running mariadb:10.7.3.



 Comments   
Comment by Lawrin Novitsky [ 2022-05-20 ]

Thank you for your report. Apparently this was not tested, but on other hand I don't think we advertised rewriteBatchedStatements as a supported. Yeah, I wrote in README, that list of options is (undefined) subset of C/J options, and for the table of C/C++ supported options I made remark, that it's not complete. Probably I have to re-write it to be more specific.
I will also think about whether it should be fixed/added in the 1.0 version, or to new 1.1 only

Comment by M. Tarek Spotts [ 2022-05-20 ]

Thank you for the reply. There were several things that had caused me to think this was supported.

In the readme file it said "Not complete list of supported options - mostly newly added or with new aliases...". This led me to think that there may be other options that are available but simply are not listed in the docs. When I saw that, I figured the best place to check whether the option was supported was to look in the source. In doing so, I found several things that led me to think that executing batches was tested and supported:

  • test/unit/classes/connectionmetadata.cpp - At line 1390, an assert is done to ensure that

    ASSERT_EQUALS(true, dbmeta->supportsBatchUpdates());

  • test/unit/classes/preparedstatement.cpp - Seems like the test uses prepared statements + batching to insert rows into a table. A subsequent query is done after batch operations to ensure that the resulting rows are what was expected (e.g., that rows inserted/updated through the batch actually succeeded.)
  • test/unit/classes/statement.cpp - Similar to previous line, but, regular statements are used instead of prepared statements.

I am not sure if these tests actually pass (or if they do, how they are passing), but there are ASSERTs in place and a series of tests that make it look like batching is actually being tested. All of this led to a lot of confusion for me, because, with those types of tests present in the codebase at least, it does make it seem that inserting or updating rows using either a regular or prepared statement is supposed to work.

In any case, thank you for looking into this.

Comment by Lawrin Novitsky [ 2022-05-20 ]

The tests are about batch feature(addBatch/executeBatch), that is part of JDBC. rewriteBatchedStatements is option controlling the way in which bacth is executed.
Yes, I need to change the README =) The current wording can make false promises. Here in documentation https://mariadb.com/docs/connect/programming-languages/cpp/] Here the list of supported options matches the table in the README

Comment by M. Tarek Spotts [ 2022-05-20 ]

Thank you for your answers.

In my 'workarounds' section I did try just using the batch commands, but without rewriteBatchedStatements. Those queries did succeed, but in the General Log it appeared that the driver was really just sending the queries one by one instead of saving some round trips.

The behaviour I saw in the NodeJS driver is that it did not rewrite the queries, but its batch implementation did only one statement prepare and one statement exec was done even though we were inserting 10 rows. I think that the payload for multiple operations is sent alongside the prepared statement as long as the total payload would be less than max_packet_size.

Do we know what the "point" is of the `addBatch`/`executeBatch` is in its current state in the C++ driver? Is it just to be compliant with the JDBC interface? Or is the current batching capability going to save some network round-trips and maybe I was just misunderstanding the General Log results?

Comment by Lawrin Novitsky [ 2022-05-22 ]

In the current state it's more like API compliance. Lots of optimizations is possible. At least some of them will be the result of this ticket.

Comment by Lawrin Novitsky [ 2022-07-24 ]

rewriteBatchedStatements option support

For insert queries, rewrites batch of staments to execute in a single
executeQuery. e.g.
insert into ab (i) values (?) with first batch values = 1, second = 2
will be rewritten as insert into ab (i) values (1), (2)
If query cannot be rewriten in "multi-values", rewrite will use
multi-queries, i.e.
insert into ab (i) values (1);insert into ab (i) values (2)
Using this option causes the useServerPrepStmts option to be set to false

Comment by Lawrin Novitsky [ 2022-10-06 ]

I've changed type to "Task" as it's more like adding support of the feature, rather then fixing it. Also, linked related task, that implemented another mode of batch execution.

Generated at Thu Feb 08 03:08:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.