[CONJ-920] MariaDB Java Client 3.0.3 - Java batched statements with optimistic locking failing Created: 2022-01-30  Updated: 2023-12-07  Resolved: 2023-08-01

Status: Closed
Project: MariaDB Connector/J
Component/s: batch
Affects Version/s: 3.0.3
Fix Version/s: 3.2.0

Type: Bug Priority: Major
Reporter: Ronald Dehuysser Assignee: Diego Dupin
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
Duplicate
is duplicated by CONJ-1038 Regression - Prepared Statement execu... Closed
Relates
relates to CONJ-1135 MariaDB Java Client 3.3.1 - Java batc... Closed

 Description   

Hi, I'm the author of JobRunr (https://github.com/jobrunr/jobrunr), a distributed job scheduler for Java.

JobRunr supports quite a lot of SQL databases and after the upgrade to `org.mariadb.jdbc:mariadb-java-client:3.0.3`, none of the batched insert & update statements work anymore.

These worked fine with all prior mariadb-java-client versions, and the same code also works fine for MySQL, Postgres, ... .

How to reproduce:



 Comments   
Comment by Diego Dupin [ 2022-01-31 ]

I haven't checked in detail, but the 3.0 version enables the option `useBulkStmts` by default, which changes batch behavior.

By default, batches are sent in a bunch:
imagine that a prepared statement like "INSERT INTO myTABLE VALUES (?,?)" is executed 10 times with different pairs of values. The driver will now send those 10 pairs of values in one command, permitting very fast batches. For those batch commands, the server only indicates whether the command succeeded or failed. When successful, the driver just returns an array of update counts with the value SUCCESS_NO_INFO (= -2), as stated in https://docs.oracle.com/en/java/javase/17/docs/api/java.sql/java/sql/Statement.html#executeBatch() :

The elements in the array returned by the method executeBatch may be one of the following:
* A number greater than or equal to zero -- indicates that the command was processed successfully and is an update count giving the number of rows in the database that were affected by the command's execution
* A value of SUCCESS_NO_INFO -- indicates that the command was processed successfully but that the number of rows affected is unknown
* ...
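As a concrete illustration of that contract, a batch-result verification following the Javadoc above might look like this (a hypothetical helper, not JobRunr's or Hibernate's actual code):

```java
import java.sql.Statement;

public class BatchResultCheck {

    /**
     * Returns true when every element of the array returned by
     * executeBatch() indicates success: either a non-negative update
     * count, or Statement.SUCCESS_NO_INFO (-2), which bulk protocols
     * report because per-command counts are unknown.
     */
    static boolean allSucceeded(int[] updateCounts) {
        for (int count : updateCounts) {
            if (count < 0 && count != Statement.SUCCESS_NO_INFO) {
                return false; // e.g. Statement.EXECUTE_FAILED (-3)
            }
        }
        return true;
    }
}
```
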

See Hibernate's batch verification: https://github.com/hibernate/hibernate-orm/blob/fa8b78d345e01d980a6046f226664f28799c73e0/hibernate-core/src/main/java/org/hibernate/jdbc/Expectations.java#L58

Quickly checking the code ( https://github.com/jobrunr/jobrunr/blob/master/core/src/main/java/org/jobrunr/storage/sql/common/db/Sql.java#L153 ), it seems SUCCESS_NO_INFO is not checked.
I might be wrong; you'll tell me.

Comment by Ronald Dehuysser [ 2022-03-12 ]

Hi Diego, thanks for the swift response and sorry for the late reply.

When I update JobRunr to also accept SUCCESS_NO_INFO as a valid SQL response (so indeed, no error occurred), fewer tests fail.

The test https://github.com/jobrunr/jobrunr/blob/master/core/src/testFixtures/java/org/jobrunr/storage/StorageProviderTest.java#L338-L360 checks whether optimistic locking works with batch jobs.

However, one test still fails, as the result is [-2, -2, -2, -2].
For all other clients, I receive [1, 0, 0, 1].

Concretely, this means I cannot know whether an update succeeded or failed due to optimistic locking. I hope that makes sense.
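The difference matters because optimistic-locking checks treat a 0 update count as a lost race. A minimal sketch of such a check (hypothetical names, not JobRunr's actual implementation):

```java
import java.sql.Statement;

public class OptimisticLockCheck {

    /** Thrown when a batched update matched zero rows, i.e. another process won the race. */
    static class StaleStateDetected extends RuntimeException {
        StaleStateDetected(int index) {
            super("Batched update at index " + index + " affected 0 rows");
        }
    }

    /**
     * With per-statement counts such as [1, 0, 0, 1], the zeros pinpoint the
     * statements that lost the optimistic-lock race. With bulk results such as
     * [-2, -2, -2, -2], every element is SUCCESS_NO_INFO, so lost races are
     * invisible to this check.
     */
    static void verifyOptimisticLocking(int[] updateCounts) {
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] == Statement.SUCCESS_NO_INFO) {
                continue; // count unknown: cannot detect a lost race here
            }
            if (updateCounts[i] == 0) {
                throw new StaleStateDetected(i);
            }
        }
    }
}
```
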

Comment by Ronald Dehuysser [ 2022-03-12 ]

See also https://mariadb.com/kb/en/mariadb-connector-j-230-release-notes/#conj-642-disable-the-option-usebulkstmts-by-default.

Comment by Radek Wikturna [ 2023-01-06 ]

Hello
I'm working on a Spring Boot project (which uses Hibernate as the JPA provider) and after upgrading the MariaDB JDBC driver to version 3.x, optimistic locking stopped working, i.e. `StaleStateException` is NOT thrown when it should be, because all row counts are -2 and Hibernate only logs a debug message (see https://github.com/hibernate/hibernate-orm/blob/fa8b78d345e01d980a6046f226664f28799c73e0/hibernate-core/src/main/java/org/hibernate/jdbc/Expectations.java#L58).

Ironically, I reported the same bug a few years ago against driver version 2.2.6; see https://jira.mariadb.org/browse/CONJ-630. It appeared for the very same reason: making `useBulkStmts` the default behaviour. It was disabled by https://jira.mariadb.org/browse/CONJ-642, so my bug was closed. Now version 3.x enables `useBulkStmts` by default again, causing the same problem. I consider this a bug, or at least a critical incompatible change: optimistic locking won't work in any project using Hibernate (or using the same algorithm for checking update counts). A performance enhancement should not change the functionality of another feature, especially one that affects optimistic locking.

Possible solutions:
a) disable `useBulkStmts` again and keep it that way. Update the documentation to state that enabling this flag causes PreparedStatement.executeBatch() to return only -2 in all buckets, which silently disables optimistic locking in Hibernate or in projects relying on these values.
b) implement the performance optimization in such a way that the update counts are returned.

Comment by Radek Wikturna [ 2023-01-06 ]

Also, I would like to understand the option `useBulkStmts`.
The documentation only says:

Use dedicated COM_STMT_BULK_EXECUTE protocol for batch insert when possible. (batch without Statement.RETURN_GENERATED_KEYS and streams) to have faster batch. (significant only on >= MariaDB 10.2.7)
Default: true since 3.0.0 (was false since version >= 2.3.0)

From the documentation it appears it should affect only INSERT statements, but it seems this is not true. Can you comment on that?

Do you have performance measurements for the following?
a) inserting N rows one by one (no batching)
b) inserting N rows using batches of X, with useBulkStmts disabled
c) inserting N rows using batches of X, with useBulkStmts enabled

Thanks

Comment by Diego Dupin [ 2023-01-06 ]

I'll try to make this response exhaustive. Batch implementation can be complex.

About COM_STMT_BULK_EXECUTE: this command permits batches in general, so it is mostly used for INSERT, but it can be used for UPDATE/DELETE as well.

There are multiple possible implementations for batches. Here are some of them:

  1. loop using the TEXT protocol
  2. loop using the BINARY protocol
  3. loop using the BINARY protocol, pipelining
  4. concatenate TEXT commands
  5. rewrite the command
  6. use the dedicated bulk command
loop using the TEXT protocol:

The driver parses the command and substitutes each '?' with an escaped parameter.
In terms of exchanges, that means for example:
send "INSERT INTO myTABLE VALUE ('val1')"
read response
send "INSERT INTO myTABLE VALUE ('val2')"
read response
...
Two responses are received, so the driver knows the affected rows of each command.

loop using the BINARY protocol:

(The PREPARE command has already been executed, returning a prepared statement id.)
The driver only sends the prepared statement id and the parameters.
In terms of exchanges:
send statement id + 'val1'
read response
send statement id + 'val2'
read response
...
Two responses are received, so the driver knows the affected rows of each command.

loop using the BINARY protocol, pipelining:

Same as the previous, but all sends are done without reading the server responses; all responses are then read at once.
In terms of exchanges:
send statement id + 'val1'
send statement id + 'val2'
read response
read response
...
The advantage is that when the server finishes handling one command, the next one is already in the socket buffer, which avoids some of the network latency.
Two responses are received, so the driver knows the affected rows of each command.

concatenate TEXT commands:

Send all the text commands in one exchange.
In terms of exchanges:
send "INSERT INTO myTABLE VALUE ('val1');INSERT INTO myTABLE VALUE ('val2')"
read the responses of the 2 commands
...
This permits faster exchanges, but is usually not used, because multi-statement commands are usually disabled to avoid injection.
Two responses are received, so the driver knows the affected rows of each command.

rewrite the command:

Some drivers (MySQL) do this for INSERT commands only:
the driver parses the command and, if it is an INSERT command that can be rewritten (some cannot), rewrites it.
In terms of exchanges:
send "INSERT INTO myTABLE VALUE ('val1'),('val2')"
read response

Only one response is received, so the driver only knows the total number of affected rows.

use the dedicated bulk command:

(The PREPARE command has already been executed, returning a prepared statement id.)
The driver sends only one command (COM_STMT_BULK_EXECUTE) containing the prepared statement id and all parameters.
send statement id + 'val1' + 'val2'
read response
...
Only one response is received, so the driver only knows the total number of affected rows.

To give you an idea of the performance of these different implementations, here are the results of executing 100 INSERTs with one parameter per insert (a string with a length of 100 chars):

type                            operations per second
loop using the TEXT protocol      72
loop using the BINARY protocol    144
rewrite the command               2588
use the dedicated bulk command    5935

(I don't have immediate data for "concatenate TEXT commands", but it is better than looping using the binary protocol and worse than rewriting.)

The reason for the better performance of rewrite and bulk is simple: fewer network exchanges, and they permit server optimisations (inserting multiple rows directly, for example).

There are then 2 restrictions: the driver knows only success or failure, and on success, only the total number of affected rows and the latest generated id.
When generated ids are expressly required, COM_STMT_BULK_EXECUTE will not be used; the driver prefers the loop using the BINARY protocol with pipelining.
The JDBC specification permits this; the driver then returns Statement.SUCCESS_NO_INFO.

So this is not compatible with optimistic locking. The workaround is simple: disable the option `useBulkStmts`.
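For example, appended as a parameter to the connection string (hypothetical host and database names):

```
jdbc:mariadb://localhost:3306/mydb?useBulkStmts=false
```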

Enabling useBulkStmts in 2.x was an error, because it was a major breaking change. Enabling it in 3.x makes more sense, but it creates this problem with optimistic locking.
Actually, when the COM_STMT_BULK_EXECUTE command was designed, we allowed for a flag to indicate the wish to receive all result packets. That would permit having generated values AND each affected-row count, but it is currently not implemented.
I will first check whether that can be implemented easily.

Comment by Radek Wikturna [ 2023-01-09 ]

I appreciate the detailed answer. I admit I don't fully understand the technical details of the batch update implementation.
I wrote a test case (see below) that shows what various JDBC drivers (and their versions) return from 3 possible methods of PreparedStatement.

First, let's create a test table
// Oracle
// CREATE TABLE TEST(
// "NAME" VARCHAR2(100),
// "ID" NUMBER(10) GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY
// )

// SQL Server
// CREATE TABLE TEST (
// id INT NOT NULL IDENTITY PRIMARY KEY,
// name VARCHAR(100)
// )

// MariaDB
// CREATE TABLE `TEST` (
// `NAME` varchar(100),
// `ID` int(11) NOT NULL AUTO_INCREMENT,
// PRIMARY KEY (`ID`)
// )

then run the test.

	public void testBatchUpdateCounts(Connection connection) throws SQLException {
		try(Statement st = connection.createStatement()) {
			st.execute("insert into TEST(NAME) VALUES('A')");
			st.execute("insert into TEST(NAME) VALUES('B')");
			connection.commit();
		}
		try(PreparedStatement ps = connection.prepareStatement("update TEST set NAME = ? where ID = ?")) {
			ps.setString(1, "AA");
			ps.setInt(2, 1);
			ps.addBatch();
			
			ps.setString(1, "BB");
			ps.setInt(2, -1);//non existent primary key - simulates optimistic locking
			ps.addBatch();
			
			int[] updateCounts = ps.executeBatch();
			int updateCount = ps.getUpdateCount();
			long largeUpdateCount = ps.getLargeUpdateCount();
 
			System.out.println("JDBC Driver " + connection.getMetaData().getDriverVersion());
			String info = "Counts reported by various methods:\n" +
					"  executeBatch(): " + "[" + updateCounts[0] + ", " + updateCounts[1] + "]\n" +
					"  getUpdateCount(): " + updateCount + "\n" +
					"  getLargeUpdateCount(): " + (largeUpdateCount == -2 ? "unsupported" : largeUpdateCount) + "\n";
			System.out.println(info);
			
			assertArrayEquals(new int[] {1, 0}, updateCounts);
		}
	}

Results:

Oracle JDBC Driver 12.2.0.1.0 or 19.8.0.0.0
Counts reported by various methods:
executeBatch(): [1, 0]
getUpdateCount(): 1
getLargeUpdateCount(): -1

SQL Server JDBC Driver 7.4.1.0
Counts reported by various methods:
executeBatch(): [1, 0]
getUpdateCount(): -1
getLargeUpdateCount(): -1

MariaDB JDBC Driver 2.3.0
Counts reported by various methods:
executeBatch(): [1, 0]
getUpdateCount(): -1
getLargeUpdateCount(): -1

MariaDB JDBC Driver 3.1.0
Counts reported by various methods:
executeBatch(): [-2, -2]
getUpdateCount(): 1
getLargeUpdateCount(): 1

(note: the test fails here)

MariaDB JDBC Driver 3.1.0, useBulkStmts=false
Counts reported by various methods:
executeBatch(): [1, 0]
getUpdateCount(): 1
getLargeUpdateCount(): 1

As can be seen, both the Oracle and SQL Server drivers return the correct and expected values from executeBatch(); Oracle does so also from getUpdateCount().
MariaDB returns correct results with the older driver version, and with the new version only if useBulkStmts=false. The new driver also returns the correct value from getUpdateCount() and getLargeUpdateCount().

So there is always some way to find out whether one or more UPDATEs updated no rows due to an optimistic-locking condition. However, Hibernate only checks the result of executeBatch() (otherwise the check would have to be implemented differently on each supported DB platform).

You wrote:
> About COM_STMT_BULK_EXECUTE : this exchange permit batches in general, so that's mostly used for INSERT, but can be UPDATE/DELETE as well.

In reality, it is far more likely to batch-insert a lot of data than to batch-update a lot of data. Therefore I think the option useBulkStmts should influence and speed up only INSERT statements, but it should not silently make optimistic locking non-functional. Alternatively, there should be two options: useBulkStmtsForInserts (true by default) and useBulkStmtsForUpdatesAndDeletes (false by default).
The biggest danger I see here is that the problem may go unnoticed for a long time, even years (there is no exception, so everything seems to work fine), and only if you happen to write a JUnit test exercising optimistic locking in batch mode will you find out it no longer works, which was our case. Hibernate probably doesn't have a JUnit test for this.

Regards

Comment by Diego Dupin [ 2023-01-09 ]

I've created a server task to add the missing bulk feature permitting this: MDEV-30366.
It will be discussed, and it is the only real solid solution that preserves both performance and behavior.
Until then, the workaround is to disable the option useBulkStmts.

Comment by Bas de Bakker [ 2023-01-09 ]

While we don't use Hibernate, we have a very similar use case and have postponed upgrading the JDBC driver for this reason. I agree with Radek that it would be useful to be able to use bulk statements only for inserts and not for updates. While we haven't measured it, I expect that turning off bulk statements will have a major impact on inserts compared to the old driver with rewriteBatchedStatements=true that we use now.

Comment by Radek Wikturna [ 2023-01-10 ]

I've discovered an article, Batch Operations with MariaDB Connector/J, which lists various connection parameters (including useBulkStmts and rewriteBatchedStatements) that influence batching performance. What would be the best settings for performance without losing optimistic locking with the latest driver?

The article gives performance measurements, but it's pretty old (an old driver version was used).

Comment by Diego Dupin [ 2023-01-10 ]

This could be a solution: parse the command, detect that it is an INSERT, then ensure that the total number of affected rows corresponds to the number of parameter sets, returning an array of 1s rather than Statement.SUCCESS_NO_INFO (like rewriteBatchedStatements did). This solution is safe, even if parsing the command will always have a small cost in terms of performance.
Still, this behavior is safer and will be faster than the 2.x implementation (even with the rewriteBatchedStatements option).

Until MDEV-30366 is available, this seems a better default.

To summarize:

  • add a new option useBulkStmtsForInserts that permits using bulk statements for INSERT commands only, enabled by default
  • change useBulkStmts to false by default.

This would still be a better default, even if it slows down batches of DELETE and UPDATE.

Generated at Thu Feb 08 03:19:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.