[CONJ-1017] Calendar possible race condition, cause wrong timestamp setting Created: 2022-10-05  Updated: 2022-11-07  Resolved: 2022-10-10

Status: Closed
Project: MariaDB Connector/J
Component/s: Other
Affects Version/s: 3.0.8
Fix Version/s: 3.0.9

Type: Bug Priority: Major
Reporter: Romain Moreau Assignee: Diego Dupin
Resolution: Fixed Votes: 0
Labels: None


 Description   

For instance, given the following data:

Id Event_Date_Time
0 2022-09-01T00:00:00Z
1 2022-09-01T01:00:00Z
2 2022-09-01T02:00:00Z
3 2022-09-01T03:00:00Z
4 2022-09-01T04:00:00Z
5 2022-09-02T00:00:00Z
6 2022-09-02T01:00:00Z
7 2022-09-02T02:00:00Z
8 2022-09-02T03:00:00Z
9 2022-09-02T04:00:00Z

If I insert these rows with 2 threads in parallel (one thread bulk inserting the 5 first rows and another thread bulk inserting the 5 last rows), when I select the inserted rows, the values are randomly scrambled, for instance:

Id Event_Date_Time
0 2022-09-01T00:00:00Z
1 2022-09-02T01:00:00Z
2 2022-09-02T03:00:00Z
3 2022-09-02T04:00:00Z
4 2022-09-01T04:00:00Z
5 2022-09-01T00:00:00Z
6 2022-09-02T01:00:00Z
7 2022-09-02T02:00:00Z
8 2022-09-02T03:00:00Z
9 2022-09-02T04:00:00Z

If I insert these rows in bulk without concurrence, there's no issue.

Demo : https://github.com/romainmoreau/mariadb-demo



 Comments   
Comment by Marko Mäkelä [ 2022-10-05 ]

Which storage engine are you using? If it is InnoDB, you should be aware of MDEV-515, MDEV-24818, and MDEV-24621. In short, you’d better avoid interleaved inserts from multiple sessions to the same initially empty table. It should be better to let each session insert into its own initially empty InnoDB table.

Comment by Romain Moreau [ 2022-10-05 ]

Yes my my.ini is pretty much vanilla:

[mysqld]
datadir=C:\SomeDir
innodb_buffer_pool_size=4G
[client]
plugin-dir=C:\Program Files\MariaDB 10.8/lib/plugin

This issue also happens when the table is not empty (I updated the demo).

Comment by Marko Mäkelä [ 2022-10-06 ]

InnoDB does row-level locking. It is expected that when multiple connections are inserting data concurrently, the writes can be interleaved. If you want to prevent that, you could escalate to table-level locking by issuing a LOCK TABLE statement. You might also define and use a logical PRIMARY KEY in your database schema design.

Comment by Romain Moreau [ 2022-10-06 ]

If this behavior is expected on a server side, then probably this issue should be addressed by the Java connector project.

Comment by Diego Dupin [ 2022-10-07 ]

In your example, the behavior is expected :
ORM code

		for (var demo : demoList) {
			if (i > 0 && i % BATCH_SIZE == 0) {
				entityManager.flush();
				entityManager.clear();
			}
			entityManager.persist(demo);
			i++;
		}

Will result in hibernate internally executing 5 * JDBC prepareStatement.execute().
Since there is 2 paralleles runner, this result with INSERT with mixed IDs, like you describe.
There is nothing wrong in MariaDB Server or connector there.
This is just how hibernate send the data.

There is some solutions hibernate side, like for example adding batch_size :

spring:
  jpa:
    hibernate:
      jdbc:
        batch_size: 2000

In that case, hibernate will not run some PrepareStatement.execute, but will execute those in batch (preparestatement.addBatch() x times, then preparestatement.executeBatch)
MariaDB connector and Server will then receive only one command with all the insert at a time, then IDs will be orderered.

Comment by Romain Moreau [ 2022-10-07 ]

If you debug the demo, you'll see it's not the way you describe because you seem to have missed this line which sets the bach size only for the current hibernate Session instead of setting it globally like you were suggesting:

entityManager.unwrap(Session.class).setJdbcBatchSize(BATCH_SIZE);

In fact, this part is not executed in the current settings of the demo (4 objects by thread) but I left it to support lists which have a size higher than BATCH_SIZE (50):

entityManager.flush();
entityManager.clear();

I added the following to show what commands the MariaDB connector client was sending:

logging:
  level:
    '[org.mariadb]': DEBUG

And aside from transactions related commands, only two commands are issued by each thread : one prepare (PreparePacket) and one bulk insert (BulkExecutePacket) with no mixed up data when I inspect the batchParameterList field.

Comment by Diego Dupin [ 2022-10-09 ]

sorry, i've made bad assumption. This is indeed an issue connector side : calendar use in prepareStatement.setTimestamp(int parameterIndex, Timestamp x, Calendar cal) is not used in synchronized block, resulting in this issue when use in parallel

problem is only in 3.0 version

Comment by Diego Dupin [ 2022-10-10 ]

corrected in 3.0.9-SNAPSHOT version with commit https://github.com/mariadb-corporation/mariadb-connector-j/commit/22388e698edb045ea9e8c560195caa9bf4bb6da3

available using snapshot repository :

<repositories>
    <repository>
        <id>sonatype-nexus-snapshots</id>
        <name>Sonatype Nexus Snapshots</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>
</repositories>
 
<dependencies>
    <dependency>
        <groupId>org.mariadb.jdbc</groupId>
        <artifactId>mariadb-java-client</artifactId>
        <version>3.0.9-SNAPSHOT</version>
    </dependency>
</dependencies>

Comment by Romain Moreau [ 2022-10-10 ]

It works great with the demo and also with a real world application, thanks!

Generated at Thu Feb 08 03:20:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.