[CONJ-715] OOM happening when querying large datasets Created: 2019-06-27  Updated: 2022-02-15  Resolved: 2019-07-23

Status: Closed
Project: MariaDB Connector/J
Component/s: performance
Affects Version/s: 2.2.1
Fix Version/s: N/A

Type: Bug Priority: Blocker
Reporter: Arun Chaitanya Miriappalli Assignee: Diego Dupin
Resolution: Not a Bug Votes: 1
Labels: None

Attachments: PNG File heap_dump_analysis.png    
Issue Links:
Relates
relates to CONJ-276 java.lang.OutOfMemoryError: Java heap... Closed

 Description   

I am using the following

  • AWS Aurora MySQL 5.6
  • mariadb-java-client 2.2.1
  • springframework 4.3.16

I am currently using a fetchSize of 1000 in JdbcTemplate. For large tables, when using select queries, I am experiencing OOM. I could confirm that Spring propagates the fetchSize to the driver.
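For reference, a minimal plain-JDBC sketch of the streaming setup described above (table and column names are illustrative, not from the actual application):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class StreamingQuery {
    static final int FETCH_SIZE = 1000;

    // Sketch: stream a large table with a bounded fetch size.
    // With TYPE_FORWARD_ONLY and a positive fetch size, the driver keeps
    // at most ~FETCH_SIZE rows in memory at a time instead of buffering
    // the whole result set.
    static void readLargeTable(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            stmt.setFetchSize(FETCH_SIZE);
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT id, payload FROM big_table")) {
                while (rs.next()) {
                    // process one row at a time; earlier batches are discarded
                }
            }
        }
    }
}
```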

When I started debugging, I observed the following in SelectResultSet in the driver code. When fetchSize > 0, the execution flow proceeds like:

  • nextStreamingValue
  • addStreamingValue
  • readNextValue
  • growDataArray

So essentially, the SelectResultSet data variable keeps growing when next is called on the ResultSet, so an OOM is bound to happen eventually. BTW, I am using the default TYPE_FORWARD_ONLY scroll type.

Could anyone confirm if my understanding is correct? Is there a way to query large datasets without OOM?



 Comments   
Comment by Diego Dupin [ 2019-07-08 ]

>So essentially, SelectResultSet data variable keeps growing when next is called on the ResultSet. So OOM is ought to happen eventually
This is right, until the data size reaches the fetchSize limit.
Using a fetch size of 1000 means having a maximum of 1000 rows in memory at a time.
If rows are huge, this can cause OOM; in that case it is better to set fetchSize to a lower value.
Could you send the query and the associated DDL so we can check the row size?
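To put rough numbers on the point above (values are illustrative), the peak memory held by the row buffer is approximately fetchSize × average row size:

```java
public class MemoryBound {
    // Peak bytes buffered by the driver: fetchSize rows held at once.
    static long peakBytes(int fetchSize, long avgRowBytes) {
        return fetchSize * avgRowBytes;
    }

    public static void main(String[] args) {
        int fetchSize = 1000;
        long avgRowBytes = 2L * 1024 * 1024; // e.g. a 2 MB TEXT/BLOB per row (illustrative)
        long peak = peakBytes(fetchSize, avgRowBytes);
        System.out.println(peak / (1024 * 1024) + " MB"); // prints 2000 MB
    }
}
```

With large rows, even a modest fetch size can exceed the heap, which is why lowering fetchSize helps.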

To explain by code: rows are stored in an array (the `data` variable) with an initial size of 10. This array grows to reach fetchSize if fetchSize > 10, but its maximum size is limited to fetchSize.
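A hypothetical re-creation of the capped-growth policy just described (the real growDataArray differs in detail; names here are illustrative):

```java
public class GrowthSketch {
    // Sketch of the described policy: the row buffer starts at 10 and
    // doubles as rows arrive, but never grows past the fetch size.
    static int grownCapacity(int current, int fetchSize) {
        return Math.min(current * 2, fetchSize);
    }

    public static void main(String[] args) {
        int capacity = 10;
        int fetchSize = 1000;
        while (capacity < fetchSize) {
            capacity = grownCapacity(capacity, fetchSize);
        }
        System.out.println(capacity); // prints 1000: capped at the fetch size
    }
}
```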

Comment by Arun Chaitanya Miriappalli [ 2019-07-10 ]

> If rows are huge, this can cause OOM. Then better set fetchsize to a lower value.
Agreed. But in our case, it seems that is not the cause: I lowered the fetch size and still saw the OOM happening.

> data maximum size will be limited to fetchsize.
Is the maximum data size stored in the dataSize variable?

What I observed from our heap dump is dataSize > fetchSize.

If my understanding is correct, dataSize should never exceed fetchSize, should it?

Comment by Arun Chaitanya Miriappalli [ 2019-07-23 ]

I realized my mistake.

This happened because a new query was executed on the same connection/transaction, which forced the previous result set to be fully loaded into memory. I had to create a new transaction to run the second query; now I can iterate both result sets without causing an OOM.
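A sketch of the resulting pattern, assuming a DataSource and illustrative table names: each streaming result set gets its own connection, so executing the second query does not force the first result set to be buffered.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class TwoStreams {
    static final int FETCH_SIZE = 100;

    // Iterate two streaming result sets concurrently by giving each its
    // own connection. Running the second query on c1 instead would force
    // the driver to load all remaining rows of r1 into memory first.
    static void iterateBoth(DataSource ds) throws SQLException {
        try (Connection c1 = ds.getConnection();
             Connection c2 = ds.getConnection();
             Statement s1 = c1.createStatement();
             Statement s2 = c2.createStatement()) {
            s1.setFetchSize(FETCH_SIZE);
            s2.setFetchSize(FETCH_SIZE);
            try (ResultSet r1 = s1.executeQuery("SELECT * FROM table_a");
                 ResultSet r2 = s2.executeQuery("SELECT * FROM table_b")) {
                // consume both streams; real code would handle the case
                // where one result set is longer than the other
                while (r1.next() && r2.next()) {
                    // process a row from each
                }
            }
        }
    }
}
```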

It would be great if this were clearly mentioned in the fetchSize documentation. Otherwise, this issue can be closed.

Comment by Diego Dupin [ 2019-07-23 ]

Alright, I didn't think of that case.

I've added some documentation about this case and am closing the ticket.

Comment by Cyril Joui [ 2022-02-15 ]

Hello

I'm getting the same issue with the Kafka Connect JDBC Source.
We have migrated our connector from Oracle to Aurora (RDS) and are now using the MariaDB Connector.

We have a large table to read with the connector and we are facing OOM.
A call to #isValid on the current connection (while iterating a result set) produces an OOM because of:

// load active result if any so buffer are clean for next query
if (activeStreamingResult != null) {
    activeStreamingResult.loadFully(false, this);
    activeStreamingResult = null;
}

Do you have any idea why calling the isValid method has this impact?
Thank you

STACK :
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.growDataArray(SelectResultSet.java:521)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.readNextValue(SelectResultSet.java:435)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.addStreamingValue(SelectResultSet.java:364)
at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.fetchRemaining(SelectResultSet.java:311)
at org.mariadb.jdbc.internal.com.read.dao.Results.loadFully(Results.java:274)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.cmdPrologue(AbstractQueryProtocol.java:1918)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.ping(AbstractQueryProtocol.java:1225)
at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.isValid(AbstractQueryProtocol.java:1272)
at org.mariadb.jdbc.MariaDbConnection.isValid(MariaDbConnection.java:1228)
at io.confluent.connect.jdbc.dialect.GenericDatabaseDialect.isConnectionValid(GenericDatabaseDialect.java:253)
at io.confluent.connect.jdbc.util.CachedConnectionProvider.isConnectionValid(CachedConnectionProvider.java:70)
at io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:53)
at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:364)

Comment by Diego Dupin [ 2022-02-15 ]

This is expected:

When streaming, data is read row by row (or, to be precise, in batches corresponding to the fetch size).

See the red paragraph in https://mariadb.com/kb/en/about-mariadb-connector-j/#streaming-result-sets :

If another query is executed on the same connection when a streaming resultset has not been fully read, the connector will put the whole remaining streaming resultset in memory in order to execute the next query. This can lead to OutOfMemoryError if not handled.

The problem is that if, while streaming, the same connection is used to execute another command (a Connection.isValid() in your case), the connector must force the loading of the remaining streaming result set in order to read the new command's result. That forced loading can cause this OOM error.
I don't know Kafka Connect JDBC Source, but it seems it uses connection caching (a pool?) and validates the connection before use (=> calling Connection.isValid while a streaming result set was in progress). There is a problem somewhere: a connection must not be returned to the pool while it is still in use.
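A minimal illustration of this failure mode (identifiers are illustrative; this is not runnable without a live server):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MidStreamPitfall {
    static final int FETCH_SIZE = 100;

    // Any command on the same connection -- even a validity check --
    // forces the pending streaming result set to be buffered in full.
    static void pitfall(Connection conn) throws SQLException {
        Statement stmt = conn.createStatement();
        stmt.setFetchSize(FETCH_SIZE);
        ResultSet rs = stmt.executeQuery("SELECT * FROM huge_table");
        rs.next(); // streaming: only ~FETCH_SIZE rows in memory so far

        // This loads ALL remaining rows of rs into memory before pinging,
        // which may trigger OutOfMemoryError on a huge result set:
        conn.isValid(1);
    }
}
```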

Comment by Cyril Joui [ 2022-02-15 ]

Thank you for your answer and documentation details.
As you have seen, I've created a dedicated issue: https://jira.mariadb.org/browse/CONJ-929

I'm also discussing with kafka jdbc connector to find the best fix.

Generated at Thu Feb 08 03:17:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.