MariaDB Connector/J
CONJ-715

OOM happening when querying large datasets

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Not a Bug
    • Affects Version: 2.2.1
    • Fix Version: N/A
    • Labels: performance
    • Environment: None

    Description

      I am using the following

      • AWS Aurora MySQL 5.6
      • mariadb-java-client 2.2.1
      • springframework 4.3.16

      I am currently using a fetch size of 1000 in JdbcTemplate. For large tables, select queries cause an OOM. I could confirm that Spring propagates the fetchSize to the driver, as in the sketch below.

      When I started debugging, I observed the following in SelectResultSet in the driver code. When fetchSize > 0, the execution flow proceeds through

      • nextStreamingValue
      • addStreamingValue
      • readNextValue
      • growDataArray

      So essentially, the SelectResultSet data variable keeps growing as next() is called on the ResultSet, so an OOM is bound to happen eventually. BTW, I am using the default TYPE_FORWARD_ONLY scroll type.

      Could anyone confirm whether my understanding is correct? Is there a way to query large datasets without hitting OOM?

          Activity

            Diego Dupin added a comment (edited)

            > So essentially, the SelectResultSet data variable keeps growing as next() is called on the ResultSet, so an OOM is bound to happen eventually.

            This is right, but only until the data size reaches the fetchSize limit.
            Using a fetch size of 1000 means having a maximum of 1000 rows in memory at a time.
            If rows are huge, this can still cause OOM; in that case it is better to set fetchSize to a lower value.
            Could you send the query and the associated DDL so we can check the row size?

            To explain by code:
            rows are stored in an array (the variable is `data`) with an initial size of 10. This array will grow up to fetchSize if fetchSize > 10, but the maximum size of data is limited to fetchSize.


            Arun Chaitanya Miriappalli added a comment

            > If rows are huge, this can still cause OOM; in that case it is better to set fetchSize to a lower value.

            Agreed. But in our case, that does not seem to be the cause. I tried lowering the fetch size and still saw the OOM happening.

            > the maximum size of data is limited to fetchSize.

            Is the data maximum size stored in the variable dataSize?

            What I observed from our heap dump is dataSize > fetchSize.

            If my understanding is correct, dataSize should never exceed fetchSize, should it?


            Arun Chaitanya Miriappalli added a comment

            I realized my mistake.

            This was happening because a new query was executed on the same connection/transaction, which forced the previous result set to be fully loaded into memory. I had to create a new transaction to run the other query. Now I can iterate both result sets without causing OOM.

            It would be great if this were clearly mentioned in the fetchSize documentation. Otherwise, this issue can be closed.

            Diego Dupin added a comment

            Alright. I hadn't thought of that case.

            I've added some documentation about this case and am closing the ticket.

            Cyril Joui added a comment (edited)

            Hello,

            I'm getting the same issue with the Kafka Connect JDBC Source connector.
            We have migrated our connector from Oracle to Aurora (RDS) and are now using the MariaDB connector.

            We have a large table to read with the connector and we are facing OOM.
            An #isValid call on the current connection (while iterating over a result set) produces an OOM because of

            // load active result if any so buffer are clean for next query
            if (activeStreamingResult != null) {
                activeStreamingResult.loadFully(false, this);
                activeStreamingResult = null;
            }

            Do you have any idea why calling the isValid method has this impact?
            Thank you.

            Stack trace:
            java.lang.OutOfMemoryError: Java heap space
            at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
            at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.growDataArray(SelectResultSet.java:521)
            at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.readNextValue(SelectResultSet.java:435)
            at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.addStreamingValue(SelectResultSet.java:364)
            at org.mariadb.jdbc.internal.com.read.resultset.SelectResultSet.fetchRemaining(SelectResultSet.java:311)
            at org.mariadb.jdbc.internal.com.read.dao.Results.loadFully(Results.java:274)
            at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.cmdPrologue(AbstractQueryProtocol.java:1918)
            at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.ping(AbstractQueryProtocol.java:1225)
            at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.isValid(AbstractQueryProtocol.java:1272)
            at org.mariadb.jdbc.MariaDbConnection.isValid(MariaDbConnection.java:1228)
            at io.confluent.connect.jdbc.dialect.GenericDatabaseDialect.isConnectionValid(GenericDatabaseDialect.java:253)
            at io.confluent.connect.jdbc.util.CachedConnectionProvider.isConnectionValid(CachedConnectionProvider.java:70)
            at io.confluent.connect.jdbc.util.CachedConnectionProvider.getConnection(CachedConnectionProvider.java:53)
            at io.confluent.connect.jdbc.source.JdbcSourceTask.poll(JdbcSourceTask.java:364)

            Diego Dupin added a comment

            This is expected:

            When streaming, data is read row by row (or, to be precise, in batches of up to the fetch size).

            See the red paragraph in https://mariadb.com/kb/en/about-mariadb-connector-j/#streaming-result-sets :

            If another query is executed on the same connection when a streaming resultset has not been fully read, the connector will put the whole remaining streaming resultset in memory in order to execute the next query. This can lead to OutOfMemoryError if not handled.

            The problem is that when the same connection is used to execute another command while streaming (a Connection.isValid() in your case), the connector forces the loading of the streaming result set so that it can read the new command's result. This forced loading of the streaming result set is what causes the OOM error.
            I don't know Kafka Connect JDBC Source, but it would seem that it uses connection caching (a pool?) and validates connections before use, i.e. it calls Connection.isValid() while a streaming result set was in progress. There is a problem somewhere: a connection must not be returned to the pool while it is still in use.

            Cyril Joui added a comment

            Thank you for your answer and the documentation details.
            As you have seen, I've created a dedicated issue: https://jira.mariadb.org/browse/CONJ-929

            I'm also discussing with the Kafka JDBC connector team to find the best fix.


            People

              Assignee: Diego Dupin
              Reporter: Arun Chaitanya Miriappalli
