[CONJ-1122] OutOfMemoryError when iterating over a result set and modifying rows in the same connection Created: 2023-11-14 Updated: 2023-11-27 Resolved: 2023-11-22 |
|
| Status: | Closed |
| Project: | MariaDB Connector/J |
| Component/s: | Other |
| Affects Version/s: | 3.1.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Radek Wikturna | Assignee: | Diego Dupin |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows 10 |
||
| Attachments: |
|
| Description |
|
Background: I created a test project that demonstrates various ways to iterate over an arbitrary number of entities read from the database with the Jakarta Persistence API (JPA). Attached is the complete project. Just unzip it, import it into an IDE (set JDK 17) and run the Gradle build. Set -Xmx200M in order to detect possible memory problems early. Follow README.md. The most important class is MassDataService. It has various functions that show the possible ways to iterate over objects in JPA (resultList, offsetPaging, resultStream, stream) and in JDBC:
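The attached project is not reproduced in this export. A minimal sketch of the JDBC-style iteration the later comments focus on, using driver defaults; the SQL, table, and column names here are invented for illustration, not taken from the project:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcIterationSketch {
    // Iterates a large table row by row. With MariaDB Connector/J's default
    // fetch size of 0, the whole resultset is buffered client-side, which is
    // what exhausts a 200 MB heap for a sufficiently large table.
    static void jdbcResultSet(Connection con) throws Exception {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, payload FROM mass_data")) {
            while (rs.next()) {
                process(rs.getLong("id"), rs.getString("payload"));
            }
        }
    }

    static void process(long id, String payload) { /* application logic */ }

    public static void main(String[] args) throws Exception {
        // Expects a JDBC URL as the first argument, e.g. jdbc:mariadb://localhost/test
        try (Connection con = DriverManager.getConnection(args[0])) {
            jdbcResultSet(con);
        }
    }
}
```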
|
| Comments |
| Comment by Diego Dupin [ 2023-11-16 ] | |||||||||||||||
|
Thanks for the detailed testing project; I think that's the first time I've seen a test example so well prepared! The problem is simply that the Oracle and MariaDB connectors differ in their default fetch size (Oracle is the only connector I know of that has a default fetch size, of 10 rows). Having a fetch size has some limitations: the JDBC spec permits moving back to previous resultset rows, and setting a fetch size removes that possibility and decreases performance for small resultsets. For big resultsets, a fetch size has to be set in order to avoid loading the whole resultset into memory. This can be done in different ways, like in your example:
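A minimal sketch of setting an explicit fetch size in plain JDBC; the statement flags, fetch size value, and table name are illustrative assumptions, not the reporter's actual code:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeSketch {
    static void streamRows(Connection con) throws Exception {
        // Forward-only, read-only: gives up the ability to move backwards
        // through the resultset, in exchange for bounded memory use.
        try (Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1000); // hint: keep ~1000 rows in memory at a time
            try (ResultSet rs = st.executeQuery("SELECT id FROM mass_data")) {
                while (rs.next()) {
                    long id = rs.getLong(1);
                    // process id ...
                }
            }
        }
    }
}
```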
| |||||||||||||||
| Comment by Radek Wikturna [ 2023-11-23 ] | |||||||||||||||
|
I'm really surprised that this bug got closed as 'Not a Bug'! > For big resultset, fetch size has to be set, in order to avoid mounting the whole resultset in memory. Maybe I shouldn't have included all the possible functions like resultList() - I admit I'm not setting the fetch size there (it wouldn't help anyway) because I wanted to test how many records can be iterated using this function before an OOME occurs. So please just focus on fun jdbcResultSet(), which simulates what JPA/Hibernate does under the hood (e.g. when you call fun getResultStream()). Please reopen the issue; I'm sure we can find a way to understand each other. I have also previously read this article from a Hibernate guru, which gives some advice on setting the fetch size on MySQL to Integer.MIN_VALUE or setting the connection property useCursorFetch - neither works on MariaDB: | |||||||||||||||
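For reference, the two MySQL-specific tricks mentioned above look roughly like this in MySQL Connector/J (a hedged sketch; per this report, neither has the intended effect on MariaDB Connector/J):

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MySqlStreamingTricks {
    static Statement rowStreaming(Connection con) throws SQLException {
        // Trick 1: MySQL Connector/J streams rows one at a time when the
        // fetch size is Integer.MIN_VALUE on a forward-only, read-only
        // statement.
        Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(Integer.MIN_VALUE);
        return st;
    }

    static Statement cursorFetch(Connection con) throws SQLException {
        // Trick 2: with useCursorFetch=true in the JDBC URL
        // (e.g. jdbc:mysql://host/db?useCursorFetch=true), a positive fetch
        // size makes MySQL Connector/J use a server-side cursor.
        Statement st = con.createStatement();
        st.setFetchSize(1000);
        return st;
    }
}
```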
| Comment by Diego Dupin [ 2023-11-24 ] | |||||||||||||||
|
There are different things that indicate to the connector how memory must be handled: the fetch size hint and the resultset type. The Oracle and MariaDB protocols differ when using a fetch size. For example, with a fetch size of X, MariaDB will retrieve the whole resultset, putting X rows in memory at a time, while Oracle will retrieve only X rows, and then, when more rows are needed, ask for the next X rows. (To be complete, MariaDB has another exchange protocol that permits retrieving data the same way, but it is around 2x less performant than the current implementation, so it's better this way.) The point is that those 2 parameters - fetch size and resultset type - are what the connector uses to decide how memory is handled. Oracle by default sets the fetch size to 10, while MariaDB loads the complete resultset (fetch size of 0). Loading the complete resultset is a bit faster for small resultsets, and small resultsets are the norm, so I think we have a good default. I won't talk about the MySQL connector's setFetchSize(Integer.MIN_VALUE), which in fact corresponds to setFetchSize(1), because that doesn't follow the JDBC specification (fetch size must be >= 0) | |||||||||||||||
| Comment by Radek Wikturna [ 2023-11-27 ] | |||||||||||||||
|
I really appreciate the detailed answer, which gives insight into some of the technical details that need to be considered when implementing the driver. However, I suspect you have not actually run the code. I admit that my testing code is long and has many functions, and that may be confusing. Therefore, I've reduced it to the bare minimum (there's a new zip file) and now you only need to
Please try to run it, and only then tell me that I'm doing something wrong. | |||||||||||||||
| Comment by Vladislav Vaintroub [ 2023-11-27 ] | |||||||||||||||
|
While I do not know any details about the driver implementation, one thing that I do know is that you can't execute any other query as long as the result set from COM_QUERY/COM_STMT_EXECUTE has not been fully read. The protocol does not allow starting a new command before the old one is finished. Now, Roderick, did you succeed with your program using Oracle's MySQL driver, using any of the tricks that the Hibernate guru revealed? | |||||||||||||||
| Comment by Diego Dupin [ 2023-11-27 ] | |||||||||||||||
|
Ah, I've finally understood your problem: basically, while streaming a resultset, you use the same connection to run some update commands. In order to execute those, the connector first has to finish reading all the remaining rows of the resultset into memory, and that results in the OOM. Basically, for your example, changing
to
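The before/after snippets did not survive this export. One plausible reading of the change, given the explanation above, is to issue the updates on a second connection so the streaming resultset never has to be drained into memory; the table and column names below are invented:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class StreamAndUpdateSketch {
    // Stream on one connection, update on a second one, so the connector
    // never has to buffer the remaining rows mid-iteration.
    static void processWithUpdates(Connection readCon, Connection writeCon)
            throws SQLException {
        try (Statement st = readCon.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1000);
            try (ResultSet rs = st.executeQuery("SELECT id FROM mass_data");
                 PreparedStatement upd = writeCon.prepareStatement(
                         "UPDATE mass_data SET processed = 1 WHERE id = ?")) {
                while (rs.next()) {
                    upd.setLong(1, rs.getLong(1));
                    upd.executeUpdate();
                }
            }
        }
    }
}
```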
To explain a bit more: the exchanges between the MariaDB connector and the database when using a fetch size are:
Oracle works differently. Example with the default fetch size of 10:
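The exchange diagrams themselves are missing from this export, but the two patterns described can be approximated as round-trip counts: MariaDB streams the entire resultset back in the single execute response (the client buffers fetch-size rows at a time), while Oracle issues one extra fetch round trip per batch. A toy model with illustrative numbers:

```java
public class FetchRoundTrips {
    // MariaDB: one execute round trip; the full resultset streams back in
    // that single response regardless of the fetch size hint.
    static int mariadbRoundTrips(int totalRows, int fetchSize) {
        return 1;
    }

    // Oracle (as described above): the execute returns the first batch, then
    // one additional fetch round trip per further batch of fetchSize rows,
    // i.e. ceil(totalRows / fetchSize) round trips in total.
    static int oracleRoundTrips(int totalRows, int fetchSize) {
        return (totalRows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        System.out.println(mariadbRoundTrips(1_000_000, 10)); // 1
        System.out.println(oracleRoundTrips(1_000_000, 10));  // 100000
    }
}
```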
Oracle's solution is way slower because it requires lots of exchanges between client and server, but it does not have the limitation the MariaDB connector has here. This needs at least documentation. |