[CONJ-375] Load data infile with large files fails with OutOfMemoryError Created: 2016-10-20 Updated: 2016-10-25 Resolved: 2016-10-25 |
|
| Status: | Closed |
| Project: | MariaDB Connector/J |
| Component/s: | Other |
| Affects Version/s: | 1.5.2 |
| Fix Version/s: | 1.5.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Andreas Palm | Assignee: | Diego Dupin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | infile | ||
| Environment: |
MariaDB server version: 5.5.44-MariaDB |
||
| Attachments: |
|
| Description |
|
Summary: Detailed problem description: The query that is run is the following:
The INFILE stream is set using code similar to this:
The table definition and example code is attached to this ticket. From what I've seen so far it seems that the Connector/J driver attempts to allocate a buffer (org.mariadb.jdbc.internal.stream.PacketOutputStream::increase(int)) large enough to fit the entire file before sending anything to the server. This allocation fails for large files. For smaller files it seems to fail because the "max_allowed_packet" is smaller than the buffer. No rows were created and nothing was logged in the access or error log in the two cases that caused errors to occur. I have tested simulating imports with different file sizes, please see the attached java file. Expected behavior: Background information: The process works like this: the user identifies some property (column) available in our databases they want to add additional data to, for example cellphone model. The user then creates a file in a special format that maps some number of cellphone models to the data they wish to add. The file does not have to contain every value that exists in the database, it may even reference values that currently does not exist. The user then notifies our system that they wish to add the data. The system in turn does some processing of the file, for example transforming the values to the appropriate format. The end result is that the system produces a CSV file that will be then be inserted to one or more MariaDB databases. The data is placed in its own table, our original data is never modified, this is partly because the data generation part of the system does not have knowledge of the data. After the data has been inserted to the databases the system will perform a JOIN on the created tables whenever a user asks for this additional data. Currently this data insertion is performed by batching inserts in small volumes, this is too slow, thus we are attempting to use LOAD DATA INFILE. |
| Comments |
| Comment by Diego Dupin [ 2016-10-25 ] | |||||||||||||||
|
commit : https://github.com/MariaDB/mariadb-connector-j/commit/01ad520a07f4aae92ff2e554fd65be3fbef99c4b JMH Benchmark result : will be release in next correction release 1.5.5. correction available in maven with :
|