How to reproduce:
- Have a relatively fast computer. I have an Intel 3770S (3.1 GHz) and an SSD drive.
- Run the attached test case GiganticLoadDataInfileTest.java
- Create a table with two columns: id and name.
- Create a gigantic LOAD DATA INFILE input file. I used 30 million rows with a total size of 637777795 bytes.
- Run a LOAD DATA LOCAL INFILE query to insert the data into the table.
After about 58 seconds, roughly 25402152 rows have been inserted into the table and you get this exception:
The test case should pass and there should be 30 million rows in the table.
This test case works with the MySQL JDBC driver (I tested with version 5.1.26); it took about 67 seconds to execute.
It should be possible to send more data to the server (I'm not sure about the exact limit; maybe 1 GiB). This LOAD DATA INFILE file was only 637777795 bytes (about 608 MiB).
I'm probably partly to blame for this bug. I sent a patch (via e-mail to Anders Karlsson) that led to the commit http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/revision/505. I also pointed out a way to fix LOAD DATA INFILE (http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/revision/506) without testing it for the extreme cases.
To understand this problem you must first understand a MySQL packet: http://dev.mysql.com/doc/internals/en/mysql-packet.html.
The payload length field is a 3-byte Protocol::FixedLengthInteger.
It is followed by a sequence id/number, a 1-byte Protocol::FixedLengthInteger.
If more than 16 MiB of data is to be sent, it must be split into chunks of at most 16 MiB: http://dev.mysql.com/doc/internals/en/sending-more-than-16mbyte.html. The sequence id is incremented for each chunk.
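The framing described above can be sketched as follows. This is a minimal illustration under my reading of the protocol docs, not the driver's actual code; `PacketWriter` and `writePayload` are hypothetical names:

```java
import java.io.IOException;
import java.io.OutputStream;

public class PacketWriter {
    // Maximum payload per packet: 0xffffff bytes (just under 16 MiB).
    private static final int MAX_PAYLOAD = 0xffffff;
    private int seqNo = 0;

    // Split a payload into chunks of at most MAX_PAYLOAD bytes, each
    // prefixed with a 3-byte little-endian length and a 1-byte sequence id.
    // Simplified: omits the trailing empty packet the protocol requires
    // when the payload is an exact multiple of MAX_PAYLOAD.
    public void writePayload(OutputStream out, byte[] payload) throws IOException {
        int offset = 0;
        do {
            int chunk = Math.min(MAX_PAYLOAD, payload.length - offset);
            out.write(chunk & 0xff);          // length, low byte
            out.write((chunk >> 8) & 0xff);   // length, middle byte
            out.write((chunk >> 16) & 0xff);  // length, high byte
            out.write(seqNo++ & 0xff);        // sequence id, only low 8 bits on the wire
            out.write(payload, offset, chunk);
            offset += chunk;
        } while (offset < payload.length);
    }
}
```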
The documentation is not very clear about the maximum size of a query. One interpretation is that the largest query that can be sent is 16 MiB * 256 = 4096 MiB = 4 GiB; that is, the query is split into 256 smaller chunks, each 16 MiB in size. But since http://dev.mysql.com/doc/internals/en/sequence-id.html states "and may wrap around", I'm not sure. There is also the page http://dev.mysql.com/doc/refman/5.5/en/packet-too-large.html, which states: "The largest possible packet that can be transmitted to or from a MySQL 5.5 server or client is 1GB."
From what I can see in the code (http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/view/head:/src/main/java/org/mariadb/jdbc/internal/common/packet/PacketOutputStream.java), the method sendFile() creates a buffer of size 8192 bytes. Each 8 KiB chunk causes a write and a flush, incrementing the seqNo attribute. When seqNo reaches 0xffff, the exception "MySQL protocol limit reached, you cannot send more than 4GB of data" is thrown. In total roughly 8 KiB * 0x10000 = 536870912 bytes = 512 MiB has been sent.
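The arithmetic behind that limit: with an 8 KiB buffer each flush produces one packet, so the sequence number space of 65536 values is exhausted after only 512 MiB, far below the file size in this test case:

```java
public class SeqNoLimit {
    public static void main(String[] args) {
        long bufferSize = 8192;     // sendFile() buffer: 8 KiB per packet
        long maxPackets = 0x10000;  // seqNo range after revision 505: 0..65535
        long totalBytes = bufferSize * maxPackets;
        System.out.println(totalBytes);                  // 536870912
        System.out.println(totalBytes / (1024 * 1024));  // 512 (MiB, not GiB)
    }
}
```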
The commit http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/revision/505 changed the max seqNo from 256 to 65536. I'm not sure whether that is correct. However, only the least significant 8 bits are sent in the packet, since the value is truncated at line 121.
My small tests show that the server has no problem handling the wrapping of the sequence id.
A naive solution would be to grow the buffer in sendFile() to 16 MiB. But that does not work well if the max_allowed_packet variable in my.cnf is set lower. I tried increasing max_allowed_packet to 128M but still ran into problems.
It would be possible to get max_allowed_packet from the session and/or compare it with a preset value (for example 1 MiB).
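A sketch of that idea, choosing the buffer size from the server limit. This is a hypothetical helper, not the driver's API; the value of max_allowed_packet would be read from the session (e.g. with "SELECT @@max_allowed_packet") or the preset used as a fallback:

```java
public class BufferSizing {
    // Protocol hard limit per packet payload: just under 16 MiB.
    static final int MAX_PACKET_PAYLOAD = 0xffffff;
    // Preset fallback (1 MiB) if max_allowed_packet cannot be read.
    static final long PRESET = 1024 * 1024;

    // Pick a sendFile() buffer size: as large as the protocol allows,
    // but never larger than the server's max_allowed_packet.
    static int chooseBufferSize(long maxAllowedPacket) {
        long limit = maxAllowedPacket > 0 ? maxAllowedPacket : PRESET;
        return (int) Math.min(MAX_PACKET_PAYLOAD, limit);
    }
}
```

With a 128M max_allowed_packet this would use full-size packets, so 637777795 bytes would need only about 39 sequence numbers instead of 77853.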
Other solutions would be to:
- Increase MAX_SEQNO from 65536 to 524288 (4 GiB / 8192).
- Increase MAX_SEQNO from 65536 to 131072 (1 GiB / 8192).
From a performance point of view, a slightly larger buffer could also help on higher-latency network links.