[CONJ-115] Big (> 512 MiB) LOAD DATA LOCAL INFILE cannot be sent to server Created: 2014-10-01  Updated: 2014-10-23  Resolved: 2014-10-23

Status: Closed
Project: MariaDB Connector/J
Component/s: None
Affects Version/s: 1.1.7
Fix Version/s: 1.1.8

Type: Bug Priority: Minor
Reporter: Lennart Schedin Assignee: Massimo Siani (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File CONJ-115_lennart_simple_solution.patch     Java Source File GiganticLoadDataInfileTest.java    

 Description   

How to reproduce:

  1. Have a relatively fast computer. I have an Intel 3770S (3.1 GHz) and an SSD drive.
  2. Run the attached test case GiganticLoadDataInfileTest.java

Pseudo code:

  1. Create a table with two columns: id and name.
  2. Create a LOAD DATA INFILE file that is gigantic. I used 30 million rows and a total byte size of 637777795 bytes.
  3. Run a LOAD DATA LOCAL INFILE query to insert the data into the table

Actual:
After about 58 seconds about 25402152 rows are inserted into the table and you get this exception:

java.sql.SQLNonTransientConnectionException: Could not read resultset: MySQL protocol limit reached, you cannot send more than 4GB of data
	at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:136)
	at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
	at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:264)
	at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:291)
	at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:341)
	at org.mariadb.jdbc.GiganticLoadDataInfileTest.giganticLoadDataInfileTest(GiganticLoadDataInfileTest.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
	at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: org.mariadb.jdbc.internal.common.QueryException: Could not read resultset: MySQL protocol limit reached, you cannot send more than 4GB of data
	at org.mariadb.jdbc.internal.mysql.MySQLProtocol.getResult(MySQLProtocol.java:930)
	at org.mariadb.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:995)
	at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:284)
	... 26 more
Caused by: java.io.IOException: MySQL protocol limit reached, you cannot send more than 4GB of data
	at org.mariadb.jdbc.internal.common.packet.PacketOutputStream.write(PacketOutputStream.java:84)
	at org.mariadb.jdbc.internal.common.packet.PacketOutputStream.sendFile(PacketOutputStream.java:63)
	at org.mariadb.jdbc.internal.mysql.MySQLProtocol.getResult(MySQLProtocol.java:923)
	... 28 more

Expected:
The test case should pass and there should be 30 million rows in the table.

This test case works with MySQL JDBC (I tested with version 5.1.26). It took about 67 seconds to execute.

It should be possible to send more data to the server (I'm not sure about the exact limit, maybe 1 GiB). This LOAD DATA INFILE was only 637777795 bytes (about 608 MiB).

Analysis:
I'm probably partly responsible for this error. I sent a patch (via e-mail to Anders Karlsson) that resulted in the commit http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/revision/505. I also pointed to a way to fix LOAD DATA INFILE (http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/revision/506) without testing it for the extreme cases.

To understand this problem you must first understand a MySQL packet: http://dev.mysql.com/doc/internals/en/mysql-packet.html.
The payload length field is a 3-byte Protocol::FixedLengthInteger.
There is also a sequence id/number, which is a 1-byte Protocol::FixedLengthInteger.

If more than 16 MiB of data is to be sent, the packet is split into 16 MiB chunks: http://dev.mysql.com/doc/internals/en/sending-more-than-16mbyte.html. The sequence id should be incremented for each chunk.
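The header layout and splitting rule described above can be sketched like this. This is a toy illustration under the interpretation just given, not the connector's actual code; the class and method names are made up:

```java
// Illustrative sketch of MySQL client packet framing: a 3-byte little-endian
// payload length followed by a 1-byte sequence id, with large payloads split
// into chunks of at most 0xffffff bytes.
public class PacketSplitSketch {
    static final int MAX_PAYLOAD = 0xffffff; // 16 MiB - 1, the per-packet maximum

    // Build the 4-byte header for one packet.
    static byte[] header(int payloadLength, int seqNo) {
        return new byte[] {
            (byte) (payloadLength & 0xff),
            (byte) ((payloadLength >> 8) & 0xff),
            (byte) ((payloadLength >> 16) & 0xff),
            (byte) seqNo // only the least significant 8 bits survive the cast
        };
    }

    // Number of packets needed for a payload of the given size. A payload that
    // is an exact multiple of MAX_PAYLOAD needs a trailing shorter (possibly
    // empty) packet so the server can tell the message has ended.
    static int packetCount(long payloadLength) {
        return (int) (payloadLength / MAX_PAYLOAD) + 1;
    }
}
```

Under this reading, a 637777795-byte file needs 39 packets, each carrying its own header and incremented sequence id.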

The documentation is not very clear about the maximum size of a query. One interpretation is that the largest query that can be sent is 16 MiB * 256 = 4096 MiB = 4 GiB; that is, the query is split into 256 smaller chunks, each 16 MiB in size. But since http://dev.mysql.com/doc/internals/en/sequence-id.html states "and may wrap around", I'm not sure. There is also the page http://dev.mysql.com/doc/refman/5.5/en/packet-too-large.html, which states "The largest possible packet that can be transmitted to or from a MySQL 5.5 server or client is 1GB."

From what I can see in the code (http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/view/head:/src/main/java/org/mariadb/jdbc/internal/common/packet/PacketOutputStream.java), the method sendFile() creates a buffer of 8192 bytes. Each 8 KiB chunk causes a write and a flush, thus incrementing the seqNo attribute. When seqNo reaches 0xffff, the exception "MySQL protocol limit reached, you cannot send more than 4GB of data" is thrown. At that point a total of roughly 8 KiB * 65536 = 536870912 bytes = 512 MiB has been sent.

The commit http://bazaar.launchpad.net/~maria-captains/mariadb-java-client/trunk/revision/505 changed the max seqNo from 256 to 65536. I'm not sure whether that is correct. However, only the least significant 8 bits are sent in the packet, since the value is truncated at row 121:

byteBuffer[SEQNO_OFFSET] = (byte)seqNo;

My small tests show that the server has no problem handling the wrapping of the sequence id.
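The truncation in that line is just a narrowing cast, so the value that reaches the wire always wraps modulo 256 regardless of how high the in-memory counter climbs, which is consistent with the server tolerating wrapped sequence ids. A minimal illustration (hypothetical helper, not driver code):

```java
// Sketch of what the (byte) cast in PacketOutputStream does to the sequence
// number: only the low 8 bits are kept, so the on-wire value wraps 255 -> 0.
public class SeqNoWrapSketch {
    static int onWire(int seqNo) {
        return ((byte) seqNo) & 0xff; // narrowing cast keeps the low 8 bits
    }
}
```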

Suggested Solution:
A naive solution is to change the buffer in sendFile() to 16 MiB. But that does not work well if the max_allowed_packet my.cnf variable is set lower. I tried increasing max_allowed_packet to 128M, but even then I ran into problems.

It would be possible to get max_allowed_packet from the session and/or compare it with a preset value (for example 1 MiB).

Other solutions would be to:

  1. Increase MAX_SEQNO from 65536 to 524288 (4 GiB / 8192), or
  2. increase MAX_SEQNO from 65536 to 131072 (1 GiB / 8192).

From a performance point of view, a slightly larger buffer could be good for connections with higher network latency.



 Comments   
Comment by Massimo Siani (Inactive) [ 2014-10-01 ]

Review requested for
https://code.launchpad.net/~massimo-siani/mariadb-java-client/CONJ-115

Comment by Lennart Schedin [ 2014-10-02 ]

I edited the description. I had forgotten a "not". The correct text should be "I'm not sure about the exact limit, maybe 1 GiB".

I think the suggested fix in https://code.launchpad.net/~massimo-siani/mariadb-java-client/CONJ-115 is acceptable. This will make the maximum LOAD DATA INFILE size 65536 * 8192 * 2 = 1073741824 bytes = 1 GiB. The downside of the fix is that it does not address the exception text at row 84 of PacketOutputStream.java:

MySQL protocol limit reached, you cannot send more than 4GB of data

Comment by Lennart Schedin [ 2014-10-02 ]

I might have changed my mind about whether the fix is acceptable. When running my test case with MySQL JDBC (5.1.26) and different values, it passed above 1 GiB of data:

60000000  rows: 1297777795 bytes, 196 sec
120000000 rows: 2657777797 bytes, 268 sec

(I also had to change the data type of totalNumberOfBytesRead from int to long to be able to count bytes above 2 GiB.)

Comment by Lennart Schedin [ 2014-10-02 ]

I'm beginning to think that there is no limit, or that the limit is very high. I used MySQL JDBC 5.1.26 and a 5.5.33a-MariaDB server and ran tests:

 60000000 rows:  1297777795 bytes,  196 sec
120000000 rows:  2657777797 bytes,  268 sec
180000000 rows:  4097777797 bytes,  426 sec
200000000 rows:  4577777797 bytes,  484 sec
300000000 rows:  6977777797 bytes,  710 sec
500000000 rows: 11777777797 bytes, 1192 sec
800000000 rows: 18977777797 bytes, 1952 sec

I used max_allowed_packet=128M. I don't know if that setting matters.

I stopped at about 18 GiB because I was running out of hard drive space and of time to wait for the test to execute. At 18 GiB of LOAD DATA INFILE my MYD file was 48 GiB. To run with larger files I would have to optimize GiganticLoadDataInfileTest.java (probably by adding more or larger columns to get a higher throughput) and perhaps also use a faster SSD.

So if MySQL JDBC has no limit (or a very high one), shouldn't MariaDB JDBC behave the same?

Comment by Vladislav Vaintroub [ 2014-10-02 ]

There is no limit, I believe. IIRC, seqno just rolls over from 255 to 0. Massimo, may I suggest NOT including a unit test with gigabytes of LOAD DATA INFILE? You can still leave the code, but at least mark it as @Ignore or whatever is used to ignore tests.

Comment by Lennart Schedin [ 2014-10-03 ]

If there really is no limit, I have a simple (a bit naive) solution in the attached file CONJ-115_lennart_simple_solution.patch. My solution is simply to remove the throwing of the "MySQL protocol limit" exception. My patch also increases the LOAD DATA INFILE send buffer a bit.

As I look around in the code I see few test cases that test PacketOutputStream directly, so I would be very careful before including my patch in production code. If I worked on the code full time, I think I would write a JUnit test with a server mock-up (one that simply reads bytes from the stream and throws an exception if something strange is found).

(The patch is in Git diff format, but can be applied in Eclipse by setting "Ignore leading path name segments" to 1.)

Comment by Vladislav Vaintroub [ 2014-10-03 ]

I do not know how you would write a test checking for "no limit". Should JUnit, as part of the unit test suite and the normal build, fill the disk completely, and would that be considered success? For that type of thing, a one-time test on a single developer box is absolutely enough, IMO.

Comment by Lennart Schedin [ 2014-10-03 ]

Yeah, a "no limit" test case can be a bit hard to write. My main concern about the lacking tests is catching regressions, that is, other errors that may appear if my patch is not perfect. For example, let's say my patch makes it impossible to send a large INSERT INTO table VALUES () statement. Or that my code makes seqNo a non-contiguous number sequence.

Comment by Lennart Schedin [ 2014-10-03 ]

I have tested my patch on a live-ish test system with a lot of traffic. From what I can see, it works fine. I had also added a small logger to log the sizes. The largest LOAD DATA INFILE I saw was 1381971038 bytes = 1.3 GiB. So in my case I would need a fix that can handle more than 1 GiB of data.

Why am I using such large files? I don't know. I think the code I'm working on has a row limit, but some tables have so many columns that the byte count gets very large.

Comment by Georg Richter [ 2014-10-03 ]

I think the problem is that we use a fixed buffer of 8192 bytes instead of filling the internal buffer (MAX_PACKET_LENGTH), e.g.:

    public void sendFile(InputStream is, int seq) throws IOException {
        byte[] buffer = new byte[8192];
        int len;
        startPacket(seq++);
        while ((len = is.read(buffer)) > 0) {
            write(buffer, 0, len);
        }
        finishPacket();
        writeEmptyPacket(seq);
    }

Comment by Lennart Schedin [ 2014-10-03 ]

Please don't implement a solution like this:

        byte[] buffer = new byte[MAX_PACKET_LENGTH - HEADER_LENGTH];

It won't work for servers that have a smaller max_allowed_packet setting.

Comment by Massimo Siani (Inactive) [ 2014-10-03 ]

I'm not sure this is the best solution, but the code in the CONJ-115 branch sets the buffer size according to the max_allowed_packet variable. This may be of some help for CONJ-116, too.

Comment by Vladislav Vaintroub [ 2014-10-03 ]

"Load data" packet size can be arbitrarily small, like 1 KiB, which is guaranteed to work with any max_allowed_packet setting. The EOF is detected on the server side by the last packet.
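That streaming scheme can be sketched as follows. This is an illustrative toy, not the connector's code: sendFile() and writePacket() are made-up helpers, and the fixed small buffer plus a terminating empty packet embody exactly the point above, that chunk size does not need to track max_allowed_packet:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch: stream a LOAD DATA LOCAL INFILE file in small fixed-size packets.
// The server detects end-of-file from the final empty packet, so each chunk
// may be far smaller than max_allowed_packet.
public class LocalInfileStreamSketch {
    static void sendFile(InputStream in, OutputStream out, int seq) throws IOException {
        byte[] buffer = new byte[1024]; // deliberately small; safe for any server setting
        int len;
        while ((len = in.read(buffer)) > 0) {
            writePacket(out, buffer, len, seq++ & 0xff);
        }
        writePacket(out, buffer, 0, seq & 0xff); // empty packet marks EOF
    }

    // Frame one chunk: 3-byte little-endian length, 1-byte sequence id, payload.
    static void writePacket(OutputStream out, byte[] payload, int len, int seq) throws IOException {
        out.write(len & 0xff);
        out.write((len >> 8) & 0xff);
        out.write((len >> 16) & 0xff);
        out.write(seq);
        out.write(payload, 0, len);
    }
}
```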

Comment by Lennart Schedin [ 2014-10-06 ]

I see that https://code.launchpad.net/~massimo-siani/mariadb-java-client/CONJ-115 has been updated since I last checked. I like the overall idea of the new solution.

The solution asks the server for max_allowed_packet during connect. This will cause some overhead, making connect a bit slower. For me this is not a problem since I use a connection pool to manage connections, but it is something to consider.

The solution uses global state: MySQLProtocol.maxAllowedPacket is a static variable that has only one value per JVM. This makes the solution break when connecting to two different database servers that have different max_allowed_packet values. As a general programming practice I would recommend never using global mutable state. To solve this specific problem, maybe the size can be injected when the PacketOutputStream is created in MySQLProtocol.java?

The solution also does not work if I set max_allowed_packet higher than PacketOutputStream.MAX_PACKET_LENGTH. If I set

set GLOBAL max_allowed_packet = 128*1024*1024;

I will get a

java.sql.SQLNonTransientConnectionException: Got packets out of order
	at org.mariadb.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:136)
	at org.mariadb.jdbc.internal.SQLExceptionMapper.throwException(SQLExceptionMapper.java:106)
	at org.mariadb.jdbc.MySQLStatement.executeQueryEpilog(MySQLStatement.java:264)
	at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:291)
	at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:341)
	at org.mariadb.jdbc.GiganticLoadDataInfileTest.giganticLoadDataInfileTest(GiganticLoadDataInfileTest.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:44)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:180)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:41)
	at org.junit.runners.ParentRunner$1.evaluate(ParentRunner.java:173)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:459)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:675)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:382)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:192)
Caused by: org.mariadb.jdbc.internal.common.QueryException: Got packets out of order
	at org.mariadb.jdbc.internal.mysql.MySQLProtocol.getResult(MySQLProtocol.java:947)
	at org.mariadb.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:996)
	at org.mariadb.jdbc.MySQLStatement.execute(MySQLStatement.java:284)
	... 26 more
 

My suggested fix for this is to use Math.min(MAX_PACKET_LENGTH, MySQLProtocol.maxAllowedPacket).
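The suggested clamp could look like this (hypothetical names mirroring the ones discussed in the thread, not the actual branch code):

```java
// Sketch: never size the send buffer above either the protocol's per-packet
// ceiling or the server's max_allowed_packet, whichever is smaller.
public class BufferSizeSketch {
    static final int MAX_PACKET_LENGTH = 0xffffff; // stand-in for the protocol's 16 MiB - 1 limit

    static int chooseBufferSize(int maxAllowedPacket) {
        return Math.min(MAX_PACKET_LENGTH, maxAllowedPacket);
    }
}
```

With max_allowed_packet = 128M, this clamps the buffer to the protocol limit instead of producing over-long packets that the server rejects with "Got packets out of order".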

Comment by Massimo Siani (Inactive) [ 2014-10-08 ]

Thanks for the feedback.

Generated at Thu Feb 08 03:13:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.