[CONJ-484] Aurora failover => ArrayIndexOutOfBoundsException / AccessDenied Created: 2017-05-31  Updated: 2019-03-13  Resolved: 2019-03-13

Status: Closed
Project: MariaDB Connector/J
Component/s: aurora
Affects Version/s: 1.5.9
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Xing Du Assignee: Diego Dupin
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Aurora



 Description   

Hi all,

I'm running Aurora failover testing with the help of MariaDB connector recently and discovered this weird issue.

Preparation:

  • Aurora cluster with 2 instances: test-01 and test-02, cluster end point is test-cluster.
  • jdbcUrl="jdbc:mysql:aurora://test-cluster.rds.amazonaws.com:3306/Test?log=true"
  • Using MariaDB connector version 1.5.9 due to concern on usePipeAuth.
  • Java server application have all kinds of queries hitting db server.

Test1:

  • test-01 is the writer and test-02 is the reader.
  • launch server application.
    Result: fine.
    (Clean up by terminating server application)

Test2:

  • Click failover in AWS console for test cluster.
  • Wait until it completes: test-02 is the writer and test-01 is the reader
  • launch server application
    Result: getting exception.
    (Clean up by terminating server application)

Exception:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -10
    at org.mariadb.jdbc.internal.util.buffer.Buffer.readShort(Buffer.java:98) ~[server.jar:?]
    at org.mariadb.jdbc.internal.packet.dao.ColumnInformation.<init>(ColumnInformation.java:148) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readResultSet(AbstractQueryProtocol.java:1195) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1089) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:203) ~[server.jar:?]
    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
    at org.mariadb.jdbc.internal.failover.AbstractMastersListener.invoke(AbstractMastersListener.java:355) ~[server.jar:?]
    at org.mariadb.jdbc.internal.failover.FailoverProxy.executeInvocation(FailoverProxy.java:200) ~[server.jar:?]
    at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:192) ~[server.jar:?]
    at com.sun.proxy.$Proxy42.executeQuery(Unknown Source) ~[?:?]
    at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeInternal(MariaDbClientPreparedStatement.java:224) ~[server.jar:?]
    at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeQuery(MariaDbClientPreparedStatement.java:177) ~[server.jar:?]

Note:

  • Tested on the same environment with com.mysql.jdbc driver and removed "aurora" from jdbc url and things are working fine => issue comes from mariadb driver.
  • My server application has nothing instance specific: the only information aurora specific is the jdbc url and it was using cluster end point
  • For the query that triggered this, I've compared the table description and data between test-01 and test-02 and they are identical.
  • Query="select * from table_timestamp"
  • Table description(table_timestamp):

    +--------------+------------------+------+-----+-------------------+-----------------------------+
    | Field        | Type             | Null | Key | Default           | Extra                       |
    +--------------+------------------+------+-----+-------------------+-----------------------------+
    | last_updated | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
    | update_count | int(10) unsigned | NO   |     | 0                 |                             |
    +--------------+------------------+------+-----+-------------------+-----------------------------+
    2 rows in set (0.00 sec)
    

  • Manually tested the same query using mysql command line, getting identical results from test-01 and test-02
  • I also got another different exception, but it's from the same query.

java.sql.SQLException: (conn:4027) Packets out of order when reading field packets, expected was EOF stream. Packet contents (hex) = 03
Query is: CALL Test.sp_get_timestamp()
    at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:150) ~[server.jar:?]
    at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101) ~[server.jar:?]
    at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77) ~[server.jar:?]
    at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226) ~[server.jar:?]
    at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413) ~[server.jar:?]
    at org.mariadb.jdbc.MariaDbProcedureStatement.execute(MariaDbProcedureStatement.java:168) ~[server.jar:?]
    at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeQuery(MariaDbServerPreparedStatement.java:343) ~[server.jar:?]

  • The sproc is as simple as one select query as listed above.

    BEGIN
      select * from gift_timestamp;
    END
    

I've tested with other slightly older versions as well and have seen the same result.
Can someone help me out here? Many thanks in advance!

Please don't hesitate to reach out to me if you need more information. I'm subscribed to this thread and should be updated if you leave a comment.

Update
With the same test I'm also seeing this exception periodically:

Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Access denied for user 'someusername'@'10.%' (using password: YES)
Query is: CALL Test.sp_get_timestamp()
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:1163) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1093) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1044) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executePreparedQuery(AbstractQueryProtocol.java:628) ~[server.jar:?]
    at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
    at org.mariadb.jdbc.internal.failover.AbstractMastersListener.invoke(AbstractMastersListener.java:351) ~[server.jar:?]
    at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:156) ~[server.jar:?]
    at com.sun.proxy.$Proxy43.executePreparedQuery(Unknown Source) ~[?:?]
    at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:401) ~[server.jar:?]
    ... 36 more

Note:
This is solvable by restarting the server several times but it will be a blocker for me to deploy mariadb driver to production.
Can someone fix this as well?



 Comments   
Comment by Xing Du [ 2017-06-05 ]

I did some debugging with by importing the source code of version 1.5.9 and adding debug statement.

it seems to me that the problem can be somewhere in ReadPacketFetcher.getReusableBuffer()

in FailoverProxy.executeInvocation I'm getting a invocation target exception, which is caused by:

org.mariadb.jdbc.internal.util.dao.QueryException: Unexpected packet returned by server, first byte -2
on HostAddress{host='test02.amazonaws.com', port=3306},master=true
    at org.mariadb.jdbc.internal.packet.ComStmtPrepare.read(ComStmtPrepare.java:171) ~[server.jar:?]
    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.prepare(AbstractQueryProtocol.java:355) ~[server.jar:?]

And in ComStmtPrepare.read the exception is caused by invalid first byte from the buffer.

Buffer buffer = packetFetcher.getPacket();
packetFetcher.getReusableBuffer();
byte firstByte = buffer.getByteAt(0);
if (firstByte == Packet.ERROR) {
  // throw another different exception
}
 
if (firstByte == Packet.OK) {
  // handle
} else {
  // throw the exception i'm seeing.
}

I was wondering if you can spend some time digging into the recycling logic of buffers and see if there is a potential issue, given that you may not be able to reproduce the exception that I'm seeing.
If there's something you'd like me to try for you, please let me know and I can see if that fixes this issue.

Thanks in advance!

-X

Comment by Diego Dupin [ 2017-06-05 ]

Hi xdu This is due to the same issue than for CONJ-477 : pipelining :
disabling option useBatchMultiSend and usePipelineAuth solve those issues : see description : https://mariadb.com/kb/en/mariadb/option-batchmultisend-description/
I'm guessing Aurora has a proxy that drains the socket with non-blocking reads, but only processes the very first protocol packet, discarding the rest of the input. and according to the speed the server process the input frequency of this issue occur more or less often.

Those option will now be set to false when driver discover that server is aurora to avoid thoses issues. Problem is send to amazon, so we can discuss correction.

Comment by Diego Dupin [ 2017-06-05 ]

After a second thought, that need some testing, to my understanding, the error "ArrayIndexOutOfBoundsException: -10" must not normally occur.

Error would normally then be similar to :

Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Communications link failure with primary host xxx.rds.amazonaws.com:3306. Could not read packet: unexpected end of stream, read 0 bytes from 4
Query is : call sp_get_timestamp()
on HostAddress{host='xxx.rds.amazonaws.com', port=3306},master=true.  Driver has reconnect connection
	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1060)
	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031)
	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:152)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

Having an "ArrayIndexOutOfBoundsException: -10" error mean that exchange with server has interrupted without the killing of the socket ?! Aurora must have added some proxy of some sort.

I'm checking that now, and if so, exchange interruption without socket failure must have a better implementation to have a proper error.

Comment by Diego Dupin [ 2017-06-05 ]

After testing, i didn't reproduced any error of this kind.
Doesn't you use executeBatch just before that with the "useBatchMultiSend" active ? (active by default).
Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?

Comment by Xing Du [ 2017-06-05 ]

Hi Diego,

thanks for getting back to me! I really appreciated it.

> Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?
I get all of these exceptions every time I launch (as long as I have test-02 as writer and test-01 as reader, no exception the other way around)
I've noticed "useBatchMultiSend" was default to false if I specifcy aurora in the URL from your documentation so I did not disable that explicitly.
And thoughts on that: if "useBatchMultiSend" is the culprit, I should receive the same exceptions when test01 being writer and test02 being writer.

Did you get a chance to review the logic in recycling buffers? (getReusableBuffer)
I was wondering if you can point me a way to issue a new buffer instead of getting a recycled one, so that I can verify if this is the cause.

Comment by Xing Du [ 2017-06-06 ]

From tcpdump I see mariadb driver is fetching the instances behind the aurora cluster end point. (stored in a table in aurora)
given this issue only happens when I switch the roles, I was wondering if you can point me to any failover specific places that may have caused this.

Comment by Diego Dupin [ 2017-06-06 ]

I cannot point to a specific place, the method "executeQuery" in the error log is one of the most used method.
if you could post the tcpdump / log with trace level here, that would be great. That would permit to identify that place.
Because strangely it seems a protocol issue.

Comment by Xing Du [ 2017-06-06 ]

I agree with you. what format of tcpdump result are you most comfortable with? if you have a preferred group of tcpdump args, please let me know.

Comment by Diego Dupin [ 2017-06-06 ]

any will do with the -X option (packet content with hexa and ASCII) if i remember that right

Comment by Xing Du [ 2017-06-19 ]

sry for the late reply. I was out of office and did not get a chance to do this.
my test environment was tore apart and I need to set those up again to reproduce this issue. will attach tcpdump info when I have that.

Comment by Diego Dupin [ 2019-03-13 ]

closing since no news and not reproduced

Generated at Thu Feb 08 03:16:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.