[CONJ-484] Aurora failover => ArrayIndexOutOfBoundsException / AccessDenied - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Incomplete
Affects Version/s: 1.5.9
Fix Version/s: N/A
Component/s: aurora
Labels:
None
Environment:
Aurora

Description

Hi all,

I'm running Aurora failover testing with the help of MariaDB connector recently and discovered this weird issue.

Preparation:

Aurora cluster with 2 instances: test-01 and test-02, cluster end point is test-cluster.
jdbcUrl="jdbc:mysql:aurora://test-cluster.rds.amazonaws.com:3306/Test?log=true"
Using MariaDB connector version 1.5.9 due to concern on usePipeAuth.
Java server application have all kinds of queries hitting db server.

Test1:

test-01 is the writer and test-02 is the reader.
launch server application.
Result: fine.
(Clean up by terminating server application)

Test2:

Click failover in AWS console for test cluster.
Wait until it completes: test-02 is the writer and test-01 is the reader
launch server application
Result: getting exception.
(Clean up by terminating server application)

Exception:

Caused by: java.lang.ArrayIndexOutOfBoundsException: -10

    at org.mariadb.jdbc.internal.util.buffer.Buffer.readShort(Buffer.java:98) ~[server.jar:?]

    at org.mariadb.jdbc.internal.packet.dao.ColumnInformation.<init>(ColumnInformation.java:148) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readResultSet(AbstractQueryProtocol.java:1195) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1089) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:203) ~[server.jar:?]

    at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) ~[?:?]

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]

    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]

    at org.mariadb.jdbc.internal.failover.AbstractMastersListener.invoke(AbstractMastersListener.java:355) ~[server.jar:?]

    at org.mariadb.jdbc.internal.failover.FailoverProxy.executeInvocation(FailoverProxy.java:200) ~[server.jar:?]

    at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:192) ~[server.jar:?]

    at com.sun.proxy.$Proxy42.executeQuery(Unknown Source) ~[?:?]

    at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeInternal(MariaDbClientPreparedStatement.java:224) ~[server.jar:?]

    at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeQuery(MariaDbClientPreparedStatement.java:177) ~[server.jar:?]

Note:

Tested on the same environment with com.mysql.jdbc driver and removed "aurora" from jdbc url and things are working fine => issue comes from mariadb driver.
My server application has nothing instance specific: the only information aurora specific is the jdbc url and it was using cluster end point
For the query that triggered this, I've compared the table description and data between test-01 and test-02 and they are identical.
Query="select * from table_timestamp"

Table description(table_timestamp):

+--------------+------------------+------+-----+-------------------+-----------------------------+

| Field        | Type             | Null | Key | Default           | Extra                       |

+--------------+------------------+------+-----+-------------------+-----------------------------+

| last_updated | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |

| update_count | int(10) unsigned | NO   |     | 0                 |                             |

+--------------+------------------+------+-----+-------------------+-----------------------------+

2 rows in set (0.00 sec)

Manually tested the same query using mysql command line, getting identical results from test-01 and test-02
I also got another different exception, but it's from the same query.

java.sql.SQLException: (conn:4027) Packets out of order when reading field packets, expected was EOF stream. Packet contents (hex) = 03

Query is: CALL Test.sp_get_timestamp()

    at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:150) ~[server.jar:?]

    at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101) ~[server.jar:?]

    at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77) ~[server.jar:?]

    at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226) ~[server.jar:?]

    at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413) ~[server.jar:?]

    at org.mariadb.jdbc.MariaDbProcedureStatement.execute(MariaDbProcedureStatement.java:168) ~[server.jar:?]

    at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeQuery(MariaDbServerPreparedStatement.java:343) ~[server.jar:?]

The sproc is as simple as one select query as listed above.
BEGIN
select * from gift_timestamp;
END

I've tested with other slightly older versions as well and have seen the same result.
Can someone help me out here? Many thanks in advance!

Please don't hesitate to reach out to me if you need more information. I'm subscribed to this thread and should be updated if you leave a comment.

Update
With the same test I'm also seeing this exception periodically:

Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Access denied for user 'someusername'@'10.%' (using password: YES)

Query is: CALL Test.sp_get_timestamp()

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:1163) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1093) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1044) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executePreparedQuery(AbstractQueryProtocol.java:628) ~[server.jar:?]

    at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) ~[?:?]

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]

    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]

    at org.mariadb.jdbc.internal.failover.AbstractMastersListener.invoke(AbstractMastersListener.java:351) ~[server.jar:?]

    at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:156) ~[server.jar:?]

    at com.sun.proxy.$Proxy43.executePreparedQuery(Unknown Source) ~[?:?]

    at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:401) ~[server.jar:?]

    ... 36 more

Note:
This is solvable by restarting the server several times but it will be a blocker for me to deploy mariadb driver to production.
Can someone fix this as well?

Attachments

Activity

Ascending order - Click to sort in descending order

Xing Du added a comment - 2017-06-05 03:26

I did some debugging with by importing the source code of version 1.5.9 and adding debug statement.

it seems to me that the problem can be somewhere in ReadPacketFetcher.getReusableBuffer()

in FailoverProxy.executeInvocation I'm getting a invocation target exception, which is caused by:

org.mariadb.jdbc.internal.util.dao.QueryException: Unexpected packet returned by server, first byte -2

on HostAddress{host='test02.amazonaws.com', port=3306},master=true

    at org.mariadb.jdbc.internal.packet.ComStmtPrepare.read(ComStmtPrepare.java:171) ~[server.jar:?]

    at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.prepare(AbstractQueryProtocol.java:355) ~[server.jar:?]

And in ComStmtPrepare.read the exception is caused by invalid first byte from the buffer.

Buffer buffer = packetFetcher.getPacket();

packetFetcher.getReusableBuffer();

byte firstByte = buffer.getByteAt(0);

if (firstByte == Packet.ERROR) {

  // throw another different exception

if (firstByte == Packet.OK) {

  // handle

} else {

  // throw the exception i'm seeing.

I was wondering if you can spend some time digging into the recycling logic of buffers and see if there is a potential issue, given that you may not be able to reproduce the exception that I'm seeing.
If there's something you'd like me to try for you, please let me know and I can see if that fixes this issue.

Thanks in advance!

-X

Xing Du added a comment - 2017-06-05 03:26 I did some debugging with by importing the source code of version 1.5.9 and adding debug statement. it seems to me that the problem can be somewhere in ReadPacketFetcher.getReusableBuffer() in FailoverProxy.executeInvocation I'm getting a invocation target exception, which is caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Unexpected packet returned by server, first byte - 2 on HostAddress{host= 'test02.amazonaws.com' , port= 3306 },master= true at org.mariadb.jdbc.internal.packet.ComStmtPrepare.read(ComStmtPrepare.java: 171 ) ~[server.jar:?] at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.prepare(AbstractQueryProtocol.java: 355 ) ~[server.jar:?] And in ComStmtPrepare.read the exception is caused by invalid first byte from the buffer. Buffer buffer = packetFetcher.getPacket(); packetFetcher.getReusableBuffer(); byte firstByte = buffer.getByteAt( 0 ); if (firstByte == Packet.ERROR) { // throw another different exception } if (firstByte == Packet.OK) { // handle } else { // throw the exception i'm seeing. } I was wondering if you can spend some time digging into the recycling logic of buffers and see if there is a potential issue, given that you may not be able to reproduce the exception that I'm seeing. If there's something you'd like me to try for you, please let me know and I can see if that fixes this issue. Thanks in advance! -X

Diego Dupin added a comment - 2017-06-05 07:57 - edited

Hi xdu This is due to the same issue than for ~~CONJ-477~~ : pipelining :
disabling option useBatchMultiSend and usePipelineAuth solve those issues : see description : https://mariadb.com/kb/en/mariadb/option-batchmultisend-description/
I'm guessing Aurora has a proxy that drains the socket with non-blocking reads, but only processes the very first protocol packet, discarding the rest of the input. and according to the speed the server process the input frequency of this issue occur more or less often.

Those option will now be set to false when driver discover that server is aurora to avoid thoses issues. Problem is send to amazon, so we can discuss correction.

Diego Dupin added a comment - 2017-06-05 07:57 - edited Hi xdu This is due to the same issue than for CONJ-477 : pipelining : disabling option useBatchMultiSend and usePipelineAuth solve those issues : see description : https://mariadb.com/kb/en/mariadb/option-batchmultisend-description/ I'm guessing Aurora has a proxy that drains the socket with non-blocking reads, but only processes the very first protocol packet, discarding the rest of the input. and according to the speed the server process the input frequency of this issue occur more or less often. Those option will now be set to false when driver discover that server is aurora to avoid thoses issues. Problem is send to amazon, so we can discuss correction.

Diego Dupin added a comment - 2017-06-05 10:42 - edited

After a second thought, that need some testing, to my understanding, the error "ArrayIndexOutOfBoundsException: -10" must not normally occur.

Error would normally then be similar to :

Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Communications link failure with primary host xxx.rds.amazonaws.com:3306. Could not read packet: unexpected end of stream, read 0 bytes from 4

Query is : call sp_get_timestamp()

on HostAddress{host='xxx.rds.amazonaws.com', port=3306},master=true.  Driver has reconnect connection

	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1060)

	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031)

	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:152)

	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

	at java.lang.reflect.Method.invoke(Method.java:498)

Having an "ArrayIndexOutOfBoundsException: -10" error mean that exchange with server has interrupted without the killing of the socket ?! Aurora must have added some proxy of some sort.

I'm checking that now, and if so, exchange interruption without socket failure must have a better implementation to have a proper error.

Diego Dupin added a comment - 2017-06-05 10:42 - edited After a second thought, that need some testing, to my understanding, the error "ArrayIndexOutOfBoundsException: -10" must not normally occur. Error would normally then be similar to : Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Communications link failure with primary host xxx.rds.amazonaws.com: 3306 . Could not read packet: unexpected end of stream, read 0 bytes from 4 Query is : call sp_get_timestamp() on HostAddress{host= 'xxx.rds.amazonaws.com' , port= 3306 },master= true . Driver has reconnect connection at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java: 1060 ) at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java: 1031 ) at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java: 152 ) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: 62 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 43 ) at java.lang.reflect.Method.invoke(Method.java: 498 ) Having an "ArrayIndexOutOfBoundsException: -10" error mean that exchange with server has interrupted without the killing of the socket ?! Aurora must have added some proxy of some sort. I'm checking that now, and if so, exchange interruption without socket failure must have a better implementation to have a proper error.

Diego Dupin added a comment - 2017-06-05 13:31

After testing, i didn't reproduced any error of this kind.
Doesn't you use executeBatch just before that with the "useBatchMultiSend" active ? (active by default).
Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?

Diego Dupin added a comment - 2017-06-05 13:31 After testing, i didn't reproduced any error of this kind. Doesn't you use executeBatch just before that with the "useBatchMultiSend" active ? (active by default). Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?

Xing Du added a comment - 2017-06-05 17:09

Hi Diego,

thanks for getting back to me! I really appreciated it.

> Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?
I get all of these exceptions every time I launch (as long as I have test-02 as writer and test-01 as reader, no exception the other way around)
I've noticed "useBatchMultiSend" was default to false if I specifcy aurora in the URL from your documentation so I did not disable that explicitly.
And thoughts on that: if "useBatchMultiSend" is the culprit, I should receive the same exceptions when test01 being writer and test02 being writer.

Did you get a chance to review the logic in recycling buffers? (getReusableBuffer)
I was wondering if you can point me a way to issue a new buffer instead of getting a recycled one, so that I can verify if this is the cause.

Xing Du added a comment - 2017-06-05 17:09 Hi Diego, thanks for getting back to me! I really appreciated it. > Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ? I get all of these exceptions every time I launch (as long as I have test-02 as writer and test-01 as reader, no exception the other way around) I've noticed "useBatchMultiSend" was default to false if I specifcy aurora in the URL from your documentation so I did not disable that explicitly. And thoughts on that: if "useBatchMultiSend" is the culprit, I should receive the same exceptions when test01 being writer and test02 being writer. Did you get a chance to review the logic in recycling buffers? (getReusableBuffer) I was wondering if you can point me a way to issue a new buffer instead of getting a recycled one, so that I can verify if this is the cause.

Xing Du added a comment - 2017-06-06 17:20

From tcpdump I see mariadb driver is fetching the instances behind the aurora cluster end point. (stored in a table in aurora)
given this issue only happens when I switch the roles, I was wondering if you can point me to any failover specific places that may have caused this.

Xing Du added a comment - 2017-06-06 17:20 From tcpdump I see mariadb driver is fetching the instances behind the aurora cluster end point. (stored in a table in aurora) given this issue only happens when I switch the roles, I was wondering if you can point me to any failover specific places that may have caused this.

Diego Dupin added a comment - 2017-06-06 18:49

I cannot point to a specific place, the method "executeQuery" in the error log is one of the most used method.
if you could post the tcpdump / log with trace level here, that would be great. That would permit to identify that place.
Because strangely it seems a protocol issue.

Diego Dupin added a comment - 2017-06-06 18:49 I cannot point to a specific place, the method "executeQuery" in the error log is one of the most used method. if you could post the tcpdump / log with trace level here, that would be great. That would permit to identify that place. Because strangely it seems a protocol issue.

Xing Du added a comment - 2017-06-06 19:03

I agree with you. what format of tcpdump result are you most comfortable with? if you have a preferred group of tcpdump args, please let me know.

Xing Du added a comment - 2017-06-06 19:03 I agree with you. what format of tcpdump result are you most comfortable with? if you have a preferred group of tcpdump args, please let me know.

Diego Dupin added a comment - 2017-06-06 19:27

any will do with the -X option (packet content with hexa and ASCII) if i remember that right

Diego Dupin added a comment - 2017-06-06 19:27 any will do with the -X option (packet content with hexa and ASCII) if i remember that right

Xing Du added a comment - 2017-06-19 20:24

sry for the late reply. I was out of office and did not get a chance to do this.
my test environment was tore apart and I need to set those up again to reproduce this issue. will attach tcpdump info when I have that.

Xing Du added a comment - 2017-06-19 20:24 sry for the late reply. I was out of office and did not get a chance to do this. my test environment was tore apart and I need to set those up again to reproduce this issue. will attach tcpdump info when I have that.

Diego Dupin added a comment - 2019-03-13 09:47

closing since no news and not reproduced

Diego Dupin added a comment - 2019-03-13 09:47 closing since no news and not reproduced

People

Assignee:: Diego Dupin

Reporter:: Xing Du

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2017-05-31 22:53

Updated:: 2019-03-13 09:47

Resolved:: 2019-03-13 09:47

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Connector/J