Uploaded image for project: 'MariaDB Connector/J'
  1. MariaDB Connector/J
  2. CONJ-484

Aurora failover => ArrayIndexOutOfBoundsException / AccessDenied

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Incomplete
    • 1.5.9
    • N/A
    • aurora
    • None
    • Aurora

    Description

      Hi all,

      I'm running Aurora failover testing with the help of MariaDB connector recently and discovered this weird issue.

      Preparation:

      • Aurora cluster with 2 instances: test-01 and test-02, cluster end point is test-cluster.
      • jdbcUrl="jdbc:mysql:aurora://test-cluster.rds.amazonaws.com:3306/Test?log=true"
      • Using MariaDB connector version 1.5.9 due to concern on usePipeAuth.
      • Java server application have all kinds of queries hitting db server.

      Test1:

      • test-01 is the writer and test-02 is the reader.
      • launch server application.
        Result: fine.
        (Clean up by terminating server application)

      Test2:

      • Click failover in AWS console for test cluster.
      • Wait until it completes: test-02 is the writer and test-01 is the reader
      • launch server application
        Result: getting exception.
        (Clean up by terminating server application)

      Exception:

      Caused by: java.lang.ArrayIndexOutOfBoundsException: -10
          at org.mariadb.jdbc.internal.util.buffer.Buffer.readShort(Buffer.java:98) ~[server.jar:?]
          at org.mariadb.jdbc.internal.packet.dao.ColumnInformation.<init>(ColumnInformation.java:148) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readResultSet(AbstractQueryProtocol.java:1195) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1089) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:203) ~[server.jar:?]
          at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) ~[?:?]
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
          at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
          at org.mariadb.jdbc.internal.failover.AbstractMastersListener.invoke(AbstractMastersListener.java:355) ~[server.jar:?]
          at org.mariadb.jdbc.internal.failover.FailoverProxy.executeInvocation(FailoverProxy.java:200) ~[server.jar:?]
          at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:192) ~[server.jar:?]
          at com.sun.proxy.$Proxy42.executeQuery(Unknown Source) ~[?:?]
          at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeInternal(MariaDbClientPreparedStatement.java:224) ~[server.jar:?]
          at org.mariadb.jdbc.MariaDbClientPreparedStatement.executeQuery(MariaDbClientPreparedStatement.java:177) ~[server.jar:?]
      

      Note:

      • Tested on the same environment with com.mysql.jdbc driver and removed "aurora" from jdbc url and things are working fine => issue comes from mariadb driver.
      • My server application has nothing instance specific: the only information aurora specific is the jdbc url and it was using cluster end point
      • For the query that triggered this, I've compared the table description and data between test-01 and test-02 and they are identical.
      • Query="select * from table_timestamp"
      • Table description(table_timestamp):

        +--------------+------------------+------+-----+-------------------+-----------------------------+
        | Field        | Type             | Null | Key | Default           | Extra                       |
        +--------------+------------------+------+-----+-------------------+-----------------------------+
        | last_updated | timestamp        | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
        | update_count | int(10) unsigned | NO   |     | 0                 |                             |
        +--------------+------------------+------+-----+-------------------+-----------------------------+
        2 rows in set (0.00 sec)
        

      • Manually tested the same query using mysql command line, getting identical results from test-01 and test-02
      • I also got another different exception, but it's from the same query.

      java.sql.SQLException: (conn:4027) Packets out of order when reading field packets, expected was EOF stream. Packet contents (hex) = 03
      Query is: CALL Test.sp_get_timestamp()
          at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:150) ~[server.jar:?]
          at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101) ~[server.jar:?]
          at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77) ~[server.jar:?]
          at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226) ~[server.jar:?]
          at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413) ~[server.jar:?]
          at org.mariadb.jdbc.MariaDbProcedureStatement.execute(MariaDbProcedureStatement.java:168) ~[server.jar:?]
          at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeQuery(MariaDbServerPreparedStatement.java:343) ~[server.jar:?]
      

      • The sproc is as simple as one select query as listed above.

        BEGIN
          select * from gift_timestamp;
        END
        

      I've tested with other slightly older versions as well and have seen the same result.
      Can someone help me out here? Many thanks in advance!

      Please don't hesitate to reach out to me if you need more information. I'm subscribed to this thread and should be updated if you leave a comment.

      Update
      With the same test I'm also seeing this exception periodically:

      Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Access denied for user 'someusername'@'10.%' (using password: YES)
      Query is: CALL Test.sp_get_timestamp()
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:1163) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1093) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1044) ~[server.jar:?]
          at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executePreparedQuery(AbstractQueryProtocol.java:628) ~[server.jar:?]
          at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source) ~[?:?]
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
          at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
          at org.mariadb.jdbc.internal.failover.AbstractMastersListener.invoke(AbstractMastersListener.java:351) ~[server.jar:?]
          at org.mariadb.jdbc.internal.failover.FailoverProxy.invoke(FailoverProxy.java:156) ~[server.jar:?]
          at com.sun.proxy.$Proxy43.executePreparedQuery(Unknown Source) ~[?:?]
          at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:401) ~[server.jar:?]
          ... 36 more
      

      Note:
      This is solvable by restarting the server several times but it will be a blocker for me to deploy mariadb driver to production.
      Can someone fix this as well?

      Attachments

        Activity

          xdu Xing Du added a comment -

          I did some debugging with by importing the source code of version 1.5.9 and adding debug statement.

          it seems to me that the problem can be somewhere in ReadPacketFetcher.getReusableBuffer()

          in FailoverProxy.executeInvocation I'm getting a invocation target exception, which is caused by:

          org.mariadb.jdbc.internal.util.dao.QueryException: Unexpected packet returned by server, first byte -2
          on HostAddress{host='test02.amazonaws.com', port=3306},master=true
              at org.mariadb.jdbc.internal.packet.ComStmtPrepare.read(ComStmtPrepare.java:171) ~[server.jar:?]
              at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.prepare(AbstractQueryProtocol.java:355) ~[server.jar:?]
          

          And in ComStmtPrepare.read the exception is caused by invalid first byte from the buffer.

          Buffer buffer = packetFetcher.getPacket();
          packetFetcher.getReusableBuffer();
          byte firstByte = buffer.getByteAt(0);
          if (firstByte == Packet.ERROR) {
            // throw another different exception
          }
           
          if (firstByte == Packet.OK) {
            // handle
          } else {
            // throw the exception i'm seeing.
          }
          

          I was wondering if you can spend some time digging into the recycling logic of buffers and see if there is a potential issue, given that you may not be able to reproduce the exception that I'm seeing.
          If there's something you'd like me to try for you, please let me know and I can see if that fixes this issue.

          Thanks in advance!

          -X

          xdu Xing Du added a comment - I did some debugging with by importing the source code of version 1.5.9 and adding debug statement. it seems to me that the problem can be somewhere in ReadPacketFetcher.getReusableBuffer() in FailoverProxy.executeInvocation I'm getting a invocation target exception, which is caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Unexpected packet returned by server, first byte - 2 on HostAddress{host= 'test02.amazonaws.com' , port= 3306 },master= true at org.mariadb.jdbc.internal.packet.ComStmtPrepare.read(ComStmtPrepare.java: 171 ) ~[server.jar:?] at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.prepare(AbstractQueryProtocol.java: 355 ) ~[server.jar:?] And in ComStmtPrepare.read the exception is caused by invalid first byte from the buffer. Buffer buffer = packetFetcher.getPacket(); packetFetcher.getReusableBuffer(); byte firstByte = buffer.getByteAt( 0 ); if (firstByte == Packet.ERROR) { // throw another different exception }   if (firstByte == Packet.OK) { // handle } else { // throw the exception i'm seeing. } I was wondering if you can spend some time digging into the recycling logic of buffers and see if there is a potential issue, given that you may not be able to reproduce the exception that I'm seeing. If there's something you'd like me to try for you, please let me know and I can see if that fixes this issue. Thanks in advance! -X
          diego dupin Diego Dupin added a comment - - edited

          Hi xdu This is due to the same issue than for CONJ-477 : pipelining :
          disabling option useBatchMultiSend and usePipelineAuth solve those issues : see description : https://mariadb.com/kb/en/mariadb/option-batchmultisend-description/
          I'm guessing Aurora has a proxy that drains the socket with non-blocking reads, but only processes the very first protocol packet, discarding the rest of the input. and according to the speed the server process the input frequency of this issue occur more or less often.

          Those option will now be set to false when driver discover that server is aurora to avoid thoses issues. Problem is send to amazon, so we can discuss correction.

          diego dupin Diego Dupin added a comment - - edited Hi xdu This is due to the same issue than for CONJ-477 : pipelining : disabling option useBatchMultiSend and usePipelineAuth solve those issues : see description : https://mariadb.com/kb/en/mariadb/option-batchmultisend-description/ I'm guessing Aurora has a proxy that drains the socket with non-blocking reads, but only processes the very first protocol packet, discarding the rest of the input. and according to the speed the server process the input frequency of this issue occur more or less often. Those option will now be set to false when driver discover that server is aurora to avoid thoses issues. Problem is send to amazon, so we can discuss correction.
          diego dupin Diego Dupin added a comment - - edited

          After a second thought, that need some testing, to my understanding, the error "ArrayIndexOutOfBoundsException: -10" must not normally occur.

          Error would normally then be similar to :

          Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Communications link failure with primary host xxx.rds.amazonaws.com:3306. Could not read packet: unexpected end of stream, read 0 bytes from 4
          Query is : call sp_get_timestamp()
          on HostAddress{host='xxx.rds.amazonaws.com', port=3306},master=true.  Driver has reconnect connection
          	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1060)
          	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031)
          	at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:152)
          	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          	at java.lang.reflect.Method.invoke(Method.java:498)
          

          Having an "ArrayIndexOutOfBoundsException: -10" error mean that exchange with server has interrupted without the killing of the socket ?! Aurora must have added some proxy of some sort.

          I'm checking that now, and if so, exchange interruption without socket failure must have a better implementation to have a proper error.

          diego dupin Diego Dupin added a comment - - edited After a second thought, that need some testing, to my understanding, the error "ArrayIndexOutOfBoundsException: -10" must not normally occur. Error would normally then be similar to : Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Communications link failure with primary host xxx.rds.amazonaws.com: 3306 . Could not read packet: unexpected end of stream, read 0 bytes from 4 Query is : call sp_get_timestamp() on HostAddress{host= 'xxx.rds.amazonaws.com' , port= 3306 },master= true . Driver has reconnect connection at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java: 1060 ) at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java: 1031 ) at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java: 152 ) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java: 62 ) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java: 43 ) at java.lang.reflect.Method.invoke(Method.java: 498 ) Having an "ArrayIndexOutOfBoundsException: -10" error mean that exchange with server has interrupted without the killing of the socket ?! Aurora must have added some proxy of some sort. I'm checking that now, and if so, exchange interruption without socket failure must have a better implementation to have a proper error.
          diego dupin Diego Dupin added a comment -

          After testing, i didn't reproduced any error of this kind.
          Doesn't you use executeBatch just before that with the "useBatchMultiSend" active ? (active by default).
          Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?

          diego dupin Diego Dupin added a comment - After testing, i didn't reproduced any error of this kind. Doesn't you use executeBatch just before that with the "useBatchMultiSend" active ? (active by default). Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?
          xdu Xing Du added a comment -

          Hi Diego,

          thanks for getting back to me! I really appreciated it.

          > Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ?
          I get all of these exceptions every time I launch (as long as I have test-02 as writer and test-01 as reader, no exception the other way around)
          I've noticed "useBatchMultiSend" was default to false if I specifcy aurora in the URL from your documentation so I did not disable that explicitly.
          And thoughts on that: if "useBatchMultiSend" is the culprit, I should receive the same exceptions when test01 being writer and test02 being writer.

          Did you get a chance to review the logic in recycling buffers? (getReusableBuffer)
          I was wondering if you can point me a way to issue a new buffer instead of getting a recycled one, so that I can verify if this is the cause.

          xdu Xing Du added a comment - Hi Diego, thanks for getting back to me! I really appreciated it. > Do you reproduce the issue easily ? if so, can you disable "useBatchMultiSend" an retry ? I get all of these exceptions every time I launch (as long as I have test-02 as writer and test-01 as reader, no exception the other way around) I've noticed "useBatchMultiSend" was default to false if I specifcy aurora in the URL from your documentation so I did not disable that explicitly. And thoughts on that: if "useBatchMultiSend" is the culprit, I should receive the same exceptions when test01 being writer and test02 being writer. Did you get a chance to review the logic in recycling buffers? (getReusableBuffer) I was wondering if you can point me a way to issue a new buffer instead of getting a recycled one, so that I can verify if this is the cause.
          xdu Xing Du added a comment -

          From tcpdump I see mariadb driver is fetching the instances behind the aurora cluster end point. (stored in a table in aurora)
          given this issue only happens when I switch the roles, I was wondering if you can point me to any failover specific places that may have caused this.

          xdu Xing Du added a comment - From tcpdump I see mariadb driver is fetching the instances behind the aurora cluster end point. (stored in a table in aurora) given this issue only happens when I switch the roles, I was wondering if you can point me to any failover specific places that may have caused this.
          diego dupin Diego Dupin added a comment -

          I cannot point to a specific place, the method "executeQuery" in the error log is one of the most used method.
          if you could post the tcpdump / log with trace level here, that would be great. That would permit to identify that place.
          Because strangely it seems a protocol issue.

          diego dupin Diego Dupin added a comment - I cannot point to a specific place, the method "executeQuery" in the error log is one of the most used method. if you could post the tcpdump / log with trace level here, that would be great. That would permit to identify that place. Because strangely it seems a protocol issue.
          xdu Xing Du added a comment -

          I agree with you. what format of tcpdump result are you most comfortable with? if you have a preferred group of tcpdump args, please let me know.

          xdu Xing Du added a comment - I agree with you. what format of tcpdump result are you most comfortable with? if you have a preferred group of tcpdump args, please let me know.
          diego dupin Diego Dupin added a comment -

          any will do with the -X option (packet content with hexa and ASCII) if i remember that right

          diego dupin Diego Dupin added a comment - any will do with the -X option (packet content with hexa and ASCII) if i remember that right
          xdu Xing Du added a comment -

          sry for the late reply. I was out of office and did not get a chance to do this.
          my test environment was tore apart and I need to set those up again to reproduce this issue. will attach tcpdump info when I have that.

          xdu Xing Du added a comment - sry for the late reply. I was out of office and did not get a chance to do this. my test environment was tore apart and I need to set those up again to reproduce this issue. will attach tcpdump info when I have that.
          diego dupin Diego Dupin added a comment -

          closing since no news and not reproduced

          diego dupin Diego Dupin added a comment - closing since no news and not reproduced

          People

            diego dupin Diego Dupin
            xdu Xing Du
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.