[CONJ-1091] can't make a connection when the Read Replica DB is in a hang state when SocketTimeout=0 set Created: 2023-07-21  Updated: 2023-08-28  Resolved: 2023-07-26

Status: Closed
Project: MariaDB Connector/J
Component/s: 2.7 compatibility, Failover
Affects Version/s: 2.7.9
Fix Version/s: 3.2.0, 2.7.10

Type: Bug Priority: Major
Reporter: choi heesung Assignee: Diego Dupin
Resolution: Fixed Votes: 0
Labels: None


 Description   

We are using MariaDB Connector/J 2.6.2 to connect to MariaDB and Aurora. For Aurora, we configured the primary DB and the read-only replica DB using the 'aurora' keyword, while for MariaDB we used the 'replication' keyword.

Here are the connection details we used:

MariaDB:
jdbc:mariadb:replication://primary1,replica1/test

Aurora:
jdbc:mariadb:aurora://primary1,replica1/test?SocketTimeout=0

[Problem]
We run into problems when the replica1 DB is under heavy load, with CPU usage approaching 100%. Even when replica1 is in an abnormal state, we expect primary1 to remain usable. However, the JDBC driver cannot establish a connection and hangs.

[Trial 1]
Setting connectTimeout does not resolve this problem.

[Trial 2]
Setting SocketTimeout to a shorter value could resolve this issue, but we run very long queries, so SocketTimeout=0 is required.

[Hang position]
https://github.com/mariadb-corporation/mariadb-connector-j/blob/3021f01f1a8b28558f4083ec6b3feb6dea3ee665/src/main/java/org/mariadb/jdbc/internal/protocol/AbstractConnectProtocol.java#L540

When attempting to create a connection, primary1 is acquired successfully, but the connection to replica1 hangs because no response is received in ReadInitialHandShakePacket(reader).

[Request]
We need a timeout for the process of establishing the connection, including the initialization settings after handshaking. This timeout should not be limited to TCP ConnectTimeout but should include the interval until the initialization settings are completed, as shown in the following section:
https://github.com/mariadb-corporation/mariadb-connector-j/blob/3021f01f1a8b28558f4083ec6b3feb6dea3ee665/src/main/java/org/mariadb/jdbc/internal/protocol/AbstractConnectProtocol.java#L598



 Comments   
Comment by Diego Dupin [ 2023-07-25 ]

It is unusual for the socket to be established successfully while no packets are exchanged, but this can clearly occur when a proxy is involved.
Ideally there would be a separate "connection socket timeout" option. For simplicity, though, connectTimeout, which normally applies only to socket creation, can also be used as the socket timeout until the connection is fully established; after that, socketTimeout is applied as usual if it is set.
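
The approach described above can be illustrated with plain java.net sockets. This is only a minimal sketch of the idea, not the driver's actual code: the class and method names (HandshakeTimeoutSketch, connect, timesOutAgainstSilentServer) are hypothetical, and reading a single byte stands in for reading the initial handshake packet. A local listener that never answers plays the role of the hung replica.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration; names are not taken from MariaDB Connector/J.
final class HandshakeTimeoutSketch {

    /**
     * Opens a socket, waits for the first byte of the server greeting under
     * connectTimeoutMs, then switches to socketTimeoutMs (0 = no read timeout).
     */
    static Socket connect(InetSocketAddress address, int connectTimeoutMs, int socketTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            // This timeout bounds only the TCP connect.
            socket.connect(address, connectTimeoutMs);
            // Reuse the connect timeout for reads until the handshake is done.
            socket.setSoTimeout(connectTimeoutMs);
            InputStream in = socket.getInputStream();
            if (in.read() < 0) {
                throw new IOException("server closed the connection before sending a greeting");
            }
            // Connection established: apply the regular socket timeout.
            socket.setSoTimeout(socketTimeoutMs);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Returns true if connecting to a server that never sends data fails with a read timeout. */
    static boolean timesOutAgainstSilentServer(int connectTimeoutMs) {
        // The listener never calls accept() or writes anything; the OS still
        // completes the TCP handshake from the listen backlog.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            try (Socket s = connect(address, connectTimeoutMs, 0)) {
                return false; // a greeting arrived, which the silent server never sends
            } catch (SocketTimeoutException expected) {
                return true; // the read was bounded by connectTimeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + timesOutAgainstSilentServer(200));
    }
}
```

Without the setSoTimeout(connectTimeoutMs) call, a socket timeout of 0 would leave the first read blocked indefinitely, which matches the hang reported in the description.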

Comment by Diego Dupin [ 2023-07-25 ]

A fix is available in the snapshot builds (3.2.0-SNAPSHOT or 2.7.10-SNAPSHOT):

<repositories>
    <repository>
        <id>sonatype-nexus-snapshots</id>
        <name>Sonatype Nexus Snapshots</name>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>
</repositories>
 
<dependencies>
    <dependency>
        <groupId>org.mariadb.jdbc</groupId>
        <artifactId>mariadb-java-client</artifactId>
        <version>3.2.0-SNAPSHOT</version>
    </dependency>
</dependencies>
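
As a usage sketch, the two timeouts could be passed as connection properties when opening the connection. The host names and database come from the ticket; the class name, the helper methods, the user and password parameters, and the 5000 ms value are illustrative assumptions. connectTimeout and socketTimeout are the driver's option names, both in milliseconds.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Hypothetical helper; only the URL and option names relate to the ticket.
final class ReplicationConnectExample {

    // Replication URL as given in the description.
    static final String URL = "jdbc:mariadb:replication://primary1,replica1/test";

    /** Builds the timeout options; both values are in milliseconds. */
    static Properties timeoutProperties(int connectTimeoutMs, int socketTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("connectTimeout", Integer.toString(connectTimeoutMs));
        props.setProperty("socketTimeout", Integer.toString(socketTimeoutMs));
        return props;
    }

    /** Opens a connection; requires the driver on the classpath and reachable hosts. */
    static Connection open(String user, String password) throws SQLException {
        // 5000 ms is an assumed value; 0 keeps long-running queries from timing out.
        Properties props = timeoutProperties(5_000, 0);
        props.setProperty("user", user);
        props.setProperty("password", password);
        return DriverManager.getConnection(URL, props);
    }
}
```

With a driver version that includes the fix, connectTimeout should then also bound the handshake phase, while socketTimeout=0 continues to apply once the connection is established.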

Comment by choi heesung [ 2023-07-25 ]

Thank you for the reply.

The snapshot version resolved the problem and worked correctly.

Generated at Thu Feb 08 03:20:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.