MariaDB Connector/J / CONJ-863

Ensure socket state when SocketTimeout occurs

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.2
    • Fix Version/s: 2.7.4
    • Component/s: aurora
    • Labels: None
    • Environment: Java 8, Aurora MySQL 5.7

    Description

      When the driver encounters a "SocketTimeoutException - Read timed out" it doesn't force-close the socket. This leaves the TCP connection open, and previously started transactions on the database are therefore never rolled back.

      If the read timeout happens because of temporarily high load or temporary locks, zombie transactions can keep holding locks or keep executing for a very long time, effectively locking a majority of the table and causing transactions to pile up.

      Example scenario (default 10s socketTimeout):
      00:00 -> BEGIN;
      00:00 -> DELETE FROM MyTable WHERE reference = '1234';
      00:10 -> Socket read timeout, the driver abandons the connection - but doesn't close the socket
      00:15 -> Worst case, the query is retried by the client because it failed before ... (causing even more hanging transactions)
      00:25 -> DELETE completes - but is never committed or rolled back, since the connection was abandoned by the driver (while the TCP connection is still valid/open).

      In our experience there is no recovery from this situation other than a manual database failover or a restart of all services, shutting down the JVMs to force-close all zombie TCP connections and the transactions caused by the driver.

      Fix
      After a couple of incidents on our production system we ended up manually patching the driver to run the following code in `AbstractQueryProtocol.handleIoException`:

      public class AbstractQueryProtocol extends AbstractConnectProtocol implements Protocol {
        public SQLException handleIoException(Exception initialException) {
          ....
          else if (initialException instanceof SocketTimeoutException) {
            maxSizeError = false;
            try {
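              // the read timeout left the socket open; force-closing it makes the server
              // drop the connection and roll back the pending transaction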
              destroySocket();
            } catch (Exception e) {
              logger.info("Failed to close socket after SocketTimeoutException - " + e.getMessage());
            }
          }
          ....
        }
      }
      

      Since MySQL 5.7/Aurora doesn't support MariaDB's `max_statement_time`, there is currently, to our knowledge, no good way of dealing with this issue (setting a wait_timeout of 10s would have the same effect, but causes other problems with connection pooling).
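
      For reference, a minimal sketch of the MariaDB-only server-side limit mentioned above (not part of the original report; URL, credentials and table are illustrative): with `max_statement_time` set on the session, the server itself aborts statements that exceed the limit, so no zombie query keeps running after a client-side timeout.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.Statement;

      public class MaxStatementTimeSketch {
        public static void main(String[] args) throws Exception {
          // MariaDB only - MySQL 5.7/Aurora has no equivalent of max_statement_time.
          try (Connection conn = DriverManager.getConnection(
                  "jdbc:mariadb://localhost:3306/test", "user", "password");
               Statement st = conn.createStatement()) {
            // Server-side limit (seconds): statements exceeding it are aborted by the server.
            st.execute("SET SESSION max_statement_time=10");
            st.execute("DELETE FROM MyTable WHERE reference = '1234'");
          }
        }
      }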

      Reproduction
      1. Create MyTable
      2. Connect with the MariaDB/J driver - low socketTimeout
      3. Manually execute `LOCK TABLES MyTable WRITE`
      4. Run `BEGIN; INSERT INTO MyTable ..;` from Java
      5. After the read timeout has occurred, check `SHOW PROCESSLIST`: our transaction is still there, but it will never get committed or rolled back by the driver.
      Note: Same behaviour when using a very large table and attempting to execute any DML that exceeds the socketTimeout (a minimal JDBC sketch of these steps follows below).
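
      A minimal JDBC sketch of the reproduction steps above (not part of the original report; URL, credentials and the MyTable/reference column are illustrative, and the table lock is taken from a second connection instead of manually):

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.Statement;

      public class SocketTimeoutRepro {
        public static void main(String[] args) throws Exception {
          // Low socketTimeout (ms) so the read timeout triggers quickly.
          String url = "jdbc:mariadb://localhost:3306/test?socketTimeout=2000";
          try (Connection locker = DriverManager.getConnection(url, "user", "password");
               Connection victim = DriverManager.getConnection(url, "user", "password")) {
            // Step 3: hold a write lock so the INSERT below cannot finish in time.
            locker.createStatement().execute("LOCK TABLES MyTable WRITE");

            // Step 4: BEGIN + INSERT from Java.
            victim.setAutoCommit(false);
            try (Statement st = victim.createStatement()) {
              st.execute("INSERT INTO MyTable (reference) VALUES ('1234')");
            } catch (Exception e) {
              // The read timeout surfaces as an SQLException; with an unfixed driver the
              // underlying socket stays open and the transaction is never rolled back.
              System.out.println("Query failed: " + e.getMessage());
            }

            locker.createStatement().execute("UNLOCK TABLES");
            // Step 5: run SHOW PROCESSLIST on the server - the abandoned session is still there.
          }
        }
      }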

      Attachments

        Activity

          Diego Dupin added a comment -

          What you describe seems to rely on the socket timeout, not the query timeout.
          If you expect commands to take longer than the socket timeout (which defaults to 10s for aurora), it is better to set connection.setNetworkTimeout accordingly first.

          The query timeout for anything other than MariaDB >= 10.1.2 is handled differently: when the timeout is reached, another connection is created which issues a KILL QUERY <thread id> to make sure the running query is ended and the connection is left in a good state.
          (I've heard that Aurora doesn't return the correct <thread id> value, so this might be the reason why this is not working, but I cannot verify that since I don't have an Aurora instance. If so, then this is an Aurora bug.)
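
          A minimal sketch of the setNetworkTimeout suggestion (URL, credentials and the statement are illustrative; Connection.setNetworkTimeout is standard JDBC and takes an Executor plus a timeout in milliseconds):

          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.util.concurrent.ExecutorService;
          import java.util.concurrent.Executors;

          public class NetworkTimeoutSketch {
            public static void main(String[] args) throws Exception {
              ExecutorService executor = Executors.newSingleThreadExecutor();
              try (Connection conn = DriverManager.getConnection(
                      "jdbc:mariadb://localhost:3306/test", "user", "password")) {
                // Raise the read timeout for this connection to 60s before running a
                // statement that may take longer than the 10s default used for aurora.
                conn.setNetworkTimeout(executor, 60_000);
                conn.createStatement().execute("DELETE FROM MyTable WHERE reference = '1234'");
              } finally {
                executor.shutdown();
              }
            }
          }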

          Emil von Wattenwyl added a comment - edited

          Yes, exactly - I'm talking about the socket timeout.

          If you expect commands to take longer than the socket timeout (which defaults to 10s for aurora), it is better to set connection.setNetworkTimeout accordingly first.

          In our case we don't have specific commands that we expect to take longer than 10s - and 10s seems like a good default to guarantee quick failover when the DB just disappears.

          What might push the application over the 10s threshold are rare temporary locks, caused by concurrent DML statements interfering with each other - or simply by a DDL command.

          In any case this is only the "symptom" - from our observation the real problem is that the driver doesn't close sockets that have reached the socket timeout. The zombie transactions on aurora are where we noticed it, but it can probably also cause resource exhaustion.

          (The KILL QUERY mechanism you describe would normally make the DB close the abandoned TCP connection from its side, I assume - more as a side effect than as a proper cleanup - so the reason this behaviour of not closing the sockets actually causes the observed problems on aurora is likely related to the wrong thread id issue.)

          I think simply making sure that the socket is closed when a SocketTimeoutException occurs removes this problem and other potential problems related to "abandoned sockets".

          Diego Dupin added a comment - edited

          What I mean about Aurora is that KILL QUERY <thread id> is executed, but since Aurora returns a wrong thread id, it doesn't do anything.
          This was a bug in a very old version of MaxScale, and a bug that has been reported on Aurora.
          If this is the case - I cannot verify that - then the command throws an error and the connection ends up in the wrong shape, like you indicate.

          If this is confirmed, Aurora has to correct this issue, not the driver.
          (The driver takes the thread id from the initial handshake packet.)

          Emil von Wattenwyl added a comment -

          So there is no difference in the behaviour between Aurora and MariaDB.

          I've added a test case to better illustrate what caused us some unexpected lock & load problems - see `/test-case-base/src/main/kotlin/ch/aquint/TestCase` for a detailed description and run the tests like:
          `./gradlew :test-case-official-driver:run`
          `./gradlew :test-case-fixed-driver:run`

          With the official driver, Thread 2 takes 30s before it can finally insert its rows, whereas with the fixed driver Thread 2 can insert its rows after a couple of seconds already. (I assume some MariaDB-internal purge timeout checks the session's TCP connection state.)

          The TCP connection is definitely left open by the driver and the query is left running after a SocketTimeoutException (10s default for aurora configurations), regardless of the DB engine.

          However, the issue seems less severe than I initially thought, because the transaction is killed once the long-running statement completes in the background; I'm not exactly sure why that happens.

          Honestly, after thinking about this topic a bit more, I'm not 100% sure anymore whether closing the TCP connection on SocketTimeout is the correct thing to do for all use cases. From our experience it certainly is, because it prevents a destructive pile-up of connections, locks and processes on the database when we get a lot of retries due to a bad query or an unexpected amount of data to delete. (In our case, basically an event stuck in the queue because it cannot be processed, since it constantly runs into the SocketTimeout. Of course this can be mitigated with better application design and more careful queries, to be absolutely sure you always stay under 10s - although that's sometimes not easy when dealing with a delete cascade touching an unknown number of related rows, etc.)
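
          For readers without the attachment, a rough Java approximation of the scenario the Kotlin test case exercises (not the attached TestCase itself; URL, credentials, table and timings are illustrative):

          import java.sql.Connection;
          import java.sql.DriverManager;

          public class LockPileUpSketch {
            // Thread 1 uses the 10s aurora-style socketTimeout; Thread 2 gets a larger one
            // so it can simply measure how long its INSERT stays blocked.
            static final String SHORT_TIMEOUT_URL = "jdbc:mariadb://localhost:3306/test?socketTimeout=10000";
            static final String LONG_TIMEOUT_URL = "jdbc:mariadb://localhost:3306/test?socketTimeout=60000";

            public static void main(String[] args) throws Exception {
              Thread t1 = new Thread(() -> {
                try (Connection c = DriverManager.getConnection(SHORT_TIMEOUT_URL, "user", "password")) {
                  c.setAutoCommit(false); // BEGIN
                  // A delete that takes longer than 10s -> SocketTimeoutException on the client.
                  c.createStatement().execute("DELETE FROM MyTable WHERE reference = '1234'");
                  c.commit();
                } catch (Exception e) {
                  // The connection is abandoned here; before the fix its socket stayed open,
                  // so the server kept the transaction and its row locks alive.
                  System.out.println("Thread 1 gave up: " + e.getMessage());
                }
              });

              Thread t2 = new Thread(() -> {
                long start = System.currentTimeMillis();
                try (Connection c = DriverManager.getConnection(LONG_TIMEOUT_URL, "user", "password")) {
                  // Blocks until the locks held by Thread 1's zombie transaction are released.
                  c.createStatement().execute("INSERT INTO MyTable (reference) VALUES ('1234')");
                  System.out.println("Thread 2 inserted after " + (System.currentTimeMillis() - start) + " ms");
                } catch (Exception e) {
                  System.out.println("Thread 2 failed: " + e.getMessage());
                }
              });

              t1.start();
              Thread.sleep(2_000); // let Thread 1 acquire its locks first
              t2.start();
              t1.join();
              t2.join();
            }
          }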

          Diego Dupin added a comment -

          Good points.

          To simplify: SocketTimeoutException is a special IOException, since when it is raised the socket is still valid.
          This leaves the driver in a wrong state concerning exchanges, with the socket still active. The socket must be explicitly closed in this specific case, so that the database can, for example, cancel the current transaction running on this connection.

          done with https://github.com/mariadb-corporation/mariadb-connector-j/commit/0f04e45d2555b39a08e693163300c5ca58cad645


          Emil von Wattenwyl added a comment -

          Exactly - thanks a lot for the support!

          People

            Assignee: Diego Dupin
            Reporter: Emil von Wattenwyl
            Votes: 1
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
