[MDEV-28341] Query is not killed when client TCP connection is dropped
Created: 2022-04-18  Updated: 2022-04-18

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.5.15
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Archie Cobbs Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

openSUSE Leap 15.3

libmariadb3-3.1.13-3.30.1.x86_64
mariadb-10.5.15-150300.3.15.1.x86_64
mariadb-client-10.5.15-150300.3.15.1.x86_64
mariadb-errormessages-10.5.15-150300.3.15.1.noarch
mariadb-tools-10.5.15-150300.3.15.1.x86_64



 Description   

I have a web application running under Tomcat that talks to MariaDB on the same machine. It connects through Connector/J using a URL like this:

jdbc:mysql://127.0.0.1:3306/mydb
  ?jdbcCompliantTruncation=false
  &cachePrepStmts=true
  &prepStmtCacheSize=200
  &prepStmtCacheSqlLimit=4096
  &cacheCallableStmts=true
  &cacheResultSetMetadata=true
  &useUnicode=true
  &cacheServerConfiguration=true
  &logger=Slf4JLogger
  &includeThreadNamesAsStatementComment=true

Note that this is a TCP connection to localhost:3306.

Occasionally, when the application is restarted (and this means a COMPLETE RESTART of Tomcat), the startup process hangs.

It turns out that the hang happens because a fixed sequence of SQL statements always runs at startup. The one it blocks on is DROP FUNCTION IF EXISTS `mydb`.REGMATCH, which is waiting on locks held by another long-running query. That query is STILL RUNNING even though it was started by the previous Tomcat process, which died long ago.

By inference, the problem must be that the MariaDB server does not detect an error on a client TCP socket until it actually attempts to read from or write to that socket.

This means that if process X connects to the server, starts a long-running query Q, and then terminates, Q is NOT killed immediately. It dies only later, when the server finally tries to return results and therefore writes to the socket.

Obviously, Q is already doomed as soon as process X drops the TCP connection, so there is no point in Q continuing to run after that.

The fix is to monitor the TCP socket asynchronously, so that error conditions are detected while the thread that reads from and writes to the socket is busy executing the query, and the query can be killed immediately when the client disconnects.

I would have thought the socket is already monitored for incoming data: isn't it possible to kill a running query asynchronously via JDBC, which would imply such monitoring?

Or maybe the problem is this one?



 Comments   
Comment by Archie Cobbs [ 2022-04-18 ]

A little more background on a possible cause:

Quoting poll(2):

NOTES
       For a discussion of what may happen if a file descriptor being monitored by poll()  is  closed  in  another  thread,  see
       select(2).

Quoting select(2):

NOTES
   Multithreaded applications
       If a file descriptor being monitored by select() is closed in another thread, the result is unspecified.   On  some  UNIX
       systems,  select() unblocks and returns, with an indication that the file descriptor is ready (a subsequent I/O operation
       will likely fail with an error, unless another the file descriptor reopened between the time select()  returned  and  the
       I/O  operations  was performed).  On Linux (and some other systems), closing the file descriptor in another thread has no
       effect on select().  In summary, any application that relies on a particular behavior in this scenario must be considered
       buggy.

Generated at Thu Feb 08 09:59:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.