Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
10.3.20, 10.4.10
None
Linux
Description
When using pool-of-threads, clients whose latest query is still in the thread pool queue, waiting for a free execution thread, do not have that query shown in the SHOW PROCESSLIST output; instead they are shown with State: Sleep.
Apparently they are also considered idle by the server itself: once a query has been waiting in the thread pool queue for more than wait_timeout seconds, the connection is terminated, and the server's error log records a read timeout (if log_warnings>2).
The client just receives a non-specific "Lost connection to MySQL server during query" error, with no information about why the query and the connection were terminated.
Even worse: sending the query to the server does not even reset the idle timer. With wait_timeout=60 and a client that has already been idle for 30 seconds, the query and the connection are terminated after only another 30 seconds.
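The timing in the previous paragraph can be made explicit with a toy model. This is a sketch under an assumption drawn from the report's observations, not from the server source: the idle deadline is anchored to the client's last activity before the query and is not moved when the query arrives and is queued. The function names are hypothetical.

```python
# Toy model of the reported timing (an assumption based on the report,
# not MariaDB's actual implementation).

def reported_kill_time(last_activity: float, query_sent: float, wait_timeout: float) -> float:
    """Disconnect time if sending a query does NOT reset the idle timer
    (the behavior described in the report)."""
    # query_sent is deliberately ignored: the query sits in the thread pool
    # queue and the connection still counts as idle.
    return last_activity + wait_timeout


def expected_kill_time(last_activity: float, query_sent: float, wait_timeout: float) -> float:
    """Disconnect time if the incoming query at least reset the idle timer."""
    return max(last_activity, query_sent) + wait_timeout


if __name__ == "__main__":
    # Scenario from the report: wait_timeout=60, client idle for 30 s,
    # then it sends a query that stays in the thread pool queue.
    last_activity, query_sent, wait_timeout = 0.0, 30.0, 60.0
    reported = reported_kill_time(last_activity, query_sent, wait_timeout)
    expected = expected_kill_time(last_activity, query_sent, wait_timeout)
    print(f"reported: dropped {reported - query_sent:.0f} s after the query was sent")
    print(f"with a timer reset: dropped {expected - query_sent:.0f} s after the query was sent")
```

With the report's numbers, the first model drops the connection 30 seconds after the query was sent, while a timer reset would at least allow the full 60 seconds.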
IMHO this is wrong: the client is not actually idle, and the responsibility for the inactivity lies entirely with the server. Even if we decide to terminate a query after a certain wait time in the pool-of-threads queue, there should be a more meaningful error message for this case, and a separately configurable timeout.
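The behavior requested above can be sketched as a per-connection check in which wait_timeout applies only to connections that are genuinely idle, while time spent in the thread pool queue is bounded by a separate, optional limit with its own error text. The connection states, the `queue_timeout` parameter and the messages are hypothetical illustrations, not existing MariaDB settings or error messages.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ConnState(Enum):
    IDLE = auto()       # waiting for the client to send a command
    QUEUED = auto()     # command received, waiting for a free worker thread
    EXECUTING = auto()  # a worker thread is running the command


@dataclass
class Conn:
    state: ConnState
    since: float  # time at which the connection entered its current state


def check_timeout(conn: Conn, now: float, wait_timeout: float,
                  queue_timeout: Optional[float]) -> Optional[str]:
    """Return a hypothetical termination reason, or None to keep the connection.

    wait_timeout is only checked while the client is idle; queueing time is
    governed by queue_timeout (None means no limit on queueing time).
    """
    elapsed = now - conn.since
    if conn.state is ConnState.IDLE and elapsed > wait_timeout:
        return "idle timeout (wait_timeout)"
    if (conn.state is ConnState.QUEUED and queue_timeout is not None
            and elapsed > queue_timeout):
        return "query waited too long in the thread pool queue"
    return None
```

For the scenario in the report (query queued at t=30 s, wait_timeout=60, no queue limit), `check_timeout(Conn(ConnState.QUEUED, 30.0), 61.0, 60.0, None)` returns None, so the connection survives; with a hypothetical queue limit of 20 seconds it would instead be terminated with the queue-specific reason rather than a generic lost-connection error.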
Issue Links
- relates to: MDEV-21103 Show thread pool wait status in SHOW PROCESSLIST (Open)
What settings do you use, such that queueing time actually matters? 30 seconds of queueing, as in your example, is a long time.
Responsibility for the inactivity can lie with the server, due to misconfiguration, but the network can also be to blame. The server cannot accurately measure the client's idle time: the client is not the same process, and it often runs on a separate machine.
Nor do we know exactly when the client request arrives. It can stay in the epoll queue until it is picked up by the listener, and the listener might temporarily take on "worker" responsibility, so that a thread group has no listener at all during that time. Usually this period is short, but it can of course be stretched by misconfiguring a huge thread_pool_stall_limit.
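The point about the epoll queue can be illustrated with a small, self-contained sketch: data written by a client sits in the kernel until the server's event loop polls for it, so the server's notion of "arrival" lags the real send time by however long the poller was busy. Python's `selectors.DefaultSelector` (epoll on Linux) and a `socketpair` stand in for the thread pool listener and a client connection; the sleep that simulates a listener doing worker duty is an assumption for illustration, not MariaDB code.

```python
import selectors
import socket
import time


def observed_arrival_lag(listener_busy_s: float) -> float:
    """Seconds between the client sending a request and the poller noticing it."""
    client, server = socket.socketpair()
    sel = selectors.DefaultSelector()  # epoll-based on Linux
    try:
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)
        sent_at = time.monotonic()
        client.sendall(b"SELECT 1")       # request now waits in the kernel buffer
        time.sleep(listener_busy_s)       # listener is not polling (e.g. acting as a worker)
        events = sel.select(timeout=1.0)  # the listener finally polls
        seen_at = time.monotonic()
        if not events:
            raise RuntimeError("request should have been readable")
        server.recv(64)                   # consume the request
        return seen_at - sent_at
    finally:
        sel.close()
        client.close()
        server.close()


if __name__ == "__main__":
    lag = observed_arrival_lag(0.2)
    print(f"server noticed the request {lag:.2f} s after it was sent")
```

Any timer the server starts when it first notices the request therefore already misses that lag, which is why the server cannot know precisely when the client's request arrived.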