[MDEV-21101] Clients run into wait_timeout while waiting for a free thread with pool-of-threads Created: 2019-11-20 Updated: 2020-07-30 Resolved: 2020-07-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.3.20, 10.4.10 |
| Fix Version/s: | 10.3.24, 10.4.14, 10.5.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Hartmut Holzgraefe | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Linux |
||
| Issue Links: |
|
||||||||
| Description |
|
When using pool-of-threads, clients that have their latest query still in the thread pool queue, waiting for a free execution thread, do not have that query shown in the SHOW PROCESSLIST output, instead they show with State: Sleep. Apparently they are also considered idle by the server itself, as after a query has been waiting in the thread pool queue for more than wait_timeout seconds the connection gets terminated with a timeout message in the servers error log stating a read timeout (if log_warning>2). The client just receives an unspecific "Lost connection to MySQL server during query" error without actual information why the query and the connection was terminated. Even worse: sending the query to the server does not even reset the idle timer. With wait_timeout=60 and a client that has already idle for 30 seconds, query and connection are terminated after another 30 seconds already. IMHO this is wrong as the client is not actually idle, the responsibility for the inactivity is totally on the server side, and even if we consider to terminate a query after a certain wait time in the pool-of-threads queue, there should be a more meaningful error message for this, and a separately configurable timeout for this. |
| Comments |
| Comment by Vladislav Vaintroub [ 2019-11-20 ] |
|
What settings do you use , so that queueing time actually matters? 30 sec queueing as in your example is a long time. |
| Comment by Vladislav Vaintroub [ 2020-07-30 ] |
|
hholzgra, to the general problem about why does not server writes the last goodbye message to peer when closing the connection. In many cases, such as pooled connections, client won't be actively reading, and server sends a message, and closes the socket, and since there is data on that connection, and the client does not close it, the connection will continue to persist in TCP, using its resources, and client's ephemeral ports. Another example: So, the correct thing to do, in most cases, where server closes the connection first, is I think is just to close the connection and do it "forcefully", with TCP reset. https://jira.mariadb.org/browse/MDEV-14113 discusses that in some more detail. Closing connection gracefully will only result in client reading 0 bytes, and TCP ephemeral port shortage on client, and other TCP resources on server, due to TIME_WAIT state. There are some exception to this, when client connection is active, i.e query is running, and connection is being killed. A notable case where server should send a message are problems during connection setup, "Too many connections", "user blocked", "authentication error", and such. client is actively reading server response in this case. But otherwise, I guess logging the error server-side, and killing TCP connection without the last goodbye, would be the correct thing to do. Finally, there used to be some problems with "out-of-band" error messages, misinterpreted by multiple client drivers as protocol breakage, see Also, the well-meant "connection was killed" message turned up pretty bad for the client connection pools. https://github.com/sqlalchemy/sqlalchemy/issues/4945. TCP close with reset would be much-much better. |
| Comment by Hartmut Holzgraefe [ 2020-07-30 ] |
|
Well, in this case the client is actively waiting, it has sent a query and is waiting for the results. So it should not run into wait_timeout as it did not let the server wait for new client activity for longer than wait_timeout, it is actually the server that has been starving it for more than wait_timeout seconds ... |
| Comment by Vladislav Vaintroub [ 2020-07-30 ] |
|
yes, this case is fixed. I was commenting on the general rant about server being quiet on why it closed the client. |