[MXS-4953] Lost connection to backend server: Network error: 104, Connection reset by peer
Created: 2024-01-23 Updated: 2024-01-24
| Status: | Open |
| Project: | MariaDB MaxScale |
| Component/s: | readconnroute |
| Affects Version/s: | 23.08.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Patrick Vandenbosch | Assignee: | markus makela |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | DEV |
| Description |
Hello,

We recently upgraded from MaxScale 6.4 to 23.08, and we now see errors like this in the log:

2024-01-23 11:29:07 error : (2039) (IM46130D-Service); Lost connection to backend server: Network error: 104, Connection reset by peer (IM46130D1, session=2039, conn_id=4531072)

It seems that whenever a connection reaches interactive_timeout, it is reported as an error in the log file. I can see this clearly with "list sessions": when a session reaches 600 seconds, it is reported as an error. I tried setting connection_keepalive=30s on the service in the MaxScale configuration file, but it does not help.

On the backend MariaDB:

Service in MaxScale:

I found a similar issue that is marked as fixed, but I still have the problem. Here:

Could you have a look? Thank you.
| Comments |
| Comment by markus makela [ 2024-01-23 ] |
Can you confirm from the server error logs that these are indeed idle timeouts? You should find a log entry with a connection ID that matches the conn_id=<number> part. If the client is truly idle, MaxScale won't send a connection keepalive ping unless the force_connection_keepalive parameter is set to true. Can you check whether turning that on solves the problem for you?
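As a minimal sketch of the suggestion above, the keepalive settings might look like this in the MaxScale configuration file. The section name IM46130D-Service and the server name IM46130D1 are taken from the log line in the description and the router from the Component field; everything else in the section (credentials and other required settings) is omitted and would have to come from the real configuration.

```ini
# Hypothetical excerpt of maxscale.cnf: only the keepalive-related settings are shown,
# credentials and other required service settings are omitted.
[IM46130D-Service]
type=service
router=readconnroute
servers=IM46130D1
# Ping idle backend connections every 30 seconds (the value tried in the description).
connection_keepalive=30s
# Send those pings even when the client itself is idle; without this,
# a truly idle client gets no keepalive pings on its backend connection.
force_connection_keepalive=true
```

Because the pings keep the backend connection active, the server presumably never sees it as idle, which would fit the sessions staying open past 600 seconds reported in the next comment.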
| Comment by Patrick Vandenbosch [ 2024-01-23 ] |
Yes, I can confirm these are idle timeouts. See session ID 9995, which I highlighted in yellow in the attached screenshots. When I ran "list sessions", that session had 30 seconds left before reaching the idle timeout (max1.png); 30 seconds later it appeared in the MaxScale log as an error (max2.png). I tried setting the force_connection_keepalive parameter as you suggested. With it, errors are no longer reported in the MaxScale log, but when I run "list sessions" I see sessions idle for longer than the 600-second timeout allowed by the database (max3.png). Is this normal? The behavior seems to differ between 6.4 and 23.08; is this intended? Shouldn't sessions that reach the idle timeout be terminated without being reported as errors in the MaxScale log?
| Comment by markus makela [ 2024-01-23 ] |
You most likely have no idle timeouts set in MaxScale, so MaxScale cannot evict idle clients before the server does. When the server kills an idle client, it simply closes the socket, which looks like a broken connection to MaxScale and to any other client connected to the database. It is therefore not possible to be certain whether a connection timed out or was broken. If the idle time is above a certain threshold it is usually a timeout, but right now MaxScale simply reports all of them as errors. MaxScale 23.08 behaves differently because older 6.4 releases had a bug.

If you want to get rid of idle connections in MaxScale, you can use the (unfortunately named) connection_timeout service parameter. By default MaxScale has no idle timeouts, and my usual recommendation is to set it below any idle timeouts in the database. This downgrades the errors into warnings and tells you which clients are idle.
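A minimal sketch of that recommendation, assuming the backend idle timeout is the 600 seconds mentioned in the description. The value 540s is a hypothetical choice; any value comfortably below the server's timeout would serve the same purpose. In 23.08 the parameter is called wait_timeout, with connection_timeout as its deprecated name (as the reporter notes below). Section and server names come from the log line in the description; other required settings are omitted.

```ini
# Hypothetical excerpt of maxscale.cnf; assumes the backend closes idle
# connections after 600 seconds. Credentials and other settings omitted.
[IM46130D-Service]
type=service
router=readconnroute
servers=IM46130D1
# Time out client sessions idle for more than 540 seconds, so MaxScale closes
# them before the server's 600-second idle timeout resets the connection.
# wait_timeout is the 23.08 name; connection_timeout is the deprecated alias.
wait_timeout=540s
```

With MaxScale timing out the session first, the event is logged as a "Timing out ... idle for N seconds" warning instead of a "Connection reset by peer" error.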
| Comment by markus makela [ 2024-01-23 ] |
Patrick, can you also fill in the exact 6.4 version you're upgrading from?
| Comment by Patrick Vandenbosch [ 2024-01-24 ] |
Thanks for the detailed explanation. Indeed, we did not have connection_timeout set in the MaxScale config. I have now set it (well, "wait_timeout", since "connection_timeout" is deprecated) to a lower value than on the backend servers, and there are no more errors in the MaxScale log. Instead, there is a warning like this:

2024-01-24 08:46:44 warning: (5) Timing out 'xxx'@'ipaddress', idle for 591 seconds

The exact version I'm upgrading from is 6.4.13-1.rhel.8.