MariaDB MaxScale / MXS-4953

Lost connection to backend server: Network error: 104, Connection reset by peer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 23.08.4
    • N/A
    • N/A
    • None
    • DEV

    Description

      Hello,

      We recently upgraded from MaxScale 6.4 to 23.08.
      Since then, we have noticed errors like these in the MaxScale log file:

      2024-01-23 11:29:07 error : (2039) (IM46130D-Service); Lost connection to backend server: Network error: 104, Connection reset by peer (IM46130D1, session=2039, conn_id=4531072)
      2024-01-23 11:29:26 error : (2062) (IM46130D-Service); Lost connection to backend server: Network error: 104, Connection reset by peer (IM46130D1, session=2062, conn_id=4531091)
      2024-01-23 11:29:38 error : (2071) (IM46130D-Service); Lost connection to backend server: Network error: 104, Connection reset by peer (IM46130D1, session=2071, conn_id=4531101)

      It seems that whenever a connection reaches interactive_timeout, it is reported as an error in the log file. I can see this clearly by running "list sessions": when a session reaches 600 seconds, it is reported as an error. I tried setting connection_keepalive=30s on the service in the MaxScale config file, but it does not help.

      On the backend MariaDB server:
      interactive_timeout=600
      wait_timeout=600

      Service in MaxScale:
      [IM46130D-Service]
      type=service
      router=readconnroute
      router_options=master
      servers=IM46130D1,IM46130D2
      user=xxx
      password=xxx
      connection_keepalive=30s

      I found a similar issue that is marked as fixed, but I still have the problem:
      https://jira.mariadb.org/browse/MXS-4440

      Could you have a look?

      Thank you

      Attachments

        1. max1.PNG (53 kB)
        2. max2.PNG (4 kB)
        3. max3.PNG (27 kB)

          Activity

            markus makela added a comment -

            Can you confirm from the server error logs that these are indeed idle timeouts? You should find a log entry with a connection id that matches the conn_id=<number> part.

            If the client is truly idle, MaxScale won't send a connection keepalive ping unless the force_connection_keepalive parameter is set to true. Can you try if turning that on solves the problem for you?
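
            For reference, a minimal sketch of what enabling that parameter could look like. It reuses the service section from the description and assumes everything except the last line stays unchanged:

            ```ini
            [IM46130D-Service]
            type=service
            router=readconnroute
            router_options=master
            servers=IM46130D1,IM46130D2
            user=xxx
            password=xxx
            connection_keepalive=30s
            # Assumption for illustration: send keepalive pings even when the client is idle.
            force_connection_keepalive=true
            ```

            The pings count as activity on the backend connection, so with this setting the server-side wait_timeout and interactive_timeout would likely stop firing for sessions whose clients are idle.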


            Patrick Vandenbosch added a comment -

            Yes, I confirm these are idle timeouts. See session id 9995, highlighted in yellow in the attached screenshots.

            When I ran "list sessions", that session had 30 seconds left before reaching the idle timeout (max1.png). 30 seconds later it appeared in the MaxScale log as an error (max2.png).

            I tried setting the force_connection_keepalive parameter as you suggested. With it, no more errors are reported in the MaxScale log, but when I run "list sessions" I see sessions going above the 600-second timeout allowed by the database. Is this normal? (max3.png)

            It seems there is a difference in behavior between 6.4 and 23.08. Is this intended? Shouldn't sessions that reach the idle timeout be terminated without being reported as errors in the MaxScale log?

            markus makela added a comment - edited

            You most likely have no idle timeouts set in MaxScale, so MaxScale is not able to evict idle sessions before the server does. When the server kills idle clients, it just closes the socket, which looks like a broken connection to MaxScale and to any other "client" that connects to the database. Thus it is not possible to be certain whether a connection timed out or was broken. Usually it's a timeout if the idle time is above a certain threshold, but right now MaxScale just reports these as errors.

            The reason why MaxScale 23.08 behaves differently is that older 6.4 releases had a bug (MXS-4139, MXS-4720) that kept pinging sessions every 300 seconds (the default for connection_keepalive), and the actual idleness of the client was not correctly taken into account. This effectively extended the values of wait_timeout and interactive_timeout in the server to infinite values.

            If you want to get rid of idle connections in MaxScale, you can use the (unfortunately named) connection_timeout service parameter. By default there are no idle timeouts in MaxScale, and the usual recommendation I give is to set it below any idle timeouts in the database. This downgrades the errors into warnings and lets you know which clients are idle.

            markus makela added a comment -

            Patrick Can you also fill in the exact 6.4 version you're upgrading from?


            Patrick Vandenbosch added a comment -

            Thanks for the detailed explanation. Indeed, we did not have connection_timeout set in the MaxScale config. I have now set it (as "wait_timeout", since "connection_timeout" is deprecated) to a lower value than on the backend servers, and there are no more errors in the MaxScale log. Instead there is a warning like this:

            2024-01-24 08:46:44 warning: (5) Timing out 'xxx'@'ipaddress', idle for 591 seconds

            The exact version I'm upgrading from is 6.4.13-1.rhel.8

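
            A minimal sketch of the kind of configuration described in the comment above, assuming the service section from the description. The value 590s is an illustrative assumption, chosen only because it is below the 600-second backend timeouts. The thread does not state the value actually used, nor whether force_connection_keepalive was kept.

            ```ini
            [IM46130D-Service]
            type=service
            router=readconnroute
            router_options=master
            servers=IM46130D1,IM46130D2
            user=xxx
            password=xxx
            connection_keepalive=30s
            # Assumed value: must be lower than wait_timeout/interactive_timeout (600) on the backend.
            # In 23.08 this parameter replaces the deprecated connection_timeout.
            wait_timeout=590s
            ```

            With a MaxScale-side idle timeout below the server's, MaxScale closes idle sessions first and logs them as warnings rather than as lost backend connections.
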
            markus makela added a comment -

            I'll close this as Not a Bug as it looks like it's behaving as expected.


            People

              Assignee: markus makela
              Reporter: Patrick Vandenbosch