MariaDB MaxScale / MXS-2860

MaxScale 2.3.16 logs "Lost connection to master server while waiting for a result. Connection has been idle for 0.0 seconds. Error caused by: #HY000: Lost connection to backend server: network error. Last close reason: <none>"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.3.16
    • N/A
    • N/A
    • Sprint: MXS-SPRINT-99, MXS-SPRINT-101

    Description

      It looks like https://jira.mariadb.org/browse/MXS-2408 is still present in 2.3.16:

      2020-01-23 15:42:15 error : (872) Lost connection to the master server 'server', closing session. Lost connection to master server while waiting for a result. Connection has been idle for 0.0 seconds. Error caused by: #HY000: Lost connection to backend server: network error. Last close reason: <none>

          Activity

            markus makela added a comment (edited)

            Since this was on MaxScale 2.3.16, which has the improved error logging, we know this event wasn't generated by MaxScale: there is no "(Generated event)" after the network error part of the message. This means that it is not a regression of MXS-2408 or MXS-2410 but a real network error event.

            The connection idle time calculation could be wrong if a network input event (EPOLLIN) occurs at the same time that a network error does (EPOLLERR). Currently the idle time is unconditionally updated whenever network input events arrive. By updating the idle time only when data is successfully read, we should be able to preserve the real idle time of the connection.
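            The difference between the two update rules can be shown with a minimal sketch. This is not MaxScale's code: `BackendConn`, `on_events_unconditional` and `on_events_read_only` are hypothetical names, and the timestamps are passed in by hand so the result is deterministic. Only the `EPOLLIN` and `EPOLLERR` flag values come from Linux.

```python
# Flag values as defined in Linux <sys/epoll.h>.
EPOLLIN = 0x001
EPOLLERR = 0x008


class BackendConn:
    """Hypothetical stand-in for a backend connection (not a MaxScale type)."""

    def __init__(self, now):
        # Time of the last activity that counts towards the idle calculation.
        self.last_read = now

    def idle_seconds(self, now):
        return now - self.last_read


def on_events_unconditional(conn, events, bytes_read, now):
    # Behaviour described above as current: any input event resets the idle
    # clock, even when nothing could be read.
    if events & EPOLLIN:
        conn.last_read = now


def on_events_read_only(conn, events, bytes_read, now):
    # Behaviour described above as the fix: reset the idle clock only when
    # data was successfully read.
    if (events & EPOLLIN) and bytes_read > 0:
        conn.last_read = now


# A connection that has been quiet for 30 seconds gets EPOLLIN and EPOLLERR
# in the same wakeup, and the read returns no data.
old = BackendConn(now=0.0)
new = BackendConn(now=0.0)
on_events_unconditional(old, EPOLLIN | EPOLLERR, bytes_read=0, now=30.0)
on_events_read_only(new, EPOLLIN | EPOLLERR, bytes_read=0, now=30.0)
print(old.idle_seconds(30.0))  # 0.0, matching the "idle for 0.0 seconds" log
print(new.idle_seconds(30.0))  # 30.0, the real idle time
```

            Under these assumptions the first rule reproduces the misleading "idle for 0.0 seconds" value, while the second keeps the idle time accumulated before the error.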

            markus makela added a comment

            I managed to find a few cases where the network socket error is cleared before the error handler function gets to read it. With this fix, we should be able to see the actual error message that caused the connection to be closed.
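            For context, reading `SO_ERROR` with `getsockopt` returns the pending socket error and clears it, so whichever code path reads it first is the only one that sees it. One way to keep the error available for a later handler is to save it at the first read. The sketch below only illustrates that idea and is not the actual MaxScale change: `ConnErrorTracker` is a hypothetical name, and `OneShotErrorSocket` is a test double that imitates the read-and-clear behaviour so the example does not depend on a real network failure.

```python
import errno
import socket


class OneShotErrorSocket:
    """Test double (hypothetical): reports a pending error once, then 0,
    imitating how reading SO_ERROR clears the pending error."""

    def __init__(self, pending):
        self._pending = pending

    def getsockopt(self, level, optname):
        err, self._pending = self._pending, 0
        return err


class ConnErrorTracker:
    """Hypothetical wrapper that remembers the first socket error it sees."""

    def __init__(self, sock):
        self.sock = sock
        self.saved_errno = 0

    def check(self):
        # Reading SO_ERROR clears it, so store any non-zero value right away.
        err = self.sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            self.saved_errno = err
        return err

    def error_for_handler(self):
        # The error handler asks here instead of reading SO_ERROR directly.
        self.check()
        return self.saved_errno


tracker = ConnErrorTracker(OneShotErrorSocket(errno.ECONNRESET))
tracker.check()            # an earlier code path reads and clears the error
print(tracker.check())     # 0: a second direct read no longer sees it
print(tracker.error_for_handler() == errno.ECONNRESET)  # True
```

            The same wrapper accepts a real `socket.socket` object, since it only calls `getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)`.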

            markus makela added a comment

            Closing as Cannot Reproduce, as this hasn't happened with the latest releases.


            People

              Assignee: markus makela
              Reporter: Claudio Nanni
              Votes: 0
              Watchers: 3
