MariaDB MaxScale / MXS-5188

Detect if SIGABRT is due to a watchdog timeout

Details

    Description

      Problem

      In MXS-5187, a SIGABRT was received, potentially because of a systemd watchdog timeout. The stacktraces looked suspicious, however, so the signal may have come from something else.

      To be more certain of the source of a SIGABRT, MaxScale should determine whether it was caused by a watchdog timeout or by something else.

      Solutions

      Scan systemd journal for messages

      This solution would answer the question with 100% certainty. The problem is that, due to the issues described in MXS-5196, the messages cannot be read without being a member of the systemd-journal group. The group membership could be added, but since MaxScale usually does not need it (maxlog=1 is the default), it is for the time being better left to the end user to choose whether to allow MaxScale to read the log.
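
      As a rough illustration of the group requirement, the sketch below checks whether the current process belongs to a named group such as systemd-journal. It uses only the POSIX calls getgroups, getegid and getgrnam. The helper names (has_gid, current_groups, in_named_group) are hypothetical and are not taken from MaxScale. Group membership is also not the only way to gain journal access, so a negative result is only a hint.

```cpp
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

#include <algorithm>
#include <vector>

// Hypothetical helper: true if `wanted` is present in `groups`.
bool has_gid(const std::vector<gid_t>& groups, gid_t wanted)
{
    return std::find(groups.begin(), groups.end(), wanted) != groups.end();
}

// Hypothetical helper: the supplementary groups of the process plus its
// effective group ID.
std::vector<gid_t> current_groups()
{
    std::vector<gid_t> groups;
    int n = getgroups(0, nullptr);      // With size 0 only the count is returned

    if (n > 0)
    {
        groups.resize(n);
        n = getgroups(n, groups.data());
        groups.resize(n > 0 ? n : 0);   // Drop everything if the second call failed
    }

    groups.push_back(getegid());
    return groups;
}

// Hypothetical helper: true if the process is a member of the group `name`.
// getgrnam is not thread-safe, so this is meant to be called once at startup.
bool in_named_group(const char* name)
{
    const struct group* gr = getgrnam(name);
    return gr != nullptr && has_gid(current_groups(), gr->gr_gid);
}
```

      A caller could evaluate in_named_group("systemd-journal") once at startup and skip any journal scan when it returns false.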

      Track when the last watchdog notification was sent

      The notification interval and the time when the last notification was sent are known to MaxScale. By logging this information in the signal handler, we'd be able to tell with high likelihood whether the SIGABRT was due to a watchdog timeout simply by comparing the time since the last notification with how often notifications should be sent. Since the notifications are sent twice as often as needed, the difference should be very obvious.
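
      The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual MaxScale code: every name in it (g_last_notify_ms, steady_now_ms, record_notification, classify_abort) is hypothetical. It assumes the watchdog timeout is twice the notification interval, which follows from the notifications being sent twice as often as needed.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// All names below are hypothetical and only illustrate the idea.

enum class AbortCause
{
    LIKELY_WATCHDOG,
    LIKELY_OTHER
};

// Steady-clock time of the last watchdog notification, in milliseconds.
// An atomic integer keeps the value readable from a signal handler.
std::atomic<int64_t> g_last_notify_ms{0};

// Current steady-clock time in milliseconds.
int64_t steady_now_ms()
{
    using namespace std::chrono;
    return duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count();
}

// Call this right after each watchdog notification has been sent.
void record_notification(int64_t now_ms)
{
    g_last_notify_ms.store(now_ms, std::memory_order_relaxed);
}

// Assumption: the watchdog timeout is twice the notification interval.
// If at least that much time has passed since the last notification,
// the SIGABRT is likely to come from the watchdog.
AbortCause classify_abort(int64_t now_ms, int64_t last_notify_ms, int64_t notify_interval_ms)
{
    const int64_t watchdog_timeout_ms = 2 * notify_interval_ms;
    return now_ms - last_notify_ms >= watchdog_timeout_ms ?
           AbortCause::LIKELY_WATCHDOG : AbortCause::LIKELY_OTHER;
}
```

      In a SIGABRT handler, the result of classify_abort(steady_now_ms(), g_last_notify_ms.load(), interval) could be logged together with the raw timestamps, so that the gap is visible even when the classification is borderline.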

          Activity

            markus makela created issue -
            markus makela made changes -
            Field: Original Value -> New Value
            Assignee: markus makela [ markus.makela ]
            markus makela made changes -
            markus makela made changes -
            Summary: Scan systemd journal for watchdog timeout for SIGABRT -> Scan systemd journal on SIGABRT for watchdog timeout
            markus makela made changes -
            markus makela made changes -
            Summary: Scan systemd journal on SIGABRT for watchdog timeout -> Detect if SIGABRT is due to a watchdog timeout
            markus makela made changes -
            Fix Version/s: 21.06 [ 26119 ]
            markus makela made changes -
            Status: Open [ 1 ] -> In Progress [ 3 ]
            markus makela made changes -
            Status: In Progress [ 3 ] -> In Review [ 10002 ]
            Johan Wikman made changes -
            Rank: Ranked higher
            markus makela made changes -
            Component/s: Core [ 11600 ]
            Fix Version/s: 21.06.17 [ 29842 ]
            Fix Version/s: 22.08.14 [ 29843 ]
            Fix Version/s: 23.02.11 [ 29844 ]
            Fix Version/s: 23.08.7 [ 29845 ]
            Fix Version/s: 24.02.3 [ 29846 ]
            Fix Version/s: 24.08.1 [ 29917 ]
            Fix Version/s: 21.06 [ 26119 ]
            Resolution: Fixed [ 1 ]
            Status: In Review [ 10002 ] -> Closed [ 6 ]

            People

              markus makela markus makela
              markus makela markus makela