MariaDB MaxScale / MXS-4135

MaxScale fails to start when listeners are bound to a specific IP address

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.2.4
    • Fix Version/s: 22.08.0
    • Component/s: Core
    • Labels:
      None
    • Environment:
      RHEL, possibly all other

      Description

      MaxScale runs under systemd. Its unit file comes configured like this:

      [Unit]
      After = network.service
      [Service]
      Restart = on-abort

      However, on RHEL at least (and possibly on Debian-like systems too), network.service does not guarantee that the network is actually available. At that point a bind to 0.0.0.0 may succeed, but a bind to a specific IP address (such as 1.2.3.4) will fail, and MaxScale will fail with it. The network is only fully available once another unit, network-online.target, has been reached.

      Because systemd starts units asynchronously, network-online.target may already have been reached by the time MaxScale attempts its bind to 1.2.3.4, in which case MaxScale starts properly, but this is not guaranteed. As a result, one sees intermittent start-up failures of MaxScale with TCP port bind error messages. Worse, in these cases MaxScale exits with a failure rather than an abort (the exit codes differ). Because of the "Restart = on-abort" setting in its unit file, systemd will therefore not restart it after "RestartSec" seconds, resulting in a complete outage of MaxScale until it is restarted manually.

      Solutions:

      • Set network-online.target instead of network.service in the unit file. This will delay the start of MaxScale by a few seconds during boot, but will completely prevent the mentioned failures.
      • Or, change the "Restart" setting from "on-abort" to "on-failure", which will make systemd restart MaxScale after "RestartSec" seconds (30 sec by default on RHEL) - by which time the network should be fully up.
      • Or, change the exit code of MaxScale when a TCP port bind fails from failure to abort.
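
      The first two options could be tried without editing the packaged unit file, for example through a systemd drop-in. The following is only a minimal sketch and not necessarily the fix that shipped in 22.08.0: the unit name maxscale.service and the drop-in path are assumptions, and either section can be used on its own. Wants= is included because After= only orders units and does not by itself pull network-online.target into the boot.

      ```ini
      # Assumed path: /etc/systemd/system/maxscale.service.d/override.conf
      [Unit]
      # Option 1: start MaxScale only after the network is reported as online,
      # so that a bind to a specific IP address can succeed.
      Wants = network-online.target
      After = network-online.target

      [Service]
      # Option 2: also restart after an unclean exit (such as a failed bind),
      # not only after an abort.
      Restart = on-failure
      ```

      After creating or changing a drop-in, systemd needs to reload its configuration (systemctl daemon-reload) before the new settings take effect.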

            People

            Assignee:
            markus makela
            Reporter:
            Assen Totin (Inactive)