MariaDB Server / MDEV-17571

Make systemd timeout behavior more compatible with long Galera SSTs

    Description

      SSTs can take several hours in many cases, but the current default value of TimeoutStartSec causes systemd to time out the joiner node's startup after about 90 seconds. It might make sense to disable the systemd service's startup timeout by default instead.

      Depending on the systemd version, disabling the startup timeout means setting either TimeoutStartSec=0 (if systemd version <=228) or TimeoutStartSec=infinity (if systemd version >=229).
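The version rule above can be sketched as a small helper that picks the value and renders a drop-in override. This is only an illustration: the function names are hypothetical, and the drop-in location mentioned in the comment is an assumption, not something this issue specifies.

```python
def disabled_timeout_value(systemd_version: int) -> str:
    """Return the TimeoutStartSec value that disables the startup timeout.

    Per the description above: "0" for systemd <= 228, "infinity" for >= 229.
    """
    return "0" if systemd_version <= 228 else "infinity"


def render_dropin(systemd_version: int) -> str:
    """Render the contents of a drop-in override file.

    Assumption: the file would be placed somewhere like
    /etc/systemd/system/mariadb.service.d/timeout.conf (hypothetical name),
    followed by `systemctl daemon-reload`.
    """
    value = disabled_timeout_value(systemd_version)
    return "[Service]\nTimeoutStartSec={}\n".format(value)


if __name__ == "__main__":
    # Example: print the override for a host running systemd 219.
    print(render_dropin(219), end="")
```

The installed version number can be read from the first line of `systemctl --version`.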

      In systemd 236 and later, a service can extend the startup timeout by sending EXTEND_TIMEOUT_USEC:

      If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may cause the start time to be extended beyond TimeoutStartSec=. The first receipt of this message must occur before TimeoutStartSec= is exceeded, and once the start time has extended beyond TimeoutStartSec=, the service manager will allow the service to continue to start, provided the service repeats "EXTEND_TIMEOUT_USEC=…" within the interval specified until the service startup status is finished by "READY=1". (see sd_notify(3)).

      https://www.freedesktop.org/software/systemd/man/systemd.service.html
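To make the mechanism in the quoted passage concrete, here is a minimal sketch of how a Type=notify service could send the message. It talks to the datagram socket named by the NOTIFY_SOCKET environment variable directly instead of calling sd_notify(3). The function names and the optional socket_path parameter are hypothetical, and this is not how MariaDB itself implements it.

```python
import os
import socket
from typing import Optional


def notify(message: str, socket_path: Optional[str] = None) -> bool:
    """Send one notification datagram to the service manager.

    Returns False when no notify socket is available (for example when the
    process was not started by systemd with Type=notify).
    """
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True


def extend_timeout(seconds: float, status: str = "",
                   socket_path: Optional[str] = None) -> bool:
    """Ask systemd (236 or later) to allow `seconds` more startup time.

    The value is sent in microseconds. The call has to be repeated within
    that interval until the service finally sends "READY=1".
    """
    lines = []
    if status:
        lines.append("STATUS=" + status)
    lines.append("EXTEND_TIMEOUT_USEC={}".format(int(seconds * 1_000_000)))
    return notify("\n".join(lines), socket_path)
```

A joiner waiting on an SST would call something like `extend_timeout(30, "SST in progress")` periodically, at intervals shorter than 30 seconds, and then `notify("READY=1")` once startup completes. On systemd versions older than 236 the EXTEND_TIMEOUT_USEC field has no effect, which matches the timeouts reported here.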

      This approach appears to have been used to extend the startup timeout during SSTs as part of the fix for MDEV-15607. The relevant commit seems to be:

      https://github.com/mariadb/server/commit/be5698265a4195586142d1a34fdd1cce9d95d8a1

      The relevant service_manager_extend_timeout function seems to be defined here:

      https://github.com/MariaDB/server/blob/be5698265a4195586142d1a34fdd1cce9d95d8a1/include/my_service_manager.h#L30

      It appears to send the EXTEND_TIMEOUT_USEC notification message mentioned in the systemd manual. Note that this is a message passed through sd_notify, not an environment variable.

      However, a lot of users are still seeing startup timeouts during SSTs. The cause seems to be that most systemd installations are not yet using version 236 or later.

      The following documentation sections describe the current behavior:

      https://mariadb.com/kb/en/library/introduction-to-state-snapshot-transfers-ssts/#ssts-and-systemd

      https://mariadb.com/kb/en/library/systemd/#configuring-the-systemd-service-timeout

            People

              jplindst Jan Lindström (Inactive)
              claudio.nanni Claudio Nanni