Description
SSTs can take several hours in many cases, but with the current default value of TimeoutStartSec, systemd forces the joiner node to time out after about 90 seconds. It might make sense to disable the systemd service's startup timeout by default instead.
Depending on the systemd version, disabling the startup timeout means setting either TimeoutStartSec=0 (if systemd version <=228) or TimeoutStartSec=infinity (if systemd version >=229).
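As a rough illustration of the version rule above, the following Python sketch picks the value that disables the startup timeout and renders a drop-in unit fragment. The function names are hypothetical, and where the fragment is installed (for example a `mariadb.service.d` drop-in directory) is an assumption that depends on the distribution and service name.

```python
def timeout_disable_value(systemd_version: int) -> str:
    """Return the TimeoutStartSec value that disables the startup timeout.

    Per the description above: 0 for systemd <= 228, infinity for >= 229.
    """
    return "0" if systemd_version <= 228 else "infinity"


def render_dropin(systemd_version: int) -> str:
    """Render a [Service] drop-in fragment that disables the startup timeout."""
    value = timeout_disable_value(systemd_version)
    return f"[Service]\nTimeoutStartSec={value}\n"


if __name__ == "__main__":
    # Example: print the fragment for a host running systemd 219.
    print(render_dropin(219), end="")
```

After writing such a fragment, systemd would normally need `systemctl daemon-reload` before the new value takes effect.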
In systemd 236 and later, the startup timeout can be extended by having the service send EXTEND_TIMEOUT_USEC to the service manager:
If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may cause the start time to be extended beyond TimeoutStartSec=. The first receipt of this message must occur before TimeoutStartSec= is exceeded, and once the start time has extended beyond TimeoutStartSec=, the service manager will allow the service to continue to start, provided the service repeats "EXTEND_TIMEOUT_USEC=…" within the interval specified until the service startup status is finished by "READY=1". (see sd_notify(3)).
https://www.freedesktop.org/software/systemd/man/systemd.service.html
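The protocol quoted above can be sketched in Python without libsystemd by writing datagrams to the socket named in the NOTIFY_SOCKET environment variable, which is how sd_notify(3) delivers its messages. This is only an illustration, not the MariaDB implementation: the function names, the `is_done` callback, and the 30 second / 10 second intervals are assumptions chosen for the example.

```python
import os
import socket
import time
from typing import Callable, Optional


def extend_timeout_message(seconds: float) -> bytes:
    """Build an EXTEND_TIMEOUT_USEC= payload (the value is in microseconds)."""
    return f"EXTEND_TIMEOUT_USEC={int(seconds * 1_000_000)}".encode("ascii")


def notify(message: bytes, socket_path: Optional[str] = None) -> bool:
    """Send one notification datagram; return False if no socket is configured."""
    path = socket_path or os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with Type=notify
    # A leading "@" denotes a Linux abstract-namespace socket.
    addr = b"\0" + path[1:].encode() if path.startswith("@") else path
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, addr)
    return True


def start_with_extensions(
    is_done: Callable[[], bool],
    extension_s: float = 30.0,
    poll_s: float = 10.0,
    socket_path: Optional[str] = None,
) -> None:
    """Repeat EXTEND_TIMEOUT_USEC= until startup finishes, then send READY=1.

    poll_s should stay below extension_s so each extension is renewed in time.
    """
    while not is_done():
        notify(extend_timeout_message(extension_s), socket_path)
        time.sleep(poll_s)
    notify(b"READY=1", socket_path)
```

A long-running step such as an SST would pass a callback that reports whether the transfer has finished. On systemd versions older than 236 the extension messages are not honored, so the configured TimeoutStartSec still applies.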
This approach appears to have been used to extend the startup timeout during SSTs as part of the fix for MDEV-15607. The relevant commit appears to be:
https://github.com/mariadb/server/commit/be5698265a4195586142d1a34fdd1cce9d95d8a1
The relevant service_manager_extend_timeout function seems to be defined here:
And it sends the EXTEND_TIMEOUT_USEC= notification message mentioned in the systemd manual.
However, a lot of users are still seeing startup timeouts during SSTs. The cause seems to be that most systemd installations are not yet using version 236 or later.
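To check whether a given host is affected, the systemd version can be read from `systemctl --version`. The sketch below assumes the first line of that output has the form `systemd <number> ...`; the helper names are hypothetical.

```python
import re
import subprocess


def parse_systemd_version(output: str) -> int:
    """Extract the major version from the first line of `systemctl --version`."""
    match = re.match(r"systemd (\d+)", output)
    if match is None:
        raise ValueError("unrecognized `systemctl --version` output")
    return int(match.group(1))


def supports_extend_timeout(version: int) -> bool:
    """EXTEND_TIMEOUT_USEC= is only honored by systemd 236 and later."""
    return version >= 236


def installed_systemd_version() -> int:
    """Query the local systemd version (requires systemctl on PATH)."""
    result = subprocess.run(
        ["systemctl", "--version"], capture_output=True, text=True, check=True
    )
    return parse_systemd_version(result.stdout)
```

On a host where `supports_extend_timeout` returns False, the only remaining option described in this issue is to raise or disable TimeoutStartSec.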
The following documentation sections describe the current behavior:
https://mariadb.com/kb/en/library/introduction-to-state-snapshot-transfers-ssts/#ssts-and-systemd
https://mariadb.com/kb/en/library/systemd/#configuring-the-systemd-service-timeout
Issue Links
- relates to
  - MDEV-9202 Systemd timeout is not sufficient for larger servers (Closed)
  - MDEV-9520 xtrabackup-v2 to support systemd node provisioning (Closed)
  - MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts (Closed)
  - MDEV-15606 Galera can't perform SST in 10.2.13 if systemd in use due to timeout at startup (Closed)
  - MDEV-15607 mysqld crashed few after node is being joined with sst (Closed)
  - MDEV-17003 service_manager_extend_timeout() being called too often (Closed)
  - MDEV-17934 Make systemd timeout behavior more compatible with longer Galera recovery times (Closed)
  - MDEV-21231 notify systemd of long running SST to avoid timeout (Stalled)