SSTs can take several hours in many cases, but with the current default value of TimeoutStartSec, systemd times out the joiner node's startup after about 90 seconds. It might make sense to disable the systemd service's startup timeout by default instead.
Depending on the systemd version, disabling the startup timeout means setting either TimeoutStartSec=0 (systemd 228 and earlier) or TimeoutStartSec=infinity (systemd 229 and later).
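The version split above can be captured in a small helper that picks the right value and renders a `[Service]` drop-in. This is a hypothetical sketch: the function names are illustrative, and it assumes the version cutoff stated above (0 up to 228, "infinity" from 229).

```python
def timeout_start_sec_disabled(systemd_version: int) -> str:
    """Return the TimeoutStartSec= value that disables the start timeout.

    Hypothetical helper; assumes the 228/229 cutoff described in this issue.
    """
    # systemd 228 and earlier: 0 disables the start timeout.
    # systemd 229 and later: "infinity" disables it.
    return "0" if systemd_version <= 228 else "infinity"


def drop_in(systemd_version: int) -> str:
    """Render a drop-in override that disables the startup timeout (hypothetical)."""
    value = timeout_start_sec_disabled(systemd_version)
    return "[Service]\nTimeoutStartSec=" + value + "\n"
```

For example, `drop_in(219)` yields a `[Service]` section with `TimeoutStartSec=0`, which could be placed in a drop-in directory for the MariaDB unit and activated with `systemctl daemon-reload`.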
In systemd 236 and later, a service can extend its startup timeout by sending EXTEND_TIMEOUT_USEC. From the systemd.service(5) manual:
If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may cause the start time to be extended beyond TimeoutStartSec=. The first receipt of this message must occur before TimeoutStartSec= is exceeded, and once the start time has extended beyond TimeoutStartSec=, the service manager will allow the service to continue to start, provided the service repeats "EXTEND_TIMEOUT_USEC=…" within the interval specified until the service startup status is finished by "READY=1". (see sd_notify(3)).
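The mechanism quoted above can be illustrated with a minimal Python sketch of the sd_notify wire protocol: a datagram sent to the AF_UNIX socket named in `$NOTIFY_SOCKET` (a leading `@` denotes an abstract-namespace socket). A real service would normally call sd_notify(3) from libsystemd; the function names here are hypothetical, and `run_with_keepalive` assumes each step of the long-running work finishes within `interval_sec`.

```python
import os
import socket


def format_extend_timeout(interval_sec: int, status: str = "") -> str:
    """Build a state string asking systemd to extend the start timeout."""
    lines = []
    if status:
        lines.append(f"STATUS={status}")
    # EXTEND_TIMEOUT_USEC is expressed in microseconds.
    lines.append(f"EXTEND_TIMEOUT_USEC={interval_sec * 1_000_000}")
    return "\n".join(lines)


def notify(state: str) -> bool:
    """Send a state string to $NOTIFY_SOCKET; return False if it is not set."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not started by systemd as Type=notify
    if addr.startswith("@"):
        # Abstract namespace socket: '@' stands for a leading NUL byte.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True


def run_with_keepalive(steps, interval_sec: int = 30) -> None:
    """Run steps, re-extending the start timeout before each one (hypothetical).

    Each step must complete within interval_sec, or systemd may still time out.
    """
    for i, step in enumerate(steps, start=1):
        notify(format_extend_timeout(interval_sec, f"SST in progress, step {i}"))
        step()
    # Startup is only finished once READY=1 is sent.
    notify("READY=1")
```

Outside systemd (no `NOTIFY_SOCKET`), `notify` is a no-op that returns False, so the steps still run.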
This approach appears to have been used to extend the startup timeout during SSTs in the fix for MDEV-15607. The relevant commit appears to be:
The relevant service_manager_extend_timeout function seems to be defined here:
It appears to send the EXTEND_TIMEOUT_USEC=… notification to the service manager, as described in the systemd manual.
However, many users still hit startup timeouts during SSTs. The likely cause is that many systemd installations still run a version older than 236, which does not support EXTEND_TIMEOUT_USEC.
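Because EXTEND_TIMEOUT_USEC only helps on systemd 236 and later, a packaging or startup script could check the installed version and fall back to disabling the timeout on older systems. The sketch below is hypothetical: the function names and the two-way "extend"/"disable" choice are illustrative, and it assumes the first line of `systemctl --version` has the form `systemd <N> ...`.

```python
def parse_systemd_version(systemctl_output: str) -> int:
    """Extract the major version from `systemctl --version` output."""
    # The first line looks like "systemd 219" or "systemd 252 (252.19-1~deb12u1)".
    first_line = systemctl_output.splitlines()[0]
    return int(first_line.split()[1])


def startup_timeout_strategy(version: int) -> str:
    """Pick how to survive a long SST on this systemd version (hypothetical)."""
    if version >= 236:
        return "extend"   # EXTEND_TIMEOUT_USEC= is honored
    return "disable"      # older systemd: disable TimeoutStartSec= instead
```

The output could be obtained with `subprocess.run(["systemctl", "--version"], capture_output=True, text=True).stdout` and passed to `parse_systemd_version`.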
The following documentation section describes the current behavior: