When Galera is enabled, MariaDB's systemd service executes the "galera_recovery" script as an ExecStartPre operation. See the following:
https://github.com/MariaDB/server/blob/ce8716a1ed786ff971b5e15c88385d50b649ec7f/support-files/mariadb.service.in#L71
The MariaDB systemd service has a default TimeoutStartSec value of 90 seconds, so if this ExecStartPre step takes longer than that, systemd terminates it and startup fails. For example, in the following syslog excerpt, the start-pre operation is terminated exactly 90 seconds after startup begins:
Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server...
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Sep 13 15:49:58 server1 systemd[1]: Failed to start MariaDB 10.2.16 database server.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Unit entered failed state.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Failed with result 'timeout'.
galera_recovery has to start the server, so this step can take a while, especially if the server previously crashed and has to perform crash recovery. However, it looks like the fix for MDEV-14705 was meant to extend systemd timeouts during server startup. Despite that, server versions that include the fix for MDEV-14705 still hit timeouts during ExecStartPre. Is it likely that some important long-running startup functions were missed?
See also MDEV-17571 as another case where systemd timeout extensions didn't seem to work as intended.
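
For reference, the kind of timeout extension discussed above is requested through systemd's notification protocol: a process sends an EXTEND_TIMEOUT_USEC=<microseconds> datagram to the socket named in the NOTIFY_SOCKET environment variable. The following is only a minimal Python sketch of that protocol, not MariaDB's actual implementation; the function name extend_start_timeout is hypothetical.

```python
import os
import socket


def extend_start_timeout(seconds: int) -> bool:
    """Ask systemd to extend the current timeout by `seconds`.

    Hypothetical helper for illustration only. Returns False when
    NOTIFY_SOCKET is not set (i.e. not started by systemd with
    notifications enabled), True once the datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False

    # A leading "@" denotes a Linux abstract-namespace socket, which is
    # addressed with a leading NUL byte instead.
    if path.startswith("@"):
        address = b"\0" + path[1:].encode()
    else:
        address = path

    # systemd expects the extension in microseconds.
    message = f"EXTEND_TIMEOUT_USEC={seconds * 1_000_000}".encode("ascii")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True
```

A long-running startup phase would have to call something like extend_start_timeout(30) repeatedly, each time before the previous extension runs out. Note also that systemd only accepts notifications from the processes allowed by the unit's NotifyAccess= setting, so it may be worth checking whether messages sent by the server instance that galera_recovery starts during ExecStartPre are accepted at all. That is an assumption to verify, not a confirmed cause.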