Description
When Galera is enabled, MariaDB's systemd service runs the "galera_recovery" script as an ExecStartPre operation.
The MariaDB systemd service has a default TimeoutStartSec value of 90 seconds, so startup fails if this ExecStartPre step takes longer than that. For example, the following syslog excerpt shows such a failure:
Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server...
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Sep 13 15:49:58 server1 systemd[1]: Failed to start MariaDB 10.2.16 database server.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Unit entered failed state.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Failed with result 'timeout'.
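The gap between the first and second log lines can be checked against the 90-second TimeoutStartSec default. The following is a minimal sketch in Python; the helper name `elapsed_seconds` and the placeholder year are assumptions made for illustration, since syslog timestamps carry no year.

```python
from datetime import datetime

# Timestamps copied from the syslog excerpt above.
START = "Sep 13 15:48:28"
TIMED_OUT = "Sep 13 15:49:58"

# Syslog omits the year; this placeholder is an assumption and does not
# affect a same-day difference.
PLACEHOLDER_YEAR = "2000"


def elapsed_seconds(start: str, end: str) -> int:
    """Return the number of whole seconds between two syslog timestamps."""
    fmt = "%Y %b %d %H:%M:%S"
    t0 = datetime.strptime(f"{PLACEHOLDER_YEAR} {start}", fmt)
    t1 = datetime.strptime(f"{PLACEHOLDER_YEAR} {end}", fmt)
    return int((t1 - t0).total_seconds())


print(elapsed_seconds(START, TIMED_OUT))  # prints 90
```

The result matches the default timeout exactly, which suggests that no timeout extension took effect during the ExecStartPre step in this run.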
galera_recovery has to start the server, so this step can take a while, especially if the server previously crashed and has to perform crash recovery. However, MDEV-14705 was supposed to extend the systemd timeout during server startup. Despite that, server versions that include the fix for MDEV-14705 still hit timeouts during ExecStartPre. Were some important long-running startup functions missed?
See also MDEV-17571 as another case where systemd timeout extensions didn't seem to work as intended.
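For context, the EXTEND_TIMEOUT_USEC= mechanism named in MDEV-14705 is part of systemd's notification protocol: a process sends a datagram to the socket named in the NOTIFY_SOCKET environment variable. The sketch below illustrates that protocol in Python. It is not MariaDB's implementation, and the function names `build_message` and `extend_timeout` are hypothetical. Whether systemd accepts the message from a given process depends on the unit's NotifyAccess= setting.

```python
import os
import socket


def build_message(extra_seconds: int) -> bytes:
    """Build the notification payload; the value is in microseconds."""
    return f"EXTEND_TIMEOUT_USEC={extra_seconds * 1_000_000}".encode("ascii")


def extend_timeout(extra_seconds: int) -> bool:
    """Ask systemd for more time. Returns False when not run under systemd."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # no notification socket was provided
    addr = os.fsencode(path)
    if addr.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        addr = b"\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_message(extra_seconds), addr)
    return True


print(build_message(30))  # prints b'EXTEND_TIMEOUT_USEC=30000000'
```

A long-running startup phase would need to send such a message repeatedly, each time before the previously granted extension runs out, for the start timeout to keep being pushed back.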
Issue Links
- relates to
  - MDEV-9202 Systemd timeout is not sufficient for larger servers (Closed)
  - MDEV-9520 xtrabackup-v2 to support systemd node provisioning (Closed)
  - MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts (Closed)
  - MDEV-15607 mysqld crashed few after node is being joined with sst (Closed)
  - MDEV-17003 service_manager_extend_timeout() being called too often (Closed)
  - MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs (Closed)