Description
When Galera is enabled, MariaDB's systemd service runs the "galera_recovery" script as an ExecStartPre operation.
The MariaDB systemd service has a default TimeoutStartSec value of 90 seconds, so if this ExecStartPre step takes longer than that, startup fails. For example, see the following failure from a syslog:
Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server...
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Sep 13 15:49:58 server1 systemd[1]: Failed to start MariaDB 10.2.16 database server.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Unit entered failed state.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Failed with result 'timeout'.
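The timestamps in the log match the 90-second limit: the service started at 15:48:28 and the start-pre timeout fired at 15:49:58. The short Python sketch below does that subtraction. It assumes classic syslog timestamps, which carry no year, so it prepends an arbitrary placeholder year; only the difference between the two timestamps matters.

```python
from datetime import datetime

# Classic syslog timestamps omit the year. This placeholder is arbitrary and
# only makes parsing unambiguous; it does not affect the computed difference.
PLACEHOLDER_YEAR = "2000"


def syslog_time(stamp: str) -> datetime:
    """Parse a classic syslog timestamp such as 'Sep 13 15:48:28'."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two syslog timestamps assumed to fall in the same year."""
    return (syslog_time(end) - syslog_time(start)).total_seconds()


started = "Sep 13 15:48:28"    # "Starting MariaDB 10.2.16 database server..."
timed_out = "Sep 13 15:49:58"  # "Start-pre operation timed out. Terminating."

print(elapsed_seconds(started, timed_out))  # 90.0
```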
galera_recovery has to start the server, so this step can take a while, especially if the server previously crashed and has to perform crash recovery. However, it looks like systemd timeouts should have been extended during server startup as part of MDEV-14705. Despite that, server versions that include the fix for MDEV-14705 still see timeouts during ExecStartPre. Is it possible that some important long-running startup functions were missed?
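For context on the MDEV-14705 mechanism: a service asks systemd for more time by sending an `EXTEND_TIMEOUT_USEC=<microseconds>` message over the sd_notify protocol, which is a datagram written to the AF_UNIX socket named in the `NOTIFY_SOCKET` environment variable (a leading `@` marks a Linux abstract-namespace socket). MariaDB appears to do this through `service_manager_extend_timeout()`, the function named in MDEV-17003. The Python below is not that code; it is a hypothetical minimal sketch of the wire format only, and `notify_extend_timeout` is an illustrative name. Whether systemd accepts such a message from a process started by ExecStartPre also depends on the unit's `NotifyAccess=` setting, which this sketch does not address.

```python
import os
import socket


def notify_extend_timeout(seconds: float, status: str = "") -> bool:
    """Ask the service manager for roughly `seconds` more before the current timeout fires.

    Hypothetical helper illustrating the sd_notify message format. Returns False
    (and sends nothing) when NOTIFY_SOCKET is unset, i.e. when no service manager
    is listening for notifications.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # '@' denotes an abstract-namespace socket; the kernel expects a leading NUL byte.
        address = "\0" + address[1:]
    # Messages are newline-separated KEY=VALUE assignments; the value is in microseconds.
    fields = [f"EXTEND_TIMEOUT_USEC={int(seconds * 1_000_000)}"]
    if status:
        fields.append(f"STATUS={status}")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto("\n".join(fields).encode(), address)
    return True
```

A long-running recovery loop would call something like `notify_extend_timeout(30, "Recovering InnoDB")` periodically, before each previously requested window expires.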
See also MDEV-17571 as another case where systemd timeout extensions didn't seem to work as intended.
Issue Links
Relates to:
- MDEV-9202 Systemd timeout is not sufficient for larger servers (Closed)
- MDEV-9520 xtrabackup-v2 to support systemd node provisioning (Closed)
- MDEV-15607 mysqld crashed few after node is being joined with sst (Closed)
- MDEV-17003 service_manager_extend_timeout() being called too often (Closed)
- MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts (Closed)
- MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs (Closed)
I just noticed that EXTEND_TIMEOUT_USEC was added in systemd version 236:
https://lists.freedesktop.org/archives/systemd-devel/2017-December/039996.html
The OS we most commonly see running MariaDB with Galera is RHEL 7, which still ships systemd version 219:
[ec2-user@ip-172-30-0-249 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[ec2-user@ip-172-30-0-249 ~]$ sudo yum info systemd
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Installed Packages
Name : systemd
Arch : x86_64
Version : 219
Release : 19.el7
Size : 21 M
Repo : installed
From repo : anaconda
Summary : A System and Service Manager
URL : http://www.freedesktop.org/wiki/Software/systemd
License : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
: SysV and LSB init scripts. systemd provides aggressive parallelization
: capabilities, uses socket and D-Bus activation for starting services,
: offers on-demand starting of daemons, keeps track of processes using
: Linux cgroups, supports snapshotting and restoring of the system
: state, maintains mount and automount points and implements an
: elaborate transactional dependency-based service control logic. It can
: work as a drop-in replacement for sysvinit.
Available Packages
Name : systemd
Arch : x86_64
Version : 219
Release : 62.el7
Size : 5.1 M
Repo : rhui-REGION-rhel-server-releases/7Server/x86_64
Summary : A System and Service Manager
URL : http://www.freedesktop.org/wiki/Software/systemd
License : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
: SysV and LSB init scripts. systemd provides aggressive parallelization
: capabilities, uses socket and D-Bus activation for starting services,
: offers on-demand starting of daemons, keeps track of processes using
: Linux cgroups, supports snapshotting and restoring of the system
: state, maintains mount and automount points and implements an
: elaborate transactional dependency-based service control logic. It can
: work as a drop-in replacement for sysvinit.
So even if MDEV-14705 fixed this problem for systemd versions 236 and above, many users run much older systemd versions and would not benefit from that functionality. That explains a lot.
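To check whether a particular host could benefit from EXTEND_TIMEOUT_USEC at all, comparing its systemd version against 236 is enough in the simple case. The Python sketch below assumes the first line of `systemctl --version` output has the form `systemd <N> ...`; the function names are hypothetical. A plain version comparison cannot see features a distribution may have backported, so a False result means "probably not supported" rather than a certainty.

```python
import re
import shutil
import subprocess
from typing import Optional

# EXTEND_TIMEOUT_USEC first appeared in systemd 236.
MIN_EXTEND_TIMEOUT_VERSION = 236


def parse_systemd_version(output: str) -> Optional[int]:
    """Return N when the output starts with 'systemd N', else None."""
    match = re.match(r"systemd (\d+)", output.strip())
    return int(match.group(1)) if match else None


def supports_extend_timeout(output: str) -> bool:
    """True if the reported systemd version is new enough for EXTEND_TIMEOUT_USEC."""
    version = parse_systemd_version(output)
    return version is not None and version >= MIN_EXTEND_TIMEOUT_VERSION


def host_systemctl_version_output() -> Optional[str]:
    """Run `systemctl --version` on this host; None if systemctl is unavailable or fails."""
    if shutil.which("systemctl") is None:
        return None
    result = subprocess.run(
        ["systemctl", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    output = host_systemctl_version_output()
    if output is None:
        print("systemctl not available")
    else:
        print(parse_systemd_version(output), supports_extend_timeout(output))
```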