Details
Description
When Galera is enabled, MariaDB's systemd service executes the "galera_recovery" script as an ExecStartPre operation. See the following:
https://github.com/MariaDB/server/blob/ce8716a1ed786ff971b5e15c88385d50b649ec7f/support-files/mariadb.service.in#L71
The MariaDB systemd service has a default TimeoutStartSec value of 90 seconds, so startup fails if this ExecStartPre step takes longer than that. For example, see the following failure from a syslog:
Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server...
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Sep 13 15:49:58 server1 systemd[1]: Failed to start MariaDB 10.2.16 database server.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Unit entered failed state.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Failed with result 'timeout'.
galera_recovery has to start the server, so this step can take a while, especially if the server previously crashed and has to perform crash recovery. However, the systemd timeout should have been extended during server startup as part of MDEV-14705. Despite that, server versions that include the fix for MDEV-14705 still hit timeouts during ExecStartPre. Were some important long-running startup functions missed?
See also MDEV-17571 as another case where systemd timeout extensions didn't seem to work as intended.
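As a possible stopgap for the 90-second TimeoutStartSec limit described above, the timeout could be raised with a systemd drop-in file. The following is only a sketch: the function name write_timeout_dropin, the file name timeout.conf, and the 900-second value in the example are assumptions chosen for illustration, and the drop-in directory assumes the unit is named mariadb.service as in the log above.

```shell
#!/bin/sh
# Sketch only: write a systemd drop-in that raises TimeoutStartSec.
# The function name, the file name timeout.conf, and the value passed in
# are illustrative assumptions, not part of the MariaDB packaging.
write_timeout_dropin() {
    dropin_dir=$1   # e.g. /etc/systemd/system/mariadb.service.d
    seconds=$2      # new start timeout in seconds
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$dropin_dir/timeout.conf"
}

# Example (as root), then have systemd re-read its unit files:
#   write_timeout_dropin /etc/systemd/system/mariadb.service.d 900
#   systemctl daemon-reload
```

This only moves the fixed limit; it does not address why the timeout is not being extended during ExecStartPre.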
Issue Links
- relates to
  - MDEV-9202 Systemd timeout is not sufficient for larger servers (Closed)
  - MDEV-9520 xtrabackup-v2 to support systemd node provisioning (Closed)
  - MDEV-15607 mysqld crashed few after node is being joined with sst (Closed)
  - MDEV-17003 service_manager_extend_timeout() being called too often (Closed)
  - MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts (Closed)
  - MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Assignee | Rasmus Johansson [ ratzpo ] |
Description |
When Galera is enabled, MariaDB's systemd service executes the "galera_recovery" script as an ExecStartPre operation. See the following:
https://github.com/MariaDB/server/blob/ce8716a1ed786ff971b5e15c88385d50b649ec7f/support-files/mariadb.service.in#L71 The MariaDB systemd service has a default TimeoutStartSec value of 90 seconds, so if this ExecStartPre step takes longer than that, then this can cause startup to fail. For example, see the following failure from a syslog: {noformat} Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server... Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating. Sep 13 15:49:58 server1 systemd[1]: Failed to start MariaDB 10.2.16 database server. Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Unit entered failed state. Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Failed with result 'timeout'. {noformat} galera_recovery has to perform server startup, so this step can take a while, especially if the server previously crashed, and it has to perform crash recovery. However, it looks like systemd timeouts should have been extended during server startup as part of See also |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Rasmus Johansson [ ratzpo ] | Jan Lindström [ jplindst ] |
Assignee | Jan Lindström [ jplindst ] | Rasmus Johansson [ ratzpo ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Duplicate [ 3 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Workflow | MariaDB v3 [ 91110 ] | MariaDB v4 [ 155321 ] |
Zendesk Related Tickets | 127027 156845 114054 |
I just noticed that EXTEND_TIMEOUT_USEC was added in systemd version 236:
https://lists.freedesktop.org/archives/systemd-devel/2017-December/039996.html
The most common OS that we tend to see for MariaDB with Galera is RHEL 7, and that still has systemd version 219:
[ec2-user@ip-172-30-0-249 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[ec2-user@ip-172-30-0-249 ~]$ sudo yum info systemd
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Installed Packages
Name : systemd
Arch : x86_64
Version : 219
Release : 19.el7
Size : 21 M
Repo : installed
From repo : anaconda
Summary : A System and Service Manager
URL : http://www.freedesktop.org/wiki/Software/systemd
License : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
: SysV and LSB init scripts. systemd provides aggressive parallelization
: capabilities, uses socket and D-Bus activation for starting services,
: offers on-demand starting of daemons, keeps track of processes using
: Linux cgroups, supports snapshotting and restoring of the system
: state, maintains mount and automount points and implements an
: elaborate transactional dependency-based service control logic. It can
: work as a drop-in replacement for sysvinit.
Available Packages
Name : systemd
Arch : x86_64
Version : 219
Release : 62.el7
Size : 5.1 M
Repo : rhui-REGION-rhel-server-releases/7Server/x86_64
Summary : A System and Service Manager
URL : http://www.freedesktop.org/wiki/Software/systemd
License : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
: SysV and LSB init scripts. systemd provides aggressive parallelization
: capabilities, uses socket and D-Bus activation for starting services,
: offers on-demand starting of daemons, keeps track of processes using
: Linux cgroups, supports snapshotting and restoring of the system
: state, maintains mount and automount points and implements an
: elaborate transactional dependency-based service control logic. It can
: work as a drop-in replacement for sysvinit.
So even if MDEV-14705 fixed this problem for systemd versions 236 and above, a lot of users are running much older systemd versions, so they would not benefit from that functionality. That explains a lot.
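To check whether a given host falls on the wrong side of that version boundary, something like the following could be used. It is a rough sketch: the helper names systemd_major_version and supports_extend_timeout are made up for illustration, and it assumes the first line of `systemctl --version` has the form "systemd <number>", as it does on the RHEL 7 host above.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# Print the numeric systemd version from `systemctl --version` output,
# whose first line is assumed to look like "systemd 219".
systemd_major_version() {
    ver=$(printf '%s\n' "$1" | awk 'NR == 1 { print $2 }')
    # Drop anything from the first non-digit onward.
    printf '%s\n' "${ver%%[!0-9]*}"
}

# Succeed if the version is 236 or later, where EXTEND_TIMEOUT_USEC was added.
supports_extend_timeout() {
    [ -n "$1" ] && [ "$1" -ge 236 ]
}

if command -v systemctl >/dev/null 2>&1; then
    major=$(systemd_major_version "$(systemctl --version)")
    if supports_extend_timeout "$major"; then
        echo "systemd $major: EXTEND_TIMEOUT_USEC is understood"
    else
        echo "systemd ${major:-unknown}: EXTEND_TIMEOUT_USEC is ignored, TimeoutStartSec is a fixed limit"
    fi
fi
```

On hosts where the check reports an older version, the MDEV-14705 fix cannot help and the start timeout has to be sized for the worst-case recovery time.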