[MDEV-17571] Make systemd timeout behavior more compatible with long Galera SSTs Created: 2018-10-30  Updated: 2020-03-25  Resolved: 2020-01-22

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST, Packaging, wsrep
Affects Version/s: 10.1, 10.1.36, 10.2.18, 10.3.9, 10.3.11, 10.1.38, 10.2, 10.3
Fix Version/s: 10.1.44, 10.2.31, 10.3.22, 10.4.12, 10.5.1

Type: Bug Priority: Critical
Reporter: Claudio Nanni Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 9
Labels: systemd

Issue Links:
Blocks
Relates
relates to MDEV-9202 Systemd timeout is not sufficient for... Closed
relates to MDEV-9520 xtrabackup-v2 to support systemd node... Closed
relates to MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoi... Closed
relates to MDEV-15606 Galera can't perform SST in 10.2.13 i... Closed
relates to MDEV-15607 mysqld crashed few after node is bein... Closed
relates to MDEV-17003 service_manager_extend_timeout() bein... Closed
relates to MDEV-17934 Make systemd timeout behavior more co... Closed
relates to MDEV-21231 notify systemd of long running SST to... Stalled

 Description   

SSTs can take several hours in many cases, but the current default value of TimeoutStartSec causes systemd to force the joiner node to time out after about 90 seconds. It might make sense to disable the systemd service's startup timeout by default instead.

Depending on the systemd version, disabling the startup timeout means setting either TimeoutStartSec=0 (if systemd version <=228) or TimeoutStartSec=infinity (if systemd version >=229).
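As a minimal sketch of the version rule above, the following Python helper picks the value that disables the startup timeout and renders a drop-in fragment for the service. The function names are hypothetical and only illustrate the rule; where the fragment is installed (for example a drop-in directory for the MariaDB unit) is an assumption that depends on the distribution.

```python
def disabled_timeout_value(systemd_version: int) -> str:
    """Return the TimeoutStartSec value that disables the startup timeout.

    Per the description: "0" for systemd <= 228, "infinity" for systemd >= 229.
    """
    return "0" if systemd_version <= 228 else "infinity"


def render_dropin(systemd_version: int) -> str:
    """Render a [Service] drop-in fragment (hypothetical helper, for illustration)."""
    value = disabled_timeout_value(systemd_version)
    return f"[Service]\nTimeoutStartSec={value}\n"
```

For the RHEL 7 case (systemd 219), `render_dropin(219)` yields a fragment with `TimeoutStartSec=0`.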

In systemd 236 and later, the startup timeout can be extended by setting EXTEND_TIMEOUT_USEC:

If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may cause the start time to be extended beyond TimeoutStartSec=. The first receipt of this message must occur before TimeoutStartSec= is exceeded, and once the start time has extended beyond TimeoutStartSec=, the service manager will allow the service to continue to start, provided the service repeats "EXTEND_TIMEOUT_USEC=…" within the interval specified until the service startup status is finished by "READY=1". (see sd_notify(3)).

https://www.freedesktop.org/software/systemd/man/systemd.service.html
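To illustrate the mechanism quoted above, here is a minimal Python sketch of the notification protocol: the service writes a datagram to the Unix socket named by the NOTIFY_SOCKET environment variable. This is not MariaDB's implementation (the server uses the C sd_notify API); the helper names `sd_notify` and `extend_startup_timeout` are hypothetical.

```python
import os
import socket


def sd_notify(message: str) -> bool:
    """Send one notification datagram to the service manager.

    Returns False when NOTIFY_SOCKET is unset (not started by systemd
    with Type=notify), True once the datagram has been sent.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    address = path.encode("utf-8")
    if address.startswith(b"@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        address = b"\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), address)
    return True


def extend_startup_timeout(seconds: int) -> bool:
    """Ask for `seconds` more startup time (value is sent in microseconds)."""
    return sd_notify(f"EXTEND_TIMEOUT_USEC={seconds * 1_000_000}")
```

On systemd versions older than 236 the message is not honored, so the TimeoutStartSec limit still applies.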

This approach appears to have been used to extend the startup timeout during SSTs as part of the fix for MDEV-15607. The relevant commit seems to be:

https://github.com/mariadb/server/commit/be5698265a4195586142d1a34fdd1cce9d95d8a1

The relevant service_manager_extend_timeout function seems to be defined here:

https://github.com/MariaDB/server/blob/be5698265a4195586142d1a34fdd1cce9d95d8a1/include/my_service_manager.h#L30

And it sends the EXTEND_TIMEOUT_USEC notification message mentioned in the systemd manual.

However, a lot of users are still seeing startup timeouts during SSTs. The cause seems to be that most systemd installations are not yet using version 236 or later.

The following documentation sections describe the current behavior:

https://mariadb.com/kb/en/library/introduction-to-state-snapshot-transfers-ssts/#ssts-and-systemd

https://mariadb.com/kb/en/library/systemd/#configuring-the-systemd-service-timeout



 Comments   
Comment by Elena Stepanova [ 2018-10-30 ]

I would expect that instances where SST indeed takes several hours are a minority compared to all MariaDB server instances, so should we really adjust the configuration to something that can be considered a corner case? I don't have a strong opinion on the subject. serg, jplindst, what do you think?

Comment by Sergei Golubchik [ 2018-10-31 ]

1. I agree with Elena, that the default configuration should be optimized for the default case
2. Haven't the systemd developers fixed it for us? You wrote yourself that the timeout is disabled in newer systemd.

Comment by Valerii Kravchuk [ 2018-10-31 ]

The default timeout is 90 seconds; there are just different ways to disable it depending on the systemd version.

I think that even for a plain non-Galera MySQL or MariaDB instance the timeout should be larger than that (maybe even 600 or 900 seconds), if not infinite. Hence this request is to add an explicit setting (and a comment on how to disable the timeout) to the configuration file we include in our packages targeting systemd-based Linux distributions.

Comment by Richard Stracke [ 2018-10-31 ]

Databases tend to be bigger.

On a 10 Gb network, a high transfer speed is 40 MB/s, which over 90 seconds amounts to 3.6 GB without any overhead.
Even on a 100 Gb network, it would be 36 GB.
That is not the default case in my opinion, and there will be more large databases in the future.
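The arithmetic in this comment can be checked with a small Python sketch. The function names are hypothetical, decimal units (1 GB = 1000 MB) are assumed, and protocol overhead is ignored, as in the comment.

```python
def transferable_gb(rate_mb_per_s: float, timeout_s: float) -> float:
    """Data volume (GB) that can be moved before the timeout expires."""
    return rate_mb_per_s * timeout_s / 1000.0


def required_timeout_s(database_gb: float, rate_mb_per_s: float) -> float:
    """Startup timeout (seconds) needed to transfer a database of the given size."""
    return database_gb * 1000.0 / rate_mb_per_s
```

With the figures above, `transferable_gb(40, 90)` gives 3.6 GB, and ten times that rate gives 36 GB. Conversely, at 40 MB/s a 36 GB transfer would need about 900 seconds.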

A 90-second timeout unnecessarily limits the database size for MariaDB with Galera.

In addition, a failed SST is not very easy to spot if you are not a skilled DBA.

Comment by Hartmut Holzgraefe [ 2018-10-31 ]

While I think that "infinity" is too long, the default 90 seconds is definitely too short for Galera.

But it seems that SystemD developer(s) actually thought of such scenarios, allowing services that take longer to start up to extend startup timeout dynamically:

"If a service of Type=notify sends "EXTEND_TIMEOUT_USEC=…", this may cause the start time to be extended beyond TimeoutStartSec=..."

<https://www.freedesktop.org/software/systemd/man/systemd.service.html>

So that actually looks like the correct way to go: having mysqld extend the systemd timeout while a still healthy SST is ongoing
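A rough Python sketch of that idea, under stated assumptions: `sst_in_progress` is a hypothetical health check supplied by the caller, `notify` is any function that delivers a message to the service manager, and the interval and grace values are illustrative rather than taken from MariaDB. Each extension must be repeated before the previously granted interval runs out, so the grace period is chosen longer than the loop interval.

```python
import time
from typing import Callable


def keep_startup_alive(
    sst_in_progress: Callable[[], bool],
    notify: Callable[[str], object],
    interval_s: float = 10.0,
    grace_s: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Extend the startup timeout while a healthy SST is still running.

    Returns the number of extension messages sent. Signalling "READY=1"
    once startup completes is left to the caller.
    """
    sent = 0
    while sst_in_progress():
        # Request grace_s more seconds; the value is sent in microseconds.
        notify(f"EXTEND_TIMEOUT_USEC={grace_s * 1_000_000}")
        sent += 1
        sleep(interval_s)
    return sent
```

If the SST hangs and the health check starts returning false, the extensions stop and the normal timeout applies again, which avoids the drawbacks of an infinite timeout.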

Comment by Geoff Montee (Inactive) [ 2018-12-06 ]

jplindst tried to implement a fix using hholzgra's approach as part of MDEV-15607, but if people are still seeing timeouts during SSTs, then maybe there's an issue with the implementation? It looks like this is the relevant commit:

https://github.com/mariadb/server/commit/be5698265a4195586142d1a34fdd1cce9d95d8a1

The relevant service_manager_extend_timeout function seems to be defined here:

https://github.com/MariaDB/server/blob/be5698265a4195586142d1a34fdd1cce9d95d8a1/include/my_service_manager.h#L30

And it sends the EXTEND_TIMEOUT_USEC notification message that hholzgra mentioned.

Comment by Geoff Montee (Inactive) [ 2018-12-07 ]

Loosely related: MDEV-17934

Comment by Geoff Montee (Inactive) [ 2018-12-07 ]

I just noticed that EXTEND_TIMEOUT_USEC was added in systemd version 236:

https://lists.freedesktop.org/archives/systemd-devel/2017-December/039996.html

The most common OS that we tend to see for MariaDB with Galera is RHEL 7, and that still has systemd version 219:

[ec2-user@ip-172-30-0-249 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[ec2-user@ip-172-30-0-249 ~]$ sudo yum info systemd
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
Installed Packages
Name        : systemd
Arch        : x86_64
Version     : 219
Release     : 19.el7
Size        : 21 M
Repo        : installed
From repo   : anaconda
Summary     : A System and Service Manager
URL         : http://www.freedesktop.org/wiki/Software/systemd
License     : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
            : SysV and LSB init scripts. systemd provides aggressive parallelization
            : capabilities, uses socket and D-Bus activation for starting services,
            : offers on-demand starting of daemons, keeps track of processes using
            : Linux cgroups, supports snapshotting and restoring of the system
            : state, maintains mount and automount points and implements an
            : elaborate transactional dependency-based service control logic. It can
            : work as a drop-in replacement for sysvinit.
 
Available Packages
Name        : systemd
Arch        : x86_64
Version     : 219
Release     : 62.el7
Size        : 5.1 M
Repo        : rhui-REGION-rhel-server-releases/7Server/x86_64
Summary     : A System and Service Manager
URL         : http://www.freedesktop.org/wiki/Software/systemd
License     : LGPLv2+ and MIT and GPLv2+
Description : systemd is a system and service manager for Linux, compatible with
            : SysV and LSB init scripts. systemd provides aggressive parallelization
            : capabilities, uses socket and D-Bus activation for starting services,
            : offers on-demand starting of daemons, keeps track of processes using
            : Linux cgroups, supports snapshotting and restoring of the system
            : state, maintains mount and automount points and implements an
            : elaborate transactional dependency-based service control logic. It can
            : work as a drop-in replacement for sysvinit.

So even if MDEV-15607 fixed this problem for systemd versions 236 and above, a lot of users are using systemd versions that are much older, so they would not benefit from that functionality. That explains a lot.
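A minimal Python sketch of that version check, assuming the first line of `systemctl --version` output has the form `systemd <number>`. The function names are hypothetical.

```python
import re

# EXTEND_TIMEOUT_USEC was added in systemd 236, per the comment above.
EXTEND_TIMEOUT_MIN_VERSION = 236


def parse_systemd_version(version_output: str) -> int:
    """Extract the major version from `systemctl --version` style output."""
    match = re.match(r"systemd (\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized systemd version output")
    return int(match.group(1))


def supports_extend_timeout(version_output: str) -> bool:
    """True if this systemd version honors EXTEND_TIMEOUT_USEC."""
    return parse_systemd_version(version_output) >= EXTEND_TIMEOUT_MIN_VERSION
```

For the RHEL 7 host shown above (systemd 219), `supports_extend_timeout("systemd 219")` returns False, so only a larger or disabled TimeoutStartSec helps there.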

Comment by Geoff Montee (Inactive) [ 2019-02-16 ]

This Percona blog post is relevant:

https://www.percona.com/blog/2019/02/12/debugging-mariadb-galera-cluster-sst-problems-a-tale-of-a-funny-experience/

Somehow, they're under the impression that this timeout used to be 900 seconds in MariaDB:

The MariaDB init script has changed its timeout from 900 seconds to 90 while MySQL Community and Percona Server has this value set to 15 mins.

But as far as I can tell from the git commit history, MariaDB's systemd unit file has never explicitly defined TimeoutSec or TimeoutStartSec, and it has used the systemd default value of 90 seconds for TimeoutStartSec since systemd support was added in MariaDB 10.1.

I guess the 900 seconds is a reference to the service startup timeout in mysql.server, which is used as the init script on distributions that don't support systemd.

https://github.com/MariaDB/server/blob/32062cc61cd00e4cd3b7939c8a09f9c3ac34ec76/support-files/mysql.server.sh#L48

It looks like Percona XtraDB Cluster sets an infinite timeout by default in its systemd unit file:

https://github.com/percona/percona-xtradb-cluster/blob/3f1af140a962e731d6cf9ae83c18c30cf26165d0/scripts/systemd/mysqld.service.in#L39

Should we set TimeoutStartSec to a higher value than 90 seconds by default for systemd versions that don't support EXTEND_TIMEOUT_USEC? It probably wouldn't hurt to at least set it to the old 900 second value from mysql.server that some users are used to.

Comment by Jan Lindström (Inactive) [ 2019-05-06 ]

axel, is there some way to get the required settings for TimeoutStartSec and TimeoutSec into the mariadb.service file?

Comment by Axel Schwenke [ 2019-12-02 ]

I've read a lot of code and documentation lately. Let me try to summarize things:

  1. the MariaDB Server systemd unit never set a timeout for server startup. The old init script used to set it to 900 seconds, but for systemd we rely on the systemd default which is 90 seconds.
  2. those 90 seconds are in general considered to be too short, 900 seconds (the old value) seem to be more appropriate.
  3. there is a way to extend the timeout dynamically (from the running SST). This was implemented in MDEV-15607, but it requires a recent version of systemd that not all users have. I also see that with the switch to Galera 4 the respective code was removed from sql/wsrep_sst.cc (commit 36a2a185fe18d31a644da46cfabd9757a379280c)

From the above I conclude the following steps:

  1. we should explicitly set a timeout of 900 seconds in our systemd unit file. This is trivial, but it will only help those users whose SST needs longer than 90 sec, but shorter than 900 sec.
  2. for all other users we already documented how to bypass the timeout, by setting it to 'infinite'. This is not considered a general solution, but rather a "last resort" action. And it should only be implemented by users who strictly need it.
  3. the general solution should be along MDEV-15607