[MDEV-21231] notify systemd of long running SST to avoid timeout - Jira

Details

Type: Bug
Status: Stalled
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4(EOL), 10.5
Fix Version/s: 10.5
Component/s: Galera SST
Labels: None

Description

Systemd uses a timeout counter to catch misbehaving services. If a service has not notified systemd that it has reached the "READY" state within that timeout, systemd assumes something went wrong and kills the service. The timeout is 90 seconds by default (the systemd default), but it can be set in the unit file (900 seconds in recent MariaDB Server releases) or in a user configuration file (as discussed in this knowledge base article).

When a Galera node recovers or simply joins a running cluster, the SST can take much longer than 90 (or 900) seconds, so systemd diagnoses a service start timeout. From the user's point of view, the Galera node is simply unable to start.

~~MDEV-15607~~ implemented a solution by notifying systemd with EXTEND_TIMEOUT_USEC messages that the service startup is still ongoing and that systemd should just continue waiting. Early in MariaDB 10.4 development, MariaDB switched to Galera 4. When that happened, the work from ~~MDEV-15607~~ was removed from the code base.
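To illustrate the mechanism, here is a minimal Python sketch of sending one EXTEND_TIMEOUT_USEC message over the systemd notification socket. It is not the MariaDB implementation; the function name `extend_start_timeout` is hypothetical, and the sketch assumes the service runs under systemd with `Type=notify`, so that the `NOTIFY_SOCKET` environment variable is set.

```python
import os
import socket


def extend_start_timeout(extra_seconds: int) -> bool:
    """Ask systemd to wait `extra_seconds` longer for this service.

    Hypothetical helper, for illustration only. Returns False when no
    notification socket is configured (for example, when the process
    was not started by systemd).
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract-namespace socket.
    address = "\0" + path[1:] if path.startswith("@") else path
    # The value is given in microseconds.
    message = f"EXTEND_TIMEOUT_USEC={extra_seconds * 1_000_000}".encode("ascii")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True
```

Each message only covers the requested interval, so a long-running operation has to send a new one before the previous extension runs out.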

This issue is about re-adding the logic from ~~MDEV-15607~~ to the SST code in MariaDB Server 10.4 and up. If the SST takes longer than 90 seconds, systemd must be notified that the service start is delayed. Such messages must be sent continuously until the SST has finished, and each one should be sent with some safety margin before the 90-second timeout expires.
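The requested behavior can be sketched as a small loop. This is only an illustration under stated assumptions, not the proposed patch: `keep_start_alive`, `sst_finished`, `notify`, and the default values of 90 and 30 seconds are all hypothetical. `notify` stands for any callable that sends an EXTEND_TIMEOUT_USEC request for the given number of seconds.

```python
import time
from typing import Callable


def keep_start_alive(
    sst_finished: Callable[[], bool],
    notify: Callable[[int], None],
    extend_by: int = 90,
    safety_margin: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Request timeout extensions until the SST finishes.

    Hypothetical sketch. Returns the number of notifications sent.
    """
    if not 0 <= safety_margin < extend_by:
        raise ValueError("safety_margin must be non-negative and smaller than extend_by")
    # Re-notify before the previous extension expires, leaving the margin unused.
    interval = extend_by - safety_margin
    sent = 0
    while not sst_finished():
        notify(extend_by)
        sent += 1
        sleep(interval)
    return sent
```

With these defaults, each message asks for 90 more seconds and the next one follows after 60 seconds, which leaves 30 seconds of slack. A real implementation would probably check for SST completion more often than once per interval, so that startup is not held up after the transfer ends.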

Issue Links

relates to

MDEV-15607 mysqld crashed few after node is being joined with sst (Closed)

MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs (Closed)

People

Assignee: Seppo Jaakola

Reporter: Axel Schwenke

Votes: 4

Watchers: 11

Dates

Created: 2019-12-05 10:58

Updated: 2024-09-10 15:06
