[MDEV-16425] New node in Galera can't fully sync - systemd timeout

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Duplicate
Affects Version/s: 10.1.31
Fix Version/s: N/A
Component/s: Galera, Galera SST
Labels:
None
Environment:
RHEL 7

Description

We have a three-node Galera cluster running MariaDB 10.1.31 on RHEL 7.

The cluster hosts about 10 databases, one of which is about 50GB.

We lost a node due to a mishap, which is another story. We cleaned up the lost node and tried to restart MariaDB with: systemctl restart mariadb
We observed that /var/lib/mysql never grew beyond about 11GB and that the rsync processes never completed. Looking further into journalctl, we saw that mariadb.service was being restarted about every 90 seconds.

After some digging, I figured out that systemd has a default service start timeout of 90 seconds (at least on RHEL 7). While syncing, mariadb.service remains in the 'Activating' state, and because there was so much data to sync, the service hit the timeout before the sync could finish.

The way I fixed this was to edit this file:
/usr/lib/systemd/system/mariadb.service

And add these lines below the [Service] line:
RestartSec=86400
TimeoutSec=86400

Then ran:
systemctl daemon-reload
systemctl restart mariadb

After about 5 minutes, the node was fully sync'd and operational - I then removed these timeouts.
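
For reference, the same workaround can be applied as a systemd drop-in instead of editing the packaged unit file under /usr/lib, which a package update may overwrite. The sketch below is an assumption, not what was run in this report: the helper name write_timeout_dropin and the file name sst-timeout.conf are hypothetical, and it sets only TimeoutStartSec rather than the TimeoutSec/RestartSec pair used above.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that raises only the start timeout.
# write_timeout_dropin is a hypothetical helper, not part of MariaDB or systemd.
write_timeout_dropin() {
    dropin_dir=$1      # e.g. /etc/systemd/system/mariadb.service.d
    timeout_sec=$2     # e.g. 86400, the value used in the report
    mkdir -p "$dropin_dir" || return 1
    # Settings in a drop-in are merged over the packaged unit file,
    # so the file under /usr/lib stays untouched.
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout_sec" \
        > "$dropin_dir/sst-timeout.conf"
}

# Usage (as root), assuming the unit is named mariadb.service:
#   write_timeout_dropin /etc/systemd/system/mariadb.service.d 86400
#   systemctl daemon-reload
#   systemctl restart mariadb
# To undo: delete sst-timeout.conf and run systemctl daemon-reload again.
```

Removing the drop-in once the node has synced restores the default timeout, which matches the "remove these timeouts afterwards" step described above.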

This raises a concern though: a default installation of Galera should not time out during the initial sync of medium-sized databases.

I'm not sure what the best way to handle this is. I'm concerned about making the increased timeout a permanent part of the mariadb.service file for all systemd users, because it would have negative outcomes if there were in fact something wrong with the service.

Maybe systemd has other states that could be used for the synchronization phase that a new Galera node goes through? Something we can set a higher timeout for?

Thanks,
Wayne

Attachments

Activity

Wayne Workman added a comment - 2018-06-12 15:53

These are the same: MDEV-16425 MDEV-15606

Jan Lindström (Inactive) added a comment - 2018-09-12 10:53

~~MDEV-15607~~ should fix this issue.

People

Assignee: Jan Lindström (Inactive)

Reporter: Wayne Workman

Votes: 0

Watchers: 3

Dates

Created: 2018-06-07 18:06

Updated: 2018-09-12 10:53

Resolved: 2018-09-12 10:53

MariaDB Server