[MDEV-16425] New node in Galera can't fully sync - systemd timeout Created: 2018-06-07 Updated: 2018-09-12 Resolved: 2018-09-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Galera SST |
| Affects Version/s: | 10.1.31 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Wayne Workman | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL 7 |
||
| Description |
|
We have a three-node Galera 10.1.31 cluster running RHEL 7. The cluster hosts about 10 databases, one of which is about 50GB.

We lost a node due to a mishap, which is another story. We cleaned up the lost node and were trying to restart mariadb with:

systemctl restart mariadb

After some digging, I figured out that systemd has a default service start timeout of 90 seconds (at least on RHEL 7). Because mariadb.service remains in the 'Activating' state while syncing, and there was so much data to sync, the service would hit the timeout.

The way I fixed this was to edit this file:

And add these lines below the [Service] line:

Then ran:

After about 5 minutes, the node was fully synced and operational. I then removed these timeouts.

This raises a concern though: a default installation of Galera should not time out during the initial sync of medium-sized databases. I'm not sure what the best way to handle this is. I'm concerned about making the increased timeout a permanent part of the mariadb.service file for all systemd users, because this would have negative outcomes if there were in fact some kind of funk going on with the service.

Maybe systemd has other states that could be used for the synchronization phase that a new Galera node goes through? Something we can set the timeout higher for?

Thanks, |
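For readers hitting the same timeout, a temporary start-timeout override of the kind described could look roughly like the sketch below. The file the reporter edited and the lines they added are not shown in this report, so the drop-in path, the file name, and both values are assumptions chosen for illustration, not the reporter's actual configuration.

```ini
# Hypothetical drop-in: /etc/systemd/system/mariadb.service.d/timeout.conf
# (path and file name are assumptions; any *.conf file in that directory is read)
[Service]
# Allow up to 15 minutes for startup, including the Galera state transfer.
# The value is illustrative; size it to the expected sync time.
TimeoutStartSec=900
# Illustrative stop timeout, so a long shutdown is not killed early.
TimeoutStopSec=900
```

After creating or removing such a file, systemd has to reload its unit definitions with systemctl daemon-reload before systemctl restart mariadb picks up the change. A drop-in keeps the packaged mariadb.service untouched, which makes the override easy to remove once the node has synced.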
| Comments |
| Comment by Wayne Workman [ 2018-06-12 ] |
|
These are the same: |
| Comment by Jan Lindström (Inactive) [ 2018-09-12 ] |
|
|