Details
- Task
- Status: Closed
- Major
- Resolution: Duplicate
- None
Description
From the case
_It turned out that the corruption came from the startup process not having enough time to load the extent map and being killed by systemd halfway through, leaving the map in an inconsistent state.
We have tweaked the systemd units to have proper start and stop timeouts in order to handle this situation from now on:
/lib/systemd/system/mcs-workernode@.service: changed TimeoutStopSec to 1200 seconds
/lib/systemd/system/mcs-loadbrm.service: added TimeoutStartSec of 1200 seconds_
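For reference, the timeout changes quoted above could also be expressed as systemd drop-in overrides instead of edits to the packaged unit files. This is only a sketch: the drop-in file name `timeout.conf` and the `/etc/systemd/system/...` locations are assumptions and do not come from the case; only the unit names, the two directives, and the 1200-second values are taken from it.

```ini
# Assumed path: /etc/systemd/system/mcs-loadbrm.service.d/timeout.conf
# Allow up to 1200 seconds for startup, so the extent map can finish loading
# before systemd gives up on the start job.
[Service]
TimeoutStartSec=1200

# Assumed path: /etc/systemd/system/mcs-workernode@.service.d/timeout.conf
# Allow up to 1200 seconds for a clean stop before systemd kills the process.
[Service]
TimeoutStopSec=1200
```

The two `[Service]` sections belong in separate files, one per unit. After creating or changing drop-ins, `systemctl daemon-reload` is needed for systemd to pick them up.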
From drtuy
The default timeout can be raised to as much as 40 minutes (2400 seconds).
Issue Links
- duplicates: MCOL-5105 Reduced systemd timeouts results in corrupted EM (Closed)