[MCOL-1610] DataRedundancy failover recovery leaves dbroot in limbo if gluster mount fails Created: 2018-07-30 Updated: 2023-10-26 Resolved: 2019-01-21 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | 1.1.5 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Ben Thompson (Inactive) | Assignee: | Ben Thompson (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Sprint: | 2018-15, 2018-16, 2018-17, 2018-18, 2018-19, 2018-20, 2018-21, 2019-01 |
| Description |
|
Example scenario, similar to a customer-observed issue: a 4 PM / 1 UM system with data redundancy. PM2 fails and dbroot2 is moved to PM3. mcsadmin getstorageconfig reports:

System Storage Configuration
Performance Module (DBRoot) Storage Type = DataRedundancy

PM2 then reconnects to PM1, and during the failover recovery of dbroot2 from PM3 back to PM2, the mount of gluster/dbroot2 onto data2 on PM2 fails in some way. This leaves dbroot2 mounted on neither PM3 nor PM2 when DBRM attempts a reload/resume, while the system still expects it to be mounted on PM3. The fix needs to modify the recovery procedure so that, on a failure during recovery, the dbroot is always left mounted somewhere. A simple way to reproduce the behavior is to disconnect PM2 and break the file permissions on the glusterfs mount to data2 so that the mount will fail on recovery. |
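The invariant the fix needs to enforce (a dbroot must never be left mounted nowhere after a failed recovery) can be sketched as follows. This is an illustrative model, not ColumnStore source code; all function and variable names here are hypothetical.

```python
# Hypothetical model of the dbroot failover-recovery step. `mounts` maps a
# dbroot to the PM it is currently mounted on (or None), and `mount_ok(pm)`
# simulates whether the gluster mount on that PM succeeds.

def recover_dbroot(dbroot, from_pm, to_pm, mounts, mount_ok):
    """Move `dbroot` from `from_pm` back to the recovering PM `to_pm`.

    Returns the PM the dbroot ends up mounted on.
    """
    # Step 1: unmount from the PM that held the dbroot during the failover.
    mounts[dbroot] = None
    # Step 2: try to mount on the recovering PM.
    if mount_ok(to_pm):
        mounts[dbroot] = to_pm
    else:
        # The fix described in this ticket: on mount failure, remount on the
        # previous PM so the dbroot is never left unmounted ("in limbo").
        mounts[dbroot] = from_pm
    return mounts[dbroot]

# Reproduce the reported scenario: dbroot2 is held by PM3 and the
# gluster mount on PM2 fails during recovery.
mounts = {"dbroot2": "pm3"}
owner = recover_dbroot("dbroot2", "pm3", "pm2", mounts,
                       mount_ok=lambda pm: pm != "pm2")
print(owner)  # pm3 -- dbroot2 falls back to PM3 instead of being lost
```

Without the fallback branch, the failure path would leave mounts["dbroot2"] as None, which is the limbo state DBRM then trips over at reload/resume.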
| Comments |
| Comment by Ben Thompson (Inactive) [ 2018-08-08 ] |
|
A simple way to force the issue on a 4 PM / 1 UM setup is to follow these steps
The fix is that on this failure dbroot2 will be reconnected to PM3, and PM2 will fail over to the MAN_DISABLED state with no dbroots assigned to it. This ensures the system returns in a usable state. To manually recover from an issue like this, the user will need to follow this procedure:
|
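The post-fix outcome described in the comment above can be modeled as a small state transition. This is a hypothetical sketch, not ColumnStore source; the module-state string and all names are assumptions.

```python
# Hypothetical model of the fixed failure handling: the dbroot stays assigned
# to the PM that held it, and the recovering PM is marked MAN_DISABLED with
# no dbroots assigned, so the system comes back in a usable state.

def handle_failed_recovery(assignments, states, dbroot, held_by, recovering_pm):
    assignments[dbroot] = held_by              # dbroot2 stays with PM3
    states[recovering_pm] = "MAN_DISABLED"     # PM2 disabled pending manual recovery
    # Sanity check: the disabled PM must hold no dbroots.
    assert recovering_pm not in assignments.values()
    return assignments, states

assignments = {"dbroot1": "pm1", "dbroot2": "pm3",
               "dbroot3": "pm3", "dbroot4": "pm4"}
states = {"pm1": "ACTIVE", "pm2": "ACTIVE", "pm3": "ACTIVE", "pm4": "ACTIVE"}
handle_failed_recovery(assignments, states, "dbroot2", "pm3", "pm2")
print(states["pm2"])  # MAN_DISABLED
```

The design choice here matches the ticket's resolution: rather than retrying the broken mount and risking the limbo state, the system degrades PM2 explicitly and leaves the data reachable via PM3 until an operator intervenes.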
| Comment by Daniel Lee (Inactive) [ 2018-08-22 ] |
|
Trying to reproduce the issue in 1.1.5-1. |
| Comment by Daniel Lee (Inactive) [ 2018-08-22 ] |
|
It looks like I closed the ticket by accident. Reopening it. |