[MCOL-435] Amazon AMI multi-node system didn't successfully restart after a stop/start Created: 2016-12-04 Updated: 2016-12-09 Resolved: 2016-12-09 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.0.6 |
| Fix Version/s: | 1.0.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hill (Inactive) | Assignee: | David Hill (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
aws multi-node system |
||
| Sprint: | 2016-24 |
| Description |
|
during the fixing of
Component Status Last Status Change
Module um1 MAN_OFFLINE Sun Dec 4 23:43:36 2016
Active Parent OAM Performance Module is 'pm1'
MariaDB Columnstore Process statuses
Process Module Status Last Status Change Process ID
ProcessMonitor pm1 ACTIVE Sun Dec 4 23:43:05 2016 1437
ProcessMonitor pm2 ACTIVE Sun Dec 4 23:45:15 2016 11675 |
| Comments |
| Comment by David Hill (Inactive) [ 2016-12-04 ] |
|
One issue is this: DBRMControllerNode pm2 AUTO_OFFLINE. This process should be getting started on PM2. |
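A quick way to spot this kind of problem in a captured status dump is to scan it for any process whose status is not ACTIVE. The sketch below is only an illustration: the line layout (process name, module, status) is inferred from the excerpts in this ticket rather than from a documented format, and `find_not_active` is a hypothetical helper, not part of ColumnStore.

```python
import re

# Assumed layout: "<Process> <Module> <STATUS> [timestamp] [pid]".
# Inferred from the status excerpts in this ticket, not from a documented format.
STATUS_LINE = re.compile(
    r"^(?P<process>\S+)\s+(?P<module>(?:um|pm)\d+)\s+(?P<status>[A-Z_]+)\b"
)


def find_not_active(status_text):
    """Return (process, module, status) for each process line that is not ACTIVE."""
    problems = []
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and match.group("status") != "ACTIVE":
            problems.append(
                (match.group("process"), match.group("module"), match.group("status"))
            )
    return problems


sample = """\
ProcessMonitor pm1 ACTIVE Sun Dec 4 23:43:05 2016 1437
ProcessMonitor pm2 ACTIVE Sun Dec 4 23:45:15 2016 11675
DBRMControllerNode pm2 AUTO_OFFLINE
"""

print(find_not_active(sample))
```

With the sample lines taken from this ticket, only the DBRMControllerNode entry on pm2 is reported.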
| Comment by David Hill (Inactive) [ 2016-12-05 ] |
|
Also determined that ProcMon on PM2 was continually restarting. This had been a problem in the past, related to ProcMon trying to write to the log directory to update the alarm logs. The log directory is correctly permissioned, which fixed the previous issue, but the restarts might somehow be tied to the log and alarm handling again. |
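To rule out the earlier permissions problem on a given node, the log directory can be probed by actually creating a file in it as the user the process runs as. This is a generic sketch, not how ProcMon itself checks anything; `can_write_logs` is a hypothetical helper, and the directory to test has to be supplied by whoever runs it, since the ticket does not give the path.

```python
import os
import tempfile


def can_write_logs(log_dir):
    """Return True if the current user can create and remove a file in log_dir."""
    if not os.path.isdir(log_dir):
        return False
    try:
        # Create a throwaway file; it is deleted automatically when closed.
        with tempfile.NamedTemporaryFile(dir=log_dir):
            pass
        return True
    except OSError:
        return False
```

Creating a real file is a stricter test than reading the mode bits, because it also catches read-only mounts and full file systems.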
| Comment by David Hill (Inactive) [ 2016-12-07 ] |
|
fixed with changes in
1. Fixed a code issue in procmgr from an older checkin
M oam/install_scripts/columnstore |
| Comment by David Hill (Inactive) [ 2016-12-07 ] |
|
please review, might need to discuss some of the changes |
| Comment by David Hill (Inactive) [ 2016-12-09 ] |
|
tested on build from 12/09, passed restart test |