[MCOL-1139] removemodule pm1 failed when module was AUTO_DISABLED Created: 2018-01-05  Updated: 2022-11-05  Resolved: 2022-11-05

Status: Closed
Project: MariaDB ColumnStore
Component/s: N/A
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Environment:

Non-root install on an Amazon AMI with EBS storage; combined-module (combo) system with 3 PMs



 Description   

During failover testing, I stopped the PM1 node after install, since it was the master node. PM3 became the master. I then tried to remove module pm1 and got an error:

mcsadmin> getsystemi
getsysteminfo Fri Jan 5 16:35:18 2018

System 1.1.2

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE Fri Jan 5 15:54:31 2018

Module pm1 AUTO_DISABLED/DEGRADED Fri Jan 5 15:53:44 2018
Module pm2 ACTIVE Fri Jan 5 15:54:31 2018
Module pm3 ACTIVE Fri Jan 5 15:54:22 2018

Active Parent OAM Performance Module is 'pm3'
Primary Front-End MariaDB ColumnStore Module is 'pm3'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
ProcessManager pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
DBRMControllerNode pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
ServerMonitor pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
DBRMWorkerNode pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
DecomSvr pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
PrimProc pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
ExeMgr pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
WriteEngineServer pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
DDLProc pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
DMLProc pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018
mysqld pm1 AUTO_OFFLINE Fri Jan 5 15:54:30 2018

ProcessMonitor pm2 ACTIVE Fri Jan 5 15:43:07 2018 15334
ProcessManager pm2 COLD_STANDBY Fri Jan 5 15:54:31 2018
DBRMControllerNode pm2 COLD_STANDBY Fri Jan 5 15:54:31 2018
ServerMonitor pm2 ACTIVE Fri Jan 5 15:43:22 2018 15820
DBRMWorkerNode pm2 ACTIVE Fri Jan 5 15:43:23 2018 15846
DecomSvr pm2 ACTIVE Fri Jan 5 15:43:26 2018 15877
PrimProc pm2 ACTIVE Fri Jan 5 15:43:30 2018 15885
ExeMgr pm2 ACTIVE Fri Jan 5 15:43:39 2018 16794
WriteEngineServer pm2 ACTIVE Fri Jan 5 15:43:43 2018 16815
DDLProc pm2 COLD_STANDBY Fri Jan 5 15:54:31 2018
DMLProc pm2 COLD_STANDBY Fri Jan 5 15:54:31 2018
mysqld pm2 ACTIVE Fri Jan 5 15:54:33 2018 15694

ProcessMonitor pm3 ACTIVE Fri Jan 5 15:43:08 2018 14322
ProcessManager pm3 ACTIVE Fri Jan 5 15:54:24 2018 14457
DBRMControllerNode pm3 ACTIVE Fri Jan 5 15:53:59 2018 17257
ServerMonitor pm3 ACTIVE Fri Jan 5 15:54:01 2018 17273
DBRMWorkerNode pm3 ACTIVE Fri Jan 5 15:54:01 2018 17304
DecomSvr pm3 ACTIVE Fri Jan 5 15:54:05 2018 17344
PrimProc pm3 ACTIVE Fri Jan 5 15:54:07 2018 17374
ExeMgr pm3 ACTIVE Fri Jan 5 15:54:11 2018 17476
WriteEngineServer pm3 ACTIVE Fri Jan 5 15:54:15 2018 17535
DDLProc pm3 ACTIVE Fri Jan 5 15:54:19 2018 17590
DMLProc pm3 ACTIVE Fri Jan 5 15:54:24 2018 17661
mysqld pm3 ACTIVE Fri Jan 5 15:54:02 2018 17056

Active Alarm Counts: Critical = 2, Major = 1, Minor = 0, Warning = 0, Info = 0
mcsadmin> getst
getstorageconfig Fri Jan 5 16:35:21 2018

System Storage Configuration

Performance Module (DBRoot) Storage Type = external
User Module Storage Type = internal
System Assigned DBRoot Count = 3
DBRoot IDs assigned to 'pm1' =
DBRoot IDs assigned to 'pm2' = 2
DBRoot IDs assigned to 'pm3' = 1, 3

Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot1: vol-072d7d6b55ee398e9, /dev/sdg, /dev/xvdg
Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot2: vol-0f2fa799bd8450525, /dev/sdh, /dev/xvdh
Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot3: vol-0a1492a16c1d4404e, /dev/sdi, /dev/xvdi

mcsadmin> removemodule pm1
removemodule Fri Jan 5 16:35:31 2018

!!!!! DESTRUCTIVE COMMAND !!!!!

This command removes module(s) from the MariaDB ColumnStore System
Do you want to proceed: (y or n) [n]: y

removeModule Failed : pm1 is not MAN_OFFLINE, DISABLED, or FAILED state.

mcsadmin>
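The error message lists the only module states that removemodule accepts. The Python sketch below models that gate as inferred from the message text alone; it is not the OAM source, and the helper names (`primary_state`, `can_remove`, `remove_module`) and the `allow_auto_disabled` switch are hypothetical. It shows why pm1, reported as AUTO_DISABLED/DEGRADED after failover, is rejected, while the same module passes once altersystem-disablemodule has disabled it (assumed here to be reported as DISABLED).

```python
"""Hypothetical model of the removemodule state check (not the OAM source)."""

# States accepted by removemodule, as listed in the mcsadmin error message.
REMOVABLE_STATES = frozenset({"MAN_OFFLINE", "DISABLED", "FAILED"})


def primary_state(status: str) -> str:
    """Return the first part of a composite status such as 'AUTO_DISABLED/DEGRADED'."""
    return status.split("/", 1)[0].strip()


def can_remove(status: str, allow_auto_disabled: bool = False) -> bool:
    """Return True if a module in `status` passes the modelled removemodule gate.

    allow_auto_disabled is a hypothetical switch illustrating one possible fix:
    treat a module that failover has AUTO_DISABLED like a DISABLED one.
    """
    state = primary_state(status)
    if state in REMOVABLE_STATES:
        return True
    return allow_auto_disabled and state == "AUTO_DISABLED"


def remove_module(module: str, status: str, allow_auto_disabled: bool = False) -> str:
    """Raise with the reported error text if the gate rejects the module."""
    if not can_remove(status, allow_auto_disabled):
        raise RuntimeError(
            f"removeModule Failed : {module} is not MAN_OFFLINE, DISABLED, or FAILED state."
        )
    return f"Removing Module(s) {module}, please wait..."


if __name__ == "__main__":
    # pm1 as reported by getsysteminfo after failover: rejected.
    print(can_remove("AUTO_DISABLED/DEGRADED"))  # False
    # After altersystem-disablemodule pm1 (assumed to yield DISABLED): accepted.
    print(can_remove("DISABLED"))  # True
    # With the hypothetical relaxed rule, the failover state is accepted too.
    print(can_remove("AUTO_DISABLED/DEGRADED", allow_auto_disabled=True))  # True
```

The `allow_auto_disabled` flag only illustrates a possible fix direction; whether removing an AUTO_DISABLED module is safe in OAM is not established by this report.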

----------------------------------------------------------------------------------------------------------------------
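The following sequence worked, but it left the storage configuration in a bad state: after the removal, DBRoots 1 and 3 show as unassigned and pm3 no longer appears in the list, even though pm3 owned those DBRoots and is still active.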

mcsadmin> altersystem-di pm1
altersystem-disablemodule Fri Jan 5 16:40:18 2018

This command stops the processing of applications on a Module within the MariaDB ColumnStore System
Do you want to proceed: (y or n) [n]: y

Stopping Modules

Successful stop of Modules

Disabling Modules
Successful disable of Modules

mcsadmin> removemodule pm1
removemodule Fri Jan 5 16:40:34 2018

!!!!! DESTRUCTIVE COMMAND !!!!!

This command removes module(s) from the MariaDB ColumnStore System
Do you want to proceed: (y or n) [n]: y

Removing Module(s) pm1, please wait...

Remove Module successfully completed

mcsadmin> getst
getstorageconfig Fri Jan 5 16:40:54 2018

System Storage Configuration

Performance Module (DBRoot) Storage Type = external
User Module Storage Type = internal
System Assigned DBRoot Count = 3
DBRoot IDs assigned to 'pm2' = 2

DBRoot IDs unassigned = 1, 3

Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot2: vol-0f2fa799bd8450525, /dev/sdh, /dev/xvdh
Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot1: vol-072d7d6b55ee398e9, /dev/sdg, /dev/xvdg
Amazon EC2 Volume Name/Device Name/Amazon Device Name for DBRoot3: vol-0a1492a16c1d4404e, /dev/sdi, /dev/xvdi

mcsadmin>
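To spot this storage regression mechanically, one can compare the "DBRoot IDs" lines of getstorageconfig before and after the removal. The following is a minimal Python sketch that assumes the exact line format shown in this report; the function names are hypothetical and not part of mcsadmin. Because pm1 owned no DBRoots, removing it should leave every assignment intact, so any DBRoot the check returns was previously owned by a surviving module.

```python
"""Hypothetical check for the storage regression seen after removemodule pm1."""
import re

# Line formats assumed from the getstorageconfig output in this report.
ASSIGNED = re.compile(r"DBRoot IDs assigned to '(\w+)' =(.*)")
UNASSIGNED = re.compile(r"DBRoot IDs unassigned =(.*)")


def _ids(text: str) -> set[int]:
    """Parse a list such as ' 1, 3' into {1, 3}; an empty list gives set()."""
    return {int(tok) for tok in text.replace(",", " ").split()}


def parse_storage(output: str) -> tuple[dict[str, set[int]], set[int]]:
    """Return ({module: dbroots}, unassigned_dbroots) from getstorageconfig text."""
    assigned: dict[str, set[int]] = {}
    unassigned: set[int] = set()
    for line in output.splitlines():
        line = line.strip()
        if m := ASSIGNED.match(line):
            assigned[m.group(1)] = _ids(m.group(2))
        elif m := UNASSIGNED.match(line):
            unassigned = _ids(m.group(1))
    return assigned, unassigned


def orphaned_dbroots(before: str, after: str, removed: str) -> set[int]:
    """DBRoots owned by surviving modules before removal that are unassigned afterwards."""
    before_assigned, _ = parse_storage(before)
    _, after_unassigned = parse_storage(after)
    survivors = set().union(
        *(roots for mod, roots in before_assigned.items() if mod != removed)
    )
    return survivors & after_unassigned


def missing_modules(before: str, after: str, removed: str) -> set[str]:
    """Modules other than the removed one that vanished from the storage config."""
    before_assigned, _ = parse_storage(before)
    after_assigned, _ = parse_storage(after)
    return set(before_assigned) - set(after_assigned) - {removed}


if __name__ == "__main__":
    before = """
DBRoot IDs assigned to 'pm1' =
DBRoot IDs assigned to 'pm2' = 2
DBRoot IDs assigned to 'pm3' = 1, 3
"""
    after = """
DBRoot IDs assigned to 'pm2' = 2
DBRoot IDs unassigned = 1, 3
"""
    print(sorted(orphaned_dbroots(before, after, removed="pm1")))  # [1, 3]
    print(missing_modules(before, after, removed="pm1"))  # {'pm3'}
```

With the before/after output from this report, the check flags DBRoots 1 and 3 and reports pm3 as missing, matching the observation that pm3's assignments were dropped even though pm3 is still ACTIVE.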



 Comments   
Comment by David Hill (Inactive) [ 2018-01-25 ]

I discovered that there are a number of issues when pm1 is removed from the system. Several places in the OAM code assume that pm1 exists, so fixing this bug has a bigger impact than expected, and it should be deferred to a later release.

The getstorageconfig issue shown earlier is related to this.

There is also an issue where postConfigure will not allow an upgrade, because it expects ModuleIPAddr1-1-3 to be populated. After removemodule pm1 that entry is empty, so postConfigure fails.

There are probably other issues as well.

Comment by David Hill (Inactive) [ 2018-01-29 ]

Consider addressing this issue by reassigning a working PM as pm1 when pm1 goes bad.

Comment by David Hill (Inactive) [ 2018-02-01 ]

Need to rethink the remove-pm1 logic.

Comment by Todd Stoffel (Inactive) [ 2022-11-05 ]

Item is out of date. Closing due to inactivity. If you feel this was done in error please open a new ticket.

Generated at Thu Feb 08 02:26:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.