[MCOL-3998] FAILOVER: When hot-standby node is down, cold-standby node did not become hot-standby Created: 2020-05-12  Updated: 2023-10-26  Resolved: 2023-10-26

Status: Closed
Project: MariaDB ColumnStore
Component/s: ProcMgr
Affects Version/s: 1.4.4
Fix Version/s: Icebox

Type: Bug Priority: Critical
Reporter: Daniel Lee (Inactive) Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None

Attachments: Zip Archive logs.zip    
Issue Links:
Problem/Incident
is caused by MCOL-3842 1.4.2 centos 7 with gluster setup - p... Closed

 Description   

Build tested: 1.4.4-1 (Jenkins 20200508)

Stack: 3-PM combo with GlusterFS
OS: CentOS 7.6

PM1=active
PM2=hot standby
PM3=cold standby

After two failovers, the stack does not have a hot-standby module.

1. Install a 3-PM combo stack with GlusterFS
2. Take PM3 offline (vagrant halt -f pm3)
3. Bring PM3 back online (vagrant up pm3)
4. Take PM2 offline (vagrant halt -f pm2)

PM3 should be in hot-standby state at this point, but it is in cold-standby. There is no hot-standby module in the stack now.
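
For repeat runs, the three Vagrant commands quoted in steps 2 to 4 can be driven from a small script. This is only a sketch: the `run_repro` helper and its `dry_run` and `settle_seconds` parameters are hypothetical, and the report does not say how long to wait between steps, so any delay is an assumption to tune per environment.

```python
import subprocess
import time

# Vagrant commands quoted in steps 2-4 of this report, in order.
REPRO_STEPS = [
    ["vagrant", "halt", "-f", "pm3"],  # step 2: take PM3 offline
    ["vagrant", "up", "pm3"],          # step 3: bring PM3 back online
    ["vagrant", "halt", "-f", "pm2"],  # step 4: take PM2 offline
]


def run_repro(steps=REPRO_STEPS, dry_run=True, settle_seconds=0):
    """Walk through the steps in order and return the command lines.

    With dry_run=True nothing is executed. settle_seconds is a
    hypothetical pause after each step; the report gives no timing.
    """
    executed = []
    for argv in steps:
        executed.append(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises if vagrant fails
            time.sleep(settle_seconds)
    return executed


if __name__ == "__main__":
    for line in run_repro():
        print(line)
```

Calling `run_repro(dry_run=False, settle_seconds=...)` from the directory holding the Vagrantfile would execute the steps; the default dry run only lists them.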

Logs are attached. There are no log files from PM2 because it was down.

mcsadmin> getprocessstatus
getprocessstatus Tue May 12 16:35:13 2020

MariaDB ColumnStore Process statuses

Process            Module Status          Last Status Change       Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor     pm1    ACTIVE          Tue May 12 16:03:55 2020 4331
ProcessManager     pm1    ACTIVE          Tue May 12 16:04:01 2020 4523
DBRMControllerNode pm1    ACTIVE          Tue May 12 16:29:34 2020 17891
ServerMonitor      pm1    ACTIVE          Tue May 12 16:04:59 2020 5799
DBRMWorkerNode     pm1    ACTIVE          Tue May 12 16:29:36 2020 17956
PrimProc           pm1    ACTIVE          Tue May 12 16:29:49 2020 18136
ExeMgr             pm1    ACTIVE          Tue May 12 16:33:57 2020 23916
WriteEngineServer  pm1    ACTIVE          Tue May 12 16:30:06 2020 18361
DDLProc            pm1    ACTIVE          Tue May 12 16:34:08 2020 24059
DMLProc            pm1    ACTIVE          Tue May 12 16:34:34 2020 24432
mysqld             pm1    ACTIVE          Tue May 12 16:30:45 2020 19281

ProcessMonitor     pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
ProcessManager     pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
DBRMControllerNode pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
ServerMonitor      pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
DBRMWorkerNode     pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
PrimProc           pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
ExeMgr             pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
WriteEngineServer  pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
DDLProc            pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
DMLProc            pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020
mysqld             pm2    AUTO_OFFLINE    Tue May 12 16:33:52 2020

ProcessMonitor     pm3    ACTIVE          Tue May 12 16:28:57 2020 1242
ProcessManager     pm3    COLD_STANDBY    Tue May 12 16:34:29 2020
DBRMControllerNode pm3    COLD_STANDBY    Tue May 12 16:34:29 2020
ServerMonitor      pm3    ACTIVE          Tue May 12 16:29:04 2020 11089
DBRMWorkerNode     pm3    ACTIVE          Tue May 12 16:29:45 2020 11339
PrimProc           pm3    ACTIVE          Tue May 12 16:29:57 2020 11374
ExeMgr             pm3    ACTIVE          Tue May 12 16:34:02 2020 12573
WriteEngineServer  pm3    ACTIVE          Tue May 12 16:30:14 2020 11455
DDLProc            pm3    COLD_STANDBY    Tue May 12 16:34:29 2020
DMLProc            pm3    COLD_STANDBY    Tue May 12 16:34:29 2020
mysqld             pm3    ACTIVE          Tue May 12 16:34:29 2020 11863
mcsadmin> quit
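
The missing hot standby can also be checked mechanically from the getprocessstatus output above. The following is a minimal sketch, not part of ColumnStore: the function names are hypothetical, the row layout is inferred only from the output in this report, and the `HOT_STANDBY` status label is an assumption, since that status does not appear anywhere in the captured output.

```python
import re

# One status row: process, module, status, optional timestamp, optional PID.
# The layout is inferred from the output in this report only.
ROW = re.compile(
    r"^(?P<process>\S+)\s+(?P<module>(?:pm|um)\d+)\s+(?P<status>[A-Z_]+)"
    r"(?:\s+(?P<changed>\w{3} \w{3}\s+\d+ \d\d:\d\d:\d\d \d{4}))?"
    r"(?:\s+(?P<pid>\d+))?\s*$"
)

# ProcessManager rows copied from the report.
SAMPLE = """\
ProcessManager pm1 ACTIVE Tue May 12 16:04:01 2020 4523
ProcessManager pm2 AUTO_OFFLINE Tue May 12 16:33:52 2020
ProcessManager pm3 COLD_STANDBY Tue May 12 16:34:29 2020
"""


def parse_status(text):
    """Return one dict per recognised status row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def process_manager_states(rows):
    """Map each module to the status of its ProcessManager row."""
    return {r["module"]: r["status"] for r in rows
            if r["process"] == "ProcessManager"}


def has_hot_standby(rows, label="HOT_STANDBY"):
    """True if any ProcessManager row carries the assumed hot-standby label."""
    return label in process_manager_states(rows).values()


if __name__ == "__main__":
    rows = parse_status(SAMPLE)
    print(process_manager_states(rows))
    print("hot standby present:", has_hot_standby(rows))
```

Run against the full output, the header, separator, and prompt lines do not match the row pattern and are ignored, so only the per-process rows are counted.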


Generated at Thu Feb 08 02:47:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.