Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
1.2.3
None
2um 2pm with local query enabled
Description
Reported by a customer and reproduced:
On a system with multiple UMs and local query enabled, if UM1 goes down, all ExeMgrs are stopped and restarted as part of the recovery process. The ExeMgrs fail to start, leaving the system in this state:
System        BUSY_INIT                 Thu May 30 14:45:51 2019
Module um1    AUTO_DISABLED/DEGRADED    Thu May 30 14:45:57 2019
Module um2    FAILED                    Thu May 30 14:48:20 2019
Module pm1    ACTIVE                    Thu May 30 14:48:02 2019
Module pm2    ACTIVE                    Thu May 30 14:48:03 2019
Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um2'
Local Query Feature is enabled
MariaDB ColumnStore Replication Feature is enabled
MariaDB ColumnStore set for Distributed Install
MariaDB ColumnStore Process statuses
Process             Module  Status           Last Status Change        Process ID
------------------  ------  ---------------  ------------------------  ----------
ProcessMonitor      um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
ServerMonitor       um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
DBRMWorkerNode      um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
ExeMgr              um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
DDLProc             um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
DMLProc             um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
mysqld              um1     AUTO_OFFLINE     Thu May 30 14:45:57 2019
ProcessMonitor      um2     ACTIVE           Thu May 30 14:42:22 2019  7059
ServerMonitor       um2     ACTIVE           Thu May 30 14:42:48 2019  7497
DBRMWorkerNode      um2     ACTIVE           Thu May 30 14:47:19 2019  11086
ExeMgr              um2     ACTIVE           Thu May 30 14:47:50 2019  11270
DDLProc             um2     COLD_STANDBY     Thu May 30 14:46:48 2019
DMLProc             um2     COLD_STANDBY     Thu May 30 14:46:49 2019
mysqld              um2     ACTIVE           Thu May 30 14:48:24 2019  11521
ProcessMonitor      pm1     ACTIVE           Thu May 30 14:41:30 2019  9303
ProcessManager      pm1     ACTIVE           Thu May 30 14:41:36 2019  9427
DBRMControllerNode  pm1     ACTIVE           Thu May 30 14:47:16 2019  23967
ServerMonitor       pm1     ACTIVE           Thu May 30 14:42:42 2019  11653
DBRMWorkerNode      pm1     ACTIVE           Thu May 30 14:47:23 2019  24115
PrimProc            pm1     ACTIVE           Thu May 30 14:47:32 2019  24253
ExeMgr              pm1     MAN_OFFLINE      Thu May 30 14:45:59 2019
WriteEngineServer   pm1     ACTIVE           Thu May 30 14:47:45 2019  24491
mysqld              pm1     ACTIVE           Thu May 30 14:48:02 2019  24952
ProcessMonitor      pm2     ACTIVE           Thu May 30 14:42:32 2019  7669
ProcessManager      pm2     HOT_STANDBY      Thu May 30 14:42:33 2019  7765
DBRMControllerNode  pm2     COLD_STANDBY     Thu May 30 14:47:15 2019
ServerMonitor       pm2     ACTIVE           Thu May 30 14:42:53 2019  8137
DBRMWorkerNode      pm2     ACTIVE           Thu May 30 14:47:28 2019  10444
PrimProc            pm2     ACTIVE           Thu May 30 14:47:36 2019  10512
ExeMgr              pm2     MAN_OFFLINE      Thu May 30 14:45:59 2019
WriteEngineServer   pm2     ACTIVE           Thu May 30 14:47:46 2019  10589
mysqld              pm2     ACTIVE           Thu May 30 14:48:03 2019  10855
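To make the failed state easier to spot in a listing like the one above, here is a minimal Python sketch that pulls out every ExeMgr row that is not ACTIVE. It is not part of ColumnStore; the names `ProcessStatus`, `parse_status_line` and `offline_processes` are made up for illustration, and it assumes each row is whitespace-separated as name, module, status, a five-token timestamp, and an optional process ID.

```python
from typing import List, NamedTuple, Optional, Tuple


class ProcessStatus(NamedTuple):
    name: str
    module: str
    status: str
    changed: str          # "Last Status Change" column, kept as text
    pid: Optional[int]    # absent for processes that are not running


def parse_status_line(line: str) -> ProcessStatus:
    """Parse one row of the process status listing (assumed layout)."""
    tokens = line.split()
    name, module, status = tokens[:3]
    changed = " ".join(tokens[3:8])            # e.g. "Thu May 30 14:45:57 2019"
    pid = int(tokens[8]) if len(tokens) > 8 else None
    return ProcessStatus(name, module, status, changed, pid)


def offline_processes(text: str, name: str) -> List[Tuple[str, str]]:
    """Return (module, status) for every row of `name` that is not ACTIVE."""
    rows = [parse_status_line(l) for l in text.splitlines() if l.strip()]
    return [(r.module, r.status) for r in rows
            if r.name == name and r.status != "ACTIVE"]


# The four ExeMgr rows from the listing in this report.
SAMPLE = """\
ExeMgr um1 AUTO_OFFLINE Thu May 30 14:45:57 2019
ExeMgr um2 ACTIVE Thu May 30 14:47:50 2019 11270
ExeMgr pm1 MAN_OFFLINE Thu May 30 14:45:59 2019
ExeMgr pm2 MAN_OFFLINE Thu May 30 14:45:59 2019
"""

if __name__ == "__main__":
    for module, status in offline_processes(SAMPLE, "ExeMgr"):
        print(module, status)
```

On the sample rows this reports um1 (expected, since the module is down) plus pm1 and pm2, which are the two ExeMgrs that did not come back.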
From the pm1 logs when ExeMgr is trying to start back up:
May 30 14:46:47 ip-172-31-38-221 ProcessMonitor[9303]: 47.487022 |0|0|0| E 18 CAL0000: Process location: not found
May 30 14:47:52 ip-172-31-38-221 ProcessMonitor[9303]: 52.591412 |0|0|0| E 18 CAL0000: Process location: not found
I think the issue is that in a separate system install, the ExeMgr process configuration shows it running on the UM only, which would explain the "Process location: not found" error above when ExeMgr is started on a PM. It looks like additional code is needed to handle the local query option.
Process #7 Configuration information
ProcessName = ExeMgr
ModuleType = um
ProcessLocation = /usr/local/mariadb/columnstore/bin/ExeMgr
BootLaunch = 2
LaunchID = 30
DepModuleName1 = pm*
DepProcessName1 = PrimProc
RunType = LOADSHARE
LogFile = off
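The suspected cause can be illustrated with a small Python model. This is only a sketch of the hypothesis in this report, not the actual ProcessMonitor code: `ProcessConfig`, `module_type_of`, `find_location` and the `local_query` flag are hypothetical names, and the only real values are the three fields copied from the Process #7 entry above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProcessConfig:
    process_name: str
    module_type: str
    process_location: str


# Values copied from the "Process #7" configuration entry in this report.
EXEMGR = ProcessConfig(
    process_name="ExeMgr",
    module_type="um",
    process_location="/usr/local/mariadb/columnstore/bin/ExeMgr",
)


def module_type_of(module_name: str) -> str:
    """Strip the numeric suffix: 'pm1' -> 'pm', 'um2' -> 'um'."""
    return module_name.rstrip("0123456789")


def find_location(configs: Iterable[ProcessConfig], process_name: str,
                  module_name: str, local_query: bool = False) -> Optional[str]:
    """Look up a process location for a module (hypothetical lookup rule).

    A strict match on module type returns None for ExeMgr on a PM, which
    mirrors the 'Process location: not found' symptom. The local_query
    branch is an assumed extra rule, shown only to illustrate the kind of
    handling the report says is missing.
    """
    wanted = module_type_of(module_name)
    for cfg in configs:
        if cfg.process_name != process_name:
            continue
        if cfg.module_type == wanted:
            return cfg.process_location
        # Assumption: with local query enabled, ExeMgr also runs on PMs.
        if (local_query and process_name == "ExeMgr"
                and wanted == "pm" and cfg.module_type == "um"):
            return cfg.process_location
    return None


if __name__ == "__main__":
    print(find_location([EXEMGR], "ExeMgr", "um2"))
    print(find_location([EXEMGR], "ExeMgr", "pm1"))
    print(find_location([EXEMGR], "ExeMgr", "pm1", local_query=True))
```

Under these assumptions, the strict lookup finds the binary for um2 but returns nothing for pm1, while the local-query-aware variant resolves pm1 to the same location.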