Details
Bug
Status: Closed
Major
Resolution: Cannot Reproduce
1.0.10
None
VirtualBox Version 5.1.26 r117224 (Qt5.6.2) - Running on OS X
Linux ubuntu-03 4.9.13
Description
Problem
When connecting via a client, we get the following error:
ERROR 1815 (HY000): Internal error: IDB-2004: Cannot connect to ExeMgr.
Restarting ColumnStore via mcsadmin did not clear this error.
mcsadmin> getSystemStatus
getsystemstatus   Fri Aug 18 23:28:07 2017

System columnstore-1

System and Module statuses

Component     Status                      Last Status Change
------------  --------------------------  ------------------------
System        ACTIVE                      Fri Aug 18 23:26:37 2017

Module um1    MAN_OFFLINE                 Fri Aug 18 22:03:34 2017
Module um2    MAN_OFFLINE                 Fri Aug 18 22:03:37 2017
Module pm1    MAN_INIT                    Fri Aug 18 22:03:40 2017
Module pm2    MAN_OFFLINE                 Fri Aug 18 22:03:40 2017

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
Local Query Feature is enabled
MariaDB ColumnStore Replication Feature is enabled

mcsadmin> getProcessStatus
getprocessstatus   Fri Aug 18 23:28:28 2017

MariaDB ColumnStore Process statuses

Process             Module  Status           Last Status Change        Process ID
------------------  ------  ---------------  ------------------------  ----------
ProcessMonitor      um1     ACTIVE           Wed Aug 16 16:34:05 2017  1099
ServerMonitor       um1     FAILED           Fri Aug 18 23:19:42 2017  6793
DBRMWorkerNode      um1     ACTIVE           Fri Aug 18 23:19:22 2017  6825
ExeMgr              um1     MAN_OFFLINE      Fri Aug 18 23:18:58 2017
DDLProc             um1     MAN_OFFLINE      Fri Aug 18 23:18:58 2017
DMLProc             um1     MAN_OFFLINE      Fri Aug 18 23:18:58 2017
mysqld              um1     ACTIVE           Fri Aug 18 23:19:25 2017  6752

ProcessMonitor      um2     ACTIVE           Wed Aug 16 16:34:05 2017  1084
ServerMonitor       um2     FAILED           Fri Aug 18 23:19:41 2017  24821
DBRMWorkerNode      um2     ACTIVE           Fri Aug 18 23:19:29 2017  24853
ExeMgr              um2     MAN_OFFLINE      Fri Aug 18 23:19:01 2017
DDLProc             um2     MAN_OFFLINE      Fri Aug 18 23:19:01 2017
DMLProc             um2     MAN_OFFLINE      Fri Aug 18 23:19:01 2017
mysqld              um2     ACTIVE           Fri Aug 18 23:19:31 2017  24775

ProcessMonitor      pm1     ACTIVE           Wed Aug 16 16:33:55 2017  1267
ProcessManager      pm1     ACTIVE           Wed Aug 16 16:34:01 2017  1554
DBRMControllerNode  pm1     ACTIVE           Fri Aug 18 23:19:19 2017  23270
ServerMonitor       pm1     FAILED           Fri Aug 18 23:19:49 2017  23750
DBRMWorkerNode      pm1     ACTIVE           Fri Aug 18 23:19:21 2017  23353
DecomSvr            pm1     ACTIVE           Fri Aug 18 23:19:25 2017  23525
PrimProc            pm1     ACTIVE           Fri Aug 18 23:19:27 2017  23601
ExeMgr              pm1     ACTIVE           Fri Aug 18 23:19:48 2017  26317
WriteEngineServer   pm1     ACTIVE           Fri Aug 18 23:19:51 2017  26501
mysqld              pm1     ACTIVE           Fri Aug 18 23:19:48 2017  23024

ProcessMonitor      pm2     ACTIVE           Wed Aug 16 16:34:11 2017  1069
ProcessManager      pm2     HOT_STANDBY      Fri Aug 18 23:19:12 2017  19507
DBRMControllerNode  pm2     COLD_STANDBY     Fri Aug 18 23:19:30 2017
ServerMonitor       pm2     FAILED           Fri Aug 18 23:19:53 2017  19919
DBRMWorkerNode      pm2     ACTIVE           Fri Aug 18 23:19:38 2017  19974
DecomSvr            pm2     ACTIVE           Fri Aug 18 23:19:42 2017  19990
PrimProc            pm2     ACTIVE           Fri Aug 18 23:19:44 2017  20024
ExeMgr              pm2     ACTIVE           Fri Aug 18 23:19:48 2017  20053
WriteEngineServer   pm2     ACTIVE           Fri Aug 18 23:19:53 2017  20075
mysqld              pm2     ACTIVE           Fri Aug 18 23:19:33 2017  19770
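When triaging status output like the above, it can help to list only the entries that are not in a healthy state. The following is a minimal sketch and not part of ColumnStore: unhealthy_processes is a hypothetical helper name, and it assumes the status text is supplied on stdin with the column order shown above (process, module, status). Treating ACTIVE, HOT_STANDBY and COLD_STANDBY as the only expected states is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mcsadmin): reads getProcessStatus text on
# stdin and prints "module process status" for every row whose status is not
# ACTIVE or a standby state. Assumes columns: process, module, status.
unhealthy_processes() {
  awk '
    # Only consider rows whose 2nd field looks like a module name (um1, pm2, ...).
    $2 ~ /^(um|pm)[0-9]+$/ &&
    $3 !~ /^(ACTIVE|HOT_STANDBY|COLD_STANDBY)$/ {
      print $2, $1, $3
    }'
}
```

Fed a saved copy of the getProcessStatus output above, this would list the ServerMonitor rows (FAILED) on all four modules and the ExeMgr, DDLProc and DMLProc rows (MAN_OFFLINE) on um1 and um2, which is consistent with the client being unable to reach ExeMgr on the UMs.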
Reproduce
The environment was created as follows:
1. VirtualBox (2GB RAM assigned)
2. Docker containers for 2 x UM and 2 x PM running Ubuntu
The customer was up and running, with data loaded via cpimport.
We wanted to visualize the data, so we ran Metabase in another container via
docker run -d -p 3000:3000 --name metabase metabase/metabase
3. After the Metabase container has started, connect to it on port 3000 and set up a connection to the ColumnStore cluster
4. During the connection to the ColumnStore cluster, the cluster started to report the above error. Clearly something in the startup process or metadata gathering caused ColumnStore to fail.
We built the cluster again and reproduced this a second time.
Solution
- Fail gracefully and do not leave the cluster compromised
- Provide a workaround to clear the error
Workaround
None. We had to rebuild the ColumnStore cluster.