[MCOL-1699] amazon EC2 install add module/dbroot failed to restart up Created: 2018-09-06  Updated: 2023-10-26  Resolved: 2019-03-04

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.1.6
Fix Version/s: 1.1.6

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

amazon ec2 AMI 1um/2pm starting setup



 Description   

adding a um and successful did s startsystem at the end.
added a pm and dbroot and startsystem failed.

mcsadmin> alterSystem-enableModule pm3
altersystem-enablemodule Thu Sep 6 16:17:22 2018

This command starts the processing of applications on a Module within the MariaDB ColumnStore System
Do you want to proceed: (y or n) [n]: y

Enabling Modules
Successful enable of Modules

Performance Module(s) Enabled, run movePmDbrootConfig or assignDbrootPmConfig to assign dbroots, if needed

mcsadmin> adddbroot 1
adddbroot Thu Sep 6 16:18:15 2018

Enter EBS storage size in GB, current setting is 50 :
Please enter: 50
Create AWS Volume for DBRoot #3
Formatting DBRoot #3, please wait...

New DBRoot IDs added = 3

mcsadmin> assignDbrootPmConfig 3 pm3
assigndbrootpmconfig Thu Sep 6 16:19:27 2018

DBRoot IDs assigned to 'pm3' =

Changes being applied

DBRoot IDs assigned to 'pm3' = 3

Successfully Assigned DBRoots

mcsadmin> starts
startsystem Thu Sep 6 16:19:32 2018

System being started, please wait...

TIMEOUT: ProcMon not responding to getSystemStatus

        • startSystem Failed : check log files

mcsadmin> getsystemi
getsysteminfo Thu Sep 6 16:24:26 2018

System 1.1.6

System and Module statuses

Component Status Last Status Change
------------ -------------------------- ------------------------
System FAILED Thu Sep 6 16:19:57 2018

Module um1 ACTIVE Thu Sep 6 16:18:03 2018
Module um2 ACTIVE Thu Sep 6 16:12:31 2018
Module pm1 ACTIVE Thu Sep 6 15:13:10 2018
Module pm2 ACTIVE Thu Sep 6 15:13:19 2018
Module pm3 FAILED Thu Sep 6 16:19:53 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'um1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Thu Sep 6 15:12:56 2018 2562
ServerMonitor um1 ACTIVE Thu Sep 6 15:13:09 2018 2939
DBRMWorkerNode um1 ACTIVE Thu Sep 6 16:17:37 2018 22228
ExeMgr um1 MAN_OFFLINE Thu Sep 6 16:17:53 2018
DDLProc um1 ACTIVE Thu Sep 6 15:13:26 2018 4534
DMLProc um1 ACTIVE Thu Sep 6 16:12:15 2018 20580
mysqld um1 ACTIVE Thu Sep 6 16:18:03 2018 22593

ProcessMonitor um2 ACTIVE Thu Sep 6 16:10:37 2018 2347
ServerMonitor um2 ACTIVE Thu Sep 6 16:12:24 2018 3627
DBRMWorkerNode um2 ACTIVE Thu Sep 6 16:17:42 2018 5124
ExeMgr um2 MAN_OFFLINE Thu Sep 6 16:17:53 2018
DDLProc um2 COLD_STANDBY Thu Sep 6 16:18:06 2018
DMLProc um2 COLD_STANDBY Thu Sep 6 16:18:07 2018
mysqld um2 ACTIVE Thu Sep 6 16:18:05 2018 5468

ProcessMonitor pm1 ACTIVE Thu Sep 6 15:12:12 2018 4101
ProcessManager pm1 ACTIVE Thu Sep 6 15:12:19 2018 4238
DBRMControllerNode pm1 ACTIVE Thu Sep 6 16:17:34 2018 18315
ServerMonitor pm1 ACTIVE Thu Sep 6 15:13:04 2018 5200
DBRMWorkerNode pm1 ACTIVE Thu Sep 6 16:17:46 2018 18639
DecomSvr pm1 ACTIVE Thu Sep 6 15:13:08 2018 5379
PrimProc pm1 ACTIVE Thu Sep 6 16:17:55 2018 18988
WriteEngineServer pm1 ACTIVE Thu Sep 6 16:18:04 2018 19370

ProcessMonitor pm2 ACTIVE Thu Sep 6 15:12:56 2018 2027
ProcessManager pm2 HOT_STANDBY Thu Sep 6 15:12:57 2018 2152
DBRMControllerNode pm2 COLD_STANDBY Thu Sep 6 15:13:09 2018
ServerMonitor pm2 ACTIVE Thu Sep 6 15:13:13 2018 2252
DBRMWorkerNode pm2 ACTIVE Thu Sep 6 16:17:51 2018 6970
DecomSvr pm2 ACTIVE Thu Sep 6 15:13:17 2018 2283
PrimProc pm2 ACTIVE Thu Sep 6 16:17:56 2018 6992
WriteEngineServer pm2 ACTIVE Thu Sep 6 16:18:05 2018 7028

ProcessMonitor pm3 ACTIVE Thu Sep 6 16:16:29 2018 1933
ProcessManager pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018
DBRMControllerNode pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018
ServerMonitor pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018
DBRMWorkerNode pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018
DecomSvr pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018
PrimProc pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018
WriteEngineServer pm3 MAN_OFFLINE Thu Sep 6 16:17:23 2018

Sep 6 16:16:10 ip-172-31-32-225 ProcessMonitor[1933]: 10.529831 |0|0|0| D 18 CAL0000: checkDataMount called
Sep 6 16:16:10 ip-172-31-32-225 ProcessMonitor[1933]: 10.535693 |0|0|0| W 18 CAL0000: No dbroots are configured in Columnstore.xml file
Sep 6 16:16:10 ip-172-31-32-225 ProcessMonitor[1933]: 10.535720 |0|0|0| W 18 CAL0000: ERROR: checkDataMount to failed, module is disabled, continuing

[root@ip-172-31-32-225 columnstore]# cat err.log
Sep 6 16:19:53 ip-172-31-32-225 ProcessMonitor[1933]: 53.580757 |0|0|0| E 18 CAL0000: checkDataMount: mount failed, DBRoot: /home/mariadb-user/mariadb/columnstore/data3
Sep 6 16:19:53 ip-172-31-32-225 ProcessMonitor[1933]: 53.582201 |0|0|0| C 18 CAL0000: checkDataMount error, startmodule failed
[root@ip-172-31-32-225 columnstore]#

pm1 logs for the add dbroot

Sep 6 16:18:07 ip-172-31-46-167 ProcessManager[4238]: 07.966607 |0|0|0| I 17 CAL0000: ENABLEMODULE: ACK back to sender
Sep 6 16:18:20 ip-172-31-46-167 oamcpp[20155]: 20.859796 |0|0|0| W 08 CAL0006: MCSVolume: attachvolume failed: available
Sep 6 16:18:26 ip-172-31-46-167 oamcpp[30141]: 26.146397 |0|0|0| D 08 CAL0000: addDbroot format cmd: sudo mkfs.ext2 -F /dev/xvdi > /tmp/format.log 2>&1



 Comments   
Comment by David Hill (Inactive) [ 2018-09-06 ]

maybe related to this code check

Sep 6 16:16:09 ip-172-31-32-225 ProcessMonitor[1933]: 09.473800 |0|0|0| D 18 CAL0000: Cloud setting = amazon-vpc

if ( cloud == "amazon-ec2" ) {
if(!aMonitor.amazonIPCheck())

{ log.writeLog(__LINE__, "ERROR: amazonIPCheck failed, exiting", LOG_TYPE_CRITICAL); sleep(2); string cmd = startup::StartUp::installDir() + "/bin/columnstore stop > /dev/null 2>&1"; system(cmd.c_str()); exit(1); }

}

Comment by David Hill (Inactive) [ 2018-09-06 ]

Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.021990 |0|0|0| D 18 CAL0000: amazonVolumeCheck function called for DBRoot3
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.612712 |0|0|0| D 18 CAL0000: amazonVolumeCheck: getEC2VolumeStatus: failed
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.612773 |0|0|0| W 18 CAL0000: amazonVolumeCheck function failed, volume not attached and not available: vol-04a0503c47a2b8adc
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.612812 |0|0|0| D 18 CAL0000: checkDataMount called
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.618521 |0|0|0| D 18 CAL0000: unmountExtraDBroots called
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.630752 |0|0|0| D 18 CAL0000: flushInodeCache failed to open file
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.637825 |0|0|0| D 18 CAL0000: unmountExtraDBroots finished
Sep 6 20:36:00 ip-172-31-40-34 ProcessMonitor[2018]: 00.641273 |0|0|0| D 18 CAL0000: flushInodeCache failed to open file
Sep 6 20:36:01 ip-172-31-40-34 ProcessMonitor[2018]: 01.164630 |0|0|0| E 18 CAL0000: checkDataMount: mount failed, DBRoot: /home/mariadb-user/mariadb/columnstore/data3
Sep 6 20:36:01 ip-172-31-40-34 ProcessMonitor[2018]: 01.166703 |0|0|0| C 18 CAL0000: checkDataMount error, startmodule failed
Sep 6 20:36:01 ip-172-31-40-34 ProcessMonitor[2018]: 01.166762 |0|0|0| I 18 CAL0000: STARTALL: ACK back to ProcMgr, return status = 1

Comment by David Hill (Inactive) [ 2018-09-11 ]

added error checking and found issue, return code not being seen correctly

// run script to get Volume Status
string cmd = InstallDir + "/bin/MCSVolumeCmds.sh describe " + volumeName + " > /tmp/getVolumeStatus_" + volumeName;
int ret = system(cmd.c_str());
writeLog("ret/WEXITSTATUS(ret) : " + itoa(ret) + ":" + itoa(WEXITSTATUS(ret)), LOG_TYPE_DEBUG);

if (WEXITSTATUS(ret) != 0 ){
writeLog("failed", LOG_TYPE_DEBUG);
return "failed";

Sep 11 15:47:02 ip-172-31-38-8 ProcessMonitor[1931]: 02.001055 |0|0|0| D 18 CAL0000: amazonVolumeCheck function called for DBRoot2
Sep 11 15:47:02 ip-172-31-38-8 oamcpp[1931]: 02.557268 |0|0|0| D 08 CAL0000: ret/WEXITSTATUS(ret) : -1:255
Sep 11 15:47:02 ip-172-31-38-8 oamcpp[1931]: 02.557328 |0|0|0| D 08 CAL0000: failed
Sep 11 15:47:02 ip-172-31-38-8 ProcessMonitor[1931]: 02.557358 |0|0|0| D 18 CAL0000: amazonVolumeCheck: getEC2VolumeStatus: failed
Sep 11 15:47:02 ip-172-31-38-8 ProcessMonitor[1931]: 02.557381 |0|0|0| W 18 CAL0000: amazonVolumeCheck function failed, volume not attached and not available: vol-0d7ec60962135c38e

./MCSVolumeCmds.sh describe vol-0d7ec60962135c38e
available
[mariadb-user@ip-172-31-38-8 bin]$ echo $?
0
[mariadb-user@ip-172-31-38-8 bin]$

Comment by David Hill (Inactive) [ 2018-09-11 ]

fixed

mcsadmin> addmodule pm 1
addmodule Tue Sep 11 20:33:01 2018

Enter the 'User' Password or 'ssh' if configured with ssh-keys
Please enter: ssh

Adding Modules pm2, please wait...
Add Module(s) successfully completed
addModule Command Successfully completed: Modules are Disabled, run alterSystem-enableModule command to enable them

mcsadmin> adddbroot 1
adddbroot Tue Sep 11 20:39:03 2018

Enter EBS storage size in GB, current setting is 50 :
Please enter: 50
Create AWS Volume for DBRoot #2
Formatting DBRoot #2, please wait...

New DBRoot IDs added = 2

mcsadmin> alterSystem-enableModule pm2
altersystem-enablemodule Tue Sep 11 20:39:46 2018

This command starts the processing of applications on a Module within the MariaDB ColumnStore System
Do you want to proceed: (y or n) [n]: y

Enabling Modules
Successful enable of Modules

Performance Module(s) Enabled, run movePmDbrootConfig or assignDbrootPmConfig to assign dbroots, if needed

mcsadmin> assignDbrootPmConfig 2 pm2
assigndbrootpmconfig Tue Sep 11 20:40:31 2018

DBRoot IDs assigned to 'pm2' =

Changes being applied

DBRoot IDs assigned to 'pm2' = 2

Successfully Assigned DBRoots

mcsadmin> starts
startsystem Tue Sep 11 20:40:35 2018

System being started, please wait...
Successful start of System

mcsadmin>

Comment by Daniel Lee (Inactive) [ 2019-03-04 ]

Build verified: 1.1.6-1

Generated at Thu Feb 08 02:30:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.