Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-1130

Can't restart System, When our server relocation?

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Cannot Reproduce
    • 1.0.7
    • Icebox
    • MariaDB Server, ProcMgr
    • None
    • CentOS 6.5 mariadb-columnstore-1.0.7-1-x86_64

    Description

      After server relocation,We can't restart system,and get error log as below:
      mcsadmin> restartsystem
      restartsystem Fri Dec 22 09:22:14 2017
      Do you want to proceed: (y or n) [n]: y

      System being restarted now ...

            • restartSystem Failed : API Failure return in restartSystem API

      ----------------
      [root@HLServer110 columnstore]# cat err.log
      Dec 22 09:22:23 HLServer110 ProcessManager[3268]: 23.450253 |0|0|0| E 17 CAL0000: line: 6255 Error running DBRM save_brm

      ----------------
      [root@HLServer110 columnstore]# cat warning.log
      Dec 22 09:22:19 HLServer110 ProcessManager[3268]: 19.033411 |0|0|0| W 17 CAL0000: pm2 module failed to stop!!

      ----------------
      [root@HLServer110 columnstore]# cat info.log
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.015530 |0|0|0| I 17 CAL0000: MSG RECEIVED: Restart System request...
      Dec 22 09:22:18 HLServer110 ProcessMonitor[3177]: 18.023963 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
      Dec 22 09:22:18 HLServer110 ProcessMonitor[3177]: 18.024402 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr, STATUS_UPDATE only performed
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.034577 |0|0|0| I 17 CAL0000: RESTARTSYSTEM: Request Stop Module on pm2
      Dec 22 09:22:19 HLServer110 ProcessManager[3268]: 19.033531 |0|0|0| I 17 CAL0000: RESTARTSYSTEM: ACK received from Process-Monitor, return status = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.034908 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.361166 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr, return status = 0
      Dec 22 09:22:24 HLServer110 ProcessManager[3268]: 24.343899 |0|0|0| I 17 CAL0000: RESTARTSYSTEM: ACK received from Process-Monitor for stopModule requests, return status = 1
      Dec 22 09:22:24 HLServer110 ProcessManager[3268]: 24.344078 |0|0|0| I 17 CAL0000: RESTARTSYSTEM: ACK back to sender
      Dec 22 09:22:24 HLServer110 ProcessManager[3268]: 24.344171 |0|0|0| I 17 CAL0000: Restart System Completed, status = 1

      ----------------
      [root@HLServer110 columnstore]# tail -n 100 debug.log
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.016406 |0|0|0| D 17 CAL0000: Set System State = MAN_OFFLINE
      Dec 22 09:22:18 HLServer110 ProcessMonitor[3177]: 18.017023 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = MAN_OFFLINE
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.022752 |0|0|0| D 17 CAL0000: pm1 module is stopped by request.
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.022852 |0|0|0| D 17 CAL0000: Set Module pm1 State = 2
      Dec 22 09:22:18 HLServer110 ProcessMonitor[3177]: 18.023380 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_INIT
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.023511 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.026061 |0|0|0| D 17 CAL0000: pm2 module is stopped by request.
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.026153 |0|0|0| D 17 CAL0000: Set Module pm2 State = 2
      Dec 22 09:22:18 HLServer110 ProcessMonitor[3177]: 18.026692 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm2 State = MAN_INIT
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.032138 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.034787 |0|0|0| D 17 CAL0000: pm2 module is stopped by request.
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.034865 |0|0|0| D 17 CAL0000: Set Module pm2 State = 2
      Dec 22 09:22:18 HLServer110 ProcessMonitor[3177]: 18.035424 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm2 State = MAN_INIT
      Dec 22 09:22:18 HLServer110 ProcessManager[3268]: 18.040640 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
      Dec 22 09:22:19 HLServer110 ProcessManager[3268]: 19.033708 |0|0|0| D 17 CAL0000: pm1 module is stopped by request.
      Dec 22 09:22:19 HLServer110 ProcessManager[3268]: 19.033782 |0|0|0| D 17 CAL0000: Set Module pm1 State = 2
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.034357 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_INIT
      Dec 22 09:22:19 HLServer110 ProcessManager[3268]: 19.034482 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.036065 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.036214 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.036310 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.038223 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.038370 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.041100 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DMLProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.057360 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.057490 |0|0|0| D 18 CAL0000: STOPPING Process: DDLProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.057618 |0|0|0| D 18 CAL0000: StatusUpdate of Process DDLProc State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.057712 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DDLProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.059941 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DDLProc State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.060084 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DDLProc State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.061109 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DDLProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.079658 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DDLProc*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.079785 |0|0|0| D 18 CAL0000: STOPPING Process: WriteEngineServer
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.079902 |0|0|0| D 18 CAL0000: StatusUpdate of Process WriteEngineServer State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.079960 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device WriteEngineServer
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.082031 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/WriteEngineServer State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.082157 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/WriteEngineServer State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.082897 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device WriteEngineServer
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.100912 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: WriteEngineServ*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.101037 |0|0|0| D 18 CAL0000: STOPPING Process: ExeMgr
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.101147 |0|0|0| D 18 CAL0000: StatusUpdate of Process ExeMgr State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.101246 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ExeMgr
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.102941 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/ExeMgr State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.103065 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/ExeMgr State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.104149 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ExeMgr
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.122374 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ExeMgr*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.122497 |0|0|0| D 18 CAL0000: STOPPING Process: PrimProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.122606 |0|0|0| D 18 CAL0000: StatusUpdate of Process PrimProc State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.122690 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device PrimProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.124526 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/PrimProc State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.124654 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/PrimProc State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.125791 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device PrimProc
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.143851 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: PrimProc*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.143976 |0|0|0| D 18 CAL0000: STOPPING Process: DecomSvr
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.144092 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.144154 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.145723 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.145879 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.147011 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DecomSvr
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.165526 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DecomSvr*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.165645 |0|0|0| D 18 CAL0000: STOPPING Process: DBRMWorkerNode
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.165756 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMWorkerNode State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.165849 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DBRMWorkerNode
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.167865 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DBRMWorkerNode State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.167979 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DBRMWorkerNode State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.169042 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DBRMWorkerNode
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.187243 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: workernode*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.187363 |0|0|0| D 18 CAL0000: STOPPING Process: ServerMonitor
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.187475 |0|0|0| D 18 CAL0000: StatusUpdate of Process ServerMonitor State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.187559 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ServerMonitor
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.189784 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/ServerMonitor State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.189941 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/ServerMonitor State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.190853 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ServerMonitor
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.208791 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ServerMonitor*
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.208913 |0|0|0| D 18 CAL0000: STOPPING Process: DBRMControllerNode
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.209031 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMControllerNode State = 0 PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.209116 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DBRMControllerNode
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.211139 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DBRMControllerNode State = MAN_OFFLINE
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.211254 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DBRMControllerNode State = MAN_OFFLINE PID = 0
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.212667 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DBRMControllerNode
      Dec 22 09:22:19 HLServer110 ProcessMonitor[3177]: 19.230747 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: controllernode*
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.268807 |0|0|0| D 18 CAL0000: BRM reset_locks script run
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.269607 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/mysqld State = MAN_OFFLINE
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.269735 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = MAN_OFFLINE PID = 0
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.359965 |0|0|0| D 18 CAL0000: Stop MySQL Process
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.360778 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_OFFLINE
      Dec 22 09:22:21 HLServer110 ProcessMonitor[3177]: 21.360897 |0|0|0| D 18 CAL0000: unmountExtraDBroots called
      Dec 22 09:22:21 HLServer110 ProcessManager[3268]: 21.361308 |0|0|0| D 17 CAL0000: pm1 module is successfully stopped.
      Dec 22 09:22:21 HLServer110 ProcessManager[3268]: 21.365243 |0|0|0| D 17 CAL0000: Running reset_locks
      Dec 22 09:22:23 HLServer110 ProcessManager[3268]: 23.401485 |0|0|0| D 17 CAL0000: Ran reset_locks
      Dec 22 09:22:23 HLServer110 ProcessManager[3268]: 23.401677 |0|0|0| D 17 CAL0000: Running DBRM save_brm
      Dec 22 09:22:24 HLServer110 ProcessManager[3268]: 24.343707 |0|0|0| D 17 CAL0000: Successfully ran DBRM clearShm

      How can i recovery this question?

      Attachments

        Activity

          People

            Unassigned Unassigned
            chenxinglong Allen Chan
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.