Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-434

DecomSvr fails to cleanly start when a Amazon Instance is stopped/started

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 1.0.6
    • 1.0.6
    • None
    • None
    • AWS system
    • 2016-24

    Description

      DecomSvr fails to start the initially after a instance is stop/started
      After a number of retries, it did go ACTIVE

      Issue found when MCOL-422 was fixed

      Component Status Last Status Change
      ------------ -------------------------- ------------------------
      System ACTIVE Sun Dec 4 21:15:59 2016
      Module pm1 ACTIVE Sun Dec 4 21:15:56 2016
      MariaDB Columnstore Process statuses
      Process Module Status Last Status Change Process ID
      ------------------ ------ --------------- ------------------------ ----------
      ProcessMonitor pm1 ACTIVE Sun Dec 4 21:15:00 2016 1245
      ProcessManager pm1 ACTIVE Sun Dec 4 21:15:06 2016 4005
      DBRMControllerNode pm1 ACTIVE Sun Dec 4 21:15:14 2016 5479
      ServerMonitor pm1 ACTIVE Sun Dec 4 21:15:16 2016 5870
      DBRMWorkerNode pm1 ACTIVE Sun Dec 4 21:15:16 2016 6142
      DecomSvr pm1 AUTO_INIT Sun Dec 4 21:17:07 2016 16905
      PrimProc pm1 ACTIVE Sun Dec 4 21:15:25 2016 9215
      ExeMgr pm1 ACTIVE Sun Dec 4 21:15:29 2016 9438
      WriteEngineServer pm1 ACTIVE Sun Dec 4 21:15:33 2016 9518
      DDLProc pm1 ACTIVE Sun Dec 4 21:15:53 2016 15795
      DMLProc pm1 ACTIVE Sun Dec 4 21:15:58 2016 15903
      mysqld pm1 ACTIVE Sun Dec 4 21:15:58 2016 4712

      STARTING Process: DecomSvr
      Dec 4 21:17:37 ip-172-30-0-34 ProcessMonitor[1245]: 37.384443 |0|0|0| D 18 CAL0000: Process location: //home/mariadb-user/mariadb/columnstore/bin/DecomSvr
      Dec 4 21:17:37 ip-172-30-0-34 ProcessMonitor[1245]: 37.386610 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 4 21:17:37 ip-172-30-0-34 ProcessMonitor[1245]: 37.390396 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DecomSvr*
      Dec 4 21:17:37 ip-172-30-0-34 ProcessMonitor[1245]: 37.390447 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 3 PID = 0
      Dec 4 21:17:37 ip-172-30-0-34 ProcessMonitor[1245]: 37.393405 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DecomSvr
      Dec 4 21:17:37 ip-172-30-0-34 ProcessMonitor[1245]: 37.400608 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 27 on device DBRM
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.409193 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 4 PID = 17400
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.416406 |0|0|0| E 18 CAL0000: EXCEPTION ERROR on setProcessStatus: Caught unknown exception!
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.416470 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 21 PID = 17400
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.422558 |0|0|0| E 18 CAL0000: EXCEPTION ERROR on setProcessStatus: Caught unknown exception!
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.422605 |0|0|0| D 18 CAL0000: DecomSvr PID is 17400
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.422716 |0|0|0| D 18 CAL0000: Inform Process Mgr that process was restarted: DecomSvr
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.422736 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DecomSvr
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.423338 |0|0|0| I 18 CAL0000: Calpont Process DecomSvr restarted successfully!!
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.428955 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DecomSvr
      Dec 4 21:17:39 ip-172-30-0-34 ProcessMonitor[1245]: 39.437494 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 4 21:17:39 ip-172-30-0-34 ProcessManager[4005]: 39.482441 |0|0|0| I 17 CAL0000: MSG RECEIVED: Process Restarted on pm1/DecomSvr
      Dec 4 21:18:04 ip-172-30-0-34 ProcessMonitor[1245]: 04.531056 |0|0|0| C 18 CAL0000: DecomSvr/17400 failed to init in 20 seconds, force killing it so it can restart
      Dec 4 21:18:04 ip-172-30-0-34 ProcessMonitor[1245]: 04.531098 |0|0|0| D 18 CAL0000: Send SET Alarm ID 25 on device DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.562675 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: DecomSvr, old PID = 17400
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.562797 |0|0|0| D 18 CAL0000: STOPPING Process: DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.562851 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 1 PID = 0
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.562905 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.569512 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DecomSvr*
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.569562 |0|0|0| D 18 CAL0000: STARTING Process: DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.569586 |0|0|0| D 18 CAL0000: Process location: //home/mariadb-user/mariadb/columnstore/bin/DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.571560 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.575237 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DecomSvr*
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.575289 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 3 PID = 0
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.577098 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = AUTO_OFFLINE
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.577138 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = AUTO_OFFLINE PID = 0
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.587201 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = AUTO_INIT
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.587243 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = AUTO_INIT PID = 0
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.590043 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DecomSvr
      Dec 4 21:18:09 ip-172-30-0-34 ProcessMonitor[1245]: 09.596244 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 27 on device DBRM
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.594404 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 4 PID = 18021
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.601588 |0|0|0| E 18 CAL0000: EXCEPTION ERROR on setProcessStatus: Caught unknown exception!
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.601659 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 21 PID = 18021
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.607957 |0|0|0| E 18 CAL0000: EXCEPTION ERROR on setProcessStatus: Caught unknown exception!
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.608004 |0|0|0| D 18 CAL0000: DecomSvr PID is 18021
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.608091 |0|0|0| D 18 CAL0000: Inform Process Mgr that process was restarted: DecomSvr
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.608119 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DecomSvr
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.608767 |0|0|0| I 18 CAL0000: Calpont Process DecomSvr restarted successfully!!
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.614615 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 4 21:18:11 ip-172-30-0-34 ProcessMonitor[1245]: 11.621288 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DecomSvr
      Dec 4 21:18:11 ip-172-30-0-34 ProcessManager[4005]: 11.668342 |0|0|0| I 17 CAL0000: MSG RECEIVED: Process Restarted on pm1/DecomSvr
      Dec 4 21:18:36 ip-172-30-0-34 ProcessMonitor[1245]: 36.716680 |0|0|0| C 18 CAL0000: DecomSvr/18021 failed to init in 20 seconds, force killing it so it can restart
      Dec 4 21:18:36 ip-172-30-0-34 ProcessMonitor[1245]: 36.716722 |0|0|0| D 18 CAL0000: Send SET Alarm ID 25 on device DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.747118 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: DecomSvr, old PID = 18021
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.747227 |0|0|0| D 18 CAL0000: STOPPING Process: DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.747272 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 1 PID = 0
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.747315 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.754309 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DecomSvr*
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.754361 |0|0|0| D 18 CAL0000: STARTING Process: DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.754400 |0|0|0| D 18 CAL0000: Process location: //home/mariadb-user/mariadb/columnstore/bin/DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.756584 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.760057 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DecomSvr*
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.760108 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 3 PID = 0
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.762082 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = AUTO_OFFLINE
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.762127 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = AUTO_OFFLINE PID = 0
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.763604 |0|0|0| D 18 CAL0000: Send SET Alarm ID 13 on device DecomSvr
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.771972 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = AUTO_INIT
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.772017 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = AUTO_INIT PID = 0
      Dec 4 21:18:41 ip-172-30-0-34 ProcessMonitor[1245]: 41.780036 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 27 on device DBRM
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.779175 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 4 PID = 18691
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.792351 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = ACTIVE
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.792429 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = ACTIVE PID = 18691
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.792446 |0|0|0| D 18 CAL0000: StatusUpdate of Process DecomSvr State = 21 PID = 18691
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.804486 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DecomSvr State = PID_UPDATE
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.804553 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DecomSvr State = PID_UPDATE PID = 18691
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.804569 |0|0|0| D 18 CAL0000: DecomSvr PID is 18691
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.804665 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DecomSvr
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.804668 |0|0|0| D 18 CAL0000: Inform Process Mgr that process was restarted: DecomSvr
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.805249 |0|0|0| I 18 CAL0000: Calpont Process DecomSvr restarted successfully!!
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.811381 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DecomSvr
      Dec 4 21:18:43 ip-172-30-0-34 ProcessMonitor[1245]: 43.820111 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DecomSvr
      Dec 4 21:18:43 ip-172-30-0-34 ProcessManager[4005]: 43.862092 |0|0|0| I 17 CAL0000: MSG RECEIVED: Process Restarted on pm1/DecomSvr
      [root@ip-172-30-0-34 columnstore]#

      Attachments

        Activity

          People

            hill David Hill (Inactive)
            hill David Hill (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.