Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-2215

MCS remained MAN_OFFLINE after OAM restartSystem

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Won't Do
    • 1.2.3
    • N/A
    • N/A
    • None
    • mcs single server, centOS7

    Description

      MCS remained MAN_OFFLINE after OAM restartSystem

      that problem is observed during BulkLoad activities
      done with cpimport ;
      MCS was restarted due to Swap Space usage over Major threashold, perform OAM command restartSystem
      but PrimProc,ExeMgr,WriteEngineServer,DDLProc,DMLProc remained MAN_OFFLINE

      the problem is that mcs system does not recover automatically as it's expected and it's needed
      it's needed recover mcs manually ; that will increase significantly mcs system downtime

      note : if try to recover with mcsadmin restart system
      are returned messages about table locks but locks are not returned and operation loop over
      The following tables are locked:
      Could not connect to PMS1: Connection refused
      Could not connect to PMS1: Connection refused
      Could not connect to PMS1: Connection refused
      Could not connect to PMS1: Connection refused

      how to repeat :
      run autopilot concurrency test module

      concurrency.concurCpimport
      concurrency.concurDDL
      concurrency.concurDML
      concurrency.queryAndDataMod  ==>MCS  become MAN_OFFLINE
      

      mcsadmin getsystemi
      getsysteminfo   Wed Mar  6 14:51:09 2019
       
      System columnstore-1
       
      System and Module statuses
       
      Component     Status                       Last Status Change
      ------------  --------------------------   ------------------------
      System        ACTIVE                       Tue Mar  5 20:53:32 2019
       
      Module pm1    FAILED                       Tue Mar  5 20:53:26 2019
       
       
      MariaDB ColumnStore Process statuses
       
      Process             Module    Status            Last Status Change        Process ID
      ------------------  ------    ---------------   ------------------------  ----------
      ProcessMonitor      pm1       ACTIVE            Fri Mar  1 14:14:24 2019        5616
      ProcessManager      pm1       ACTIVE            Fri Mar  1 14:14:30 2019        5771
      DBRMControllerNode  pm1       ACTIVE            Tue Mar  5 20:53:08 2019       19745
      ServerMonitor       pm1       FAILED            Tue Mar  5 20:53:39 2019       21541
      DBRMWorkerNode      pm1       ACTIVE            Tue Mar  5 20:53:10 2019       20022
      PrimProc            pm1       MAN_OFFLINE       Tue Mar  5 20:52:54 2019
      ExeMgr              pm1       MAN_OFFLINE       Tue Mar  5 20:52:54 2019
      WriteEngineServer   pm1       MAN_OFFLINE       Tue Mar  5 20:52:54 2019
      DDLProc             pm1       MAN_OFFLINE       Tue Mar  5 20:52:54 2019
      DMLProc             pm1       MAN_OFFLINE       Tue Mar  5 20:52:53 2019
      mysqld              pm1       ACTIVE            Tue Mar  5 20:53:06 2019       19397
       
      Active Alarm Counts: Critical = 1, Major = 3, Minor = 6, Warning = 0, Info = 0
      [root@cps queryAndDataMod]#
      
      

      Mar  5 20:48:44 cps1 cpimport.bin[14506]: 44.948994 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-35369; status-SUCCESS
      Mar  5 20:48:45 cps1 cpimport.bin[15877]: 45.021926 |0|0|0| I 34 CAL0086: Initiating BulkLoad: -P pm1-15877 -u7a3e6085-d0c2-4cc9-a401-3d4f7b9f018d mytest lineitem /root/autopilot__1_2_3/data/source/dbt3/10g/lineitem.tbl
      Mar  5 20:48:45 cps1 cpimport.bin[15877]: 45.078612 |0|0|0| I 34 CAL0081: Start BulkLoad: JobId-35369; db-mytest
      Mar  5 20:48:52 cps1 ServerMonitor[8828]: 52.779394 |0|0|0| I 09 CAL0000: Swap above Minor Memory threshold with a percentage of  79
      Mar  5 20:49:52 cps1 ServerMonitor[8828]: 52.783352 |0|0|0| I 09 CAL0000: Swap above Minor Memory threshold with a percentage of  79
      Mar  5 20:50:40 cps1 cpimport.bin[15877]: 40.895053 |0|0|0| I 34 CAL0083: BulkLoad: JobId-35369; finished loading table mytest.lineitem; 59986052 rows inserted
      Mar  5 20:50:40 cps1 writeengine[15877]: 40.895091 |0|0|0| I 19 CAL0008: Bulkload |Job: /usr/local/mariadb/columnstore/data/bulk/tmpjob/35369_D20190305_T204845_S059001_Job_35369.xml |For table mytest.lineitem: 59986052 rows processed and 59986052 rows inserted.
      Mar  5 20:50:41 cps1 cpimport.bin[15877]: 41.332351 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-35369; status-SUCCESS
      Mar  5 20:50:41 cps1 cpimport.bin[17304]: 41.405682 |0|0|0| I 34 CAL0086: Initiating BulkLoad: -P pm1-17304 -u9d4f1717-3b9a-4008-b16f-77b03c5849a6 mytest lineitem /root/autopilot__1_2_3/data/source/dbt3/10g/lineitem.tbl
      Mar  5 20:50:41 cps1 cpimport.bin[17304]: 41.460399 |0|0|0| I 34 CAL0081: Start BulkLoad: JobId-35369; db-mytest
      Mar  5 20:50:52 cps1 ServerMonitor[8828]: 52.787134 |0|0|0| I 09 CAL0000: Swap above Minor Memory threshold with a percentage of  79
      Mar  5 20:51:52 cps1 ServerMonitor[8828]: 52.790870 |0|0|0| I 09 CAL0000: Swap above Minor Memory threshold with a percentage of  79
      Mar  5 20:52:34 cps1 cpimport.bin[17304]: 34.705426 |0|0|0| I 34 CAL0083: BulkLoad: JobId-35369; finished loading table mytest.lineitem; 59986052 rows inserted
      Mar  5 20:52:34 cps1 writeengine[17304]: 34.705466 |0|0|0| I 19 CAL0008: Bulkload |Job: /usr/local/mariadb/columnstore/data/bulk/tmpjob/35369_D20190305_T205041_S441865_Job_35369.xml |For table mytest.lineitem: 59986052 rows processed and 59986052 rows inserted.
      Mar  5 20:52:34 cps1 cpimport.bin[17304]: 34.732453 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-35369; status-SUCCESS
      Mar  5 20:52:34 cps1 cpimport.bin[18659]: 34.805807 |0|0|0| I 34 CAL0086: Initiating BulkLoad: -P pm1-18659 -u6b6d16e2-c008-4e73-9cd4-c49ce02f98a4 mytest lineitem /root/autopilot__1_2_3/data/source/dbt3/10g/lineitem.tbl
      Mar  5 20:52:34 cps1 cpimport.bin[18659]: 34.863442 |0|0|0| I 34 CAL0081: Start BulkLoad: JobId-35369; db-mytest
      Mar  5 20:52:52 cps1 ServerMonitor[8828]: 52.796726 |0|0|0| I 09 CAL0000: Swap above Major Memory threshold with a percentage of  80
      Mar  5 20:52:52 cps1 ServerMonitor[8828]: 52.798578 |0|0|0| I 09 CAL0000: Swap  usage at percentage of  80 , Alarm set:  11
      Mar  5 20:52:52 cps1 ServerMonitor[8828]: 52.798699 |0|0|0| C 09 CAL0000: Swap Space usage over Major threashold, perform OAM command  restartSystem
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.921227 |0|0|0| I 17 CAL0000: MSG RECEIVED: Restart System request...
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.922234 |0|0|0| D 17 CAL0000: Set System State = MAN_OFFLINE
      Mar  5 20:52:52 cps1 ProcessMonitor[5616]: 52.947620 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = MAN_OFFLINE
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.953978 |0|0|0| D 17 CAL0000: pm1 module is stopped by request.
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.954062 |0|0|0| D 17 CAL0000: Set Module pm1 State = 2
      Mar  5 20:52:52 cps1 ProcessMonitor[5616]: 52.954602 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_INIT
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.956541 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      Mar  5 20:52:52 cps1 ProcessMonitor[5616]: 52.956818 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
      Mar  5 20:52:52 cps1 ProcessMonitor[5616]: 52.957017 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr, STATUS_UPDATE only performed
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.957440 |0|0|0| D 17 CAL0000: pm1 module is stopped by request.
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.957507 |0|0|0| D 17 CAL0000: Set Module pm1 State = 2
      Mar  5 20:52:52 cps1 ProcessMonitor[5616]: 52.958007 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_INIT
      Mar  5 20:52:52 cps1 ProcessManager[5771]: 52.958053 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.957394 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.959396 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.959528 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 0 PID = 0
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.959680 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.963007 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = MAN_OFFLINE
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.963055 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = MAN_OFFLINE PID = 0
      Mar  5 20:52:53 cps1 ProcessMonitor[5616]: 53.963548 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DMLProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.007261 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.007381 |0|0|0| D 18 CAL0000: STOPPING Process: DDLProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.007497 |0|0|0| D 18 CAL0000: StatusUpdate of Process DDLProc State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.009627 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DDLProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.011274 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DDLProc State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.011325 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DDLProc State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.011544 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DDLProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.020483 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DDLProc\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.020570 |0|0|0| D 18 CAL0000: STOPPING Process: WriteEngineServer
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.020681 |0|0|0| D 18 CAL0000: StatusUpdate of Process WriteEngineServer State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.020832 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device WriteEngineServer
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.024301 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/WriteEngineServer State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.024385 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/WriteEngineServer State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.024625 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device WriteEngineServer
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.034639 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: WriteEngineServ\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.034743 |0|0|0| D 18 CAL0000: STOPPING Process: ExeMgr
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.034860 |0|0|0| D 18 CAL0000: StatusUpdate of Process ExeMgr State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.035014 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ExeMgr
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.037896 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/ExeMgr State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.037942 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/ExeMgr State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.039418 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ExeMgr
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.047127 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ExeMgr\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.047236 |0|0|0| D 18 CAL0000: STOPPING Process: PrimProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.047349 |0|0|0| D 18 CAL0000: StatusUpdate of Process PrimProc State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.047512 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device PrimProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.051587 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device PrimProc
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.052857 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/PrimProc State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.052905 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/PrimProc State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.060337 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: PrimProc\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.060444 |0|0|0| D 18 CAL0000: STOPPING Process: DBRMWorkerNode
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.060558 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMWorkerNode State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.060734 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DBRMWorkerNode
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.063894 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DBRMWorkerNode State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.063973 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DBRMWorkerNode State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.064995 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DBRMWorkerNode
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.114250 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: workernode\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.114391 |0|0|0| D 18 CAL0000: STOPPING Process: ServerMonitor
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.114513 |0|0|0| D 18 CAL0000: StatusUpdate of Process ServerMonitor State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.114706 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ServerMonitor
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.118545 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ServerMonitor
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.119126 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/ServerMonitor State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.119176 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/ServerMonitor State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.128425 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ServerMonitor\*
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.128535 |0|0|0| D 18 CAL0000: STOPPING Process: DBRMControllerNode
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.128654 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMControllerNode State = 0 PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.128805 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DBRMControllerNode
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.131962 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DBRMControllerNode State = MAN_OFFLINE
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.132024 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DBRMControllerNode State = MAN_OFFLINE PID = 0
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.132910 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DBRMControllerNode
      Mar  5 20:52:54 cps1 ProcessMonitor[5616]: 54.142770 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: controllernode\*
      Mar  5 20:52:54 cps1 joblist[18659]: 54.261420 |0|0|0| C 05 CAL0000: /data/buildbot/bb-worker/centos7/mariadb-columnstore-engine/dbcon/joblist/distributedenginecomm.cpp @ 432 DEC: lost connection to 127.0.0.1
      Mar  5 20:52:56 cps1 ProcessMonitor[5616]: 56.233321 |0|0|0| D 18 CAL0000: BRM reset_locks script run
      Mar  5 20:52:56 cps1 ProcessMonitor[5616]: 56.234507 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/mysqld State = MAN_OFFLINE
      Mar  5 20:52:56 cps1 ProcessMonitor[5616]: 56.234546 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = MAN_OFFLINE PID = 0
      Mar  5 20:52:56 cps1 ProcessMonitor[5616]: 56.565656 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/mysqld State = ACTIVE
      Mar  5 20:52:56 cps1 ProcessMonitor[5616]: 56.565710 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = ACTIVE PID = 8645
      Mar  5 20:52:58 cps1 ProcessMonitor[5616]: 58.457269 |0|0|0| D 18 CAL0000: Stop MySQL Process
      Mar  5 20:52:58 cps1 ProcessMonitor[5616]: 58.457906 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_OFFLINE
      Mar  5 20:52:58 cps1 ProcessMonitor[5616]: 58.458159 |0|0|0| D 18 CAL0000: unmountExtraDBroots called
      Mar  5 20:52:58 cps1 ProcessMonitor[5616]: 58.459606 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr, return status = 0
      Mar  5 20:52:58 cps1 ProcessManager[5771]: 58.459688 |0|0|0| D 17 CAL0000: pm1 module is successfully stopped.
      Mar  5 20:52:58 cps1 ProcessManager[5771]: 58.461268 |0|0|0| D 17 CAL0000: Running reset_locks
      Mar  5 20:53:00 cps1 ProcessManager[5771]: 00.503752 |0|0|0| D 17 CAL0000: Ran reset_locks
      Mar  5 20:53:00 cps1 ProcessManager[5771]: 00.503807 |0|0|0| D 17 CAL0000: Running DBRM save_brm
      Mar  5 20:53:00 cps1 ProcessManager[5771]: 00.629356 |0|0|0| D 17 CAL0000: Successfully ran DBRM save_brm
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.020186 |0|0|0| D 17 CAL0000: Successfully ran DBRM clearShm
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.020269 |0|0|0| I 17 CAL0000: RESTARTSYSTEM: ACK received from Process-Monitor for stopModule requests, return status = 0
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.020367 |0|0|0| D 17 CAL0000: distributeConfigFile called for system file = Columnstore.xml
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.024175 |0|0|0| D 17 CAL0000: startSystemThread launched
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.025318 |0|0|0| D 17 CAL0000: Set System State = MAN_INIT
      Mar  5 20:53:01 cps1 ProcessMonitor[5616]: 01.025810 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = MAN_INIT
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.037515 |0|0|0| D 17 CAL0000: Set System State = MAN_INIT
      Mar  5 20:53:01 cps1 ProcessMonitor[5616]: 01.038060 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = MAN_INIT
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.039232 |0|0|0| D 17 CAL0000: getStandbyModule called
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.040409 |0|0|0| D 17 CAL0000: updateWorkerNodeconfig Started
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.221603 |0|0|0| D 17 CAL0000: updateWorkerNodeconfig completed
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.249727 |0|0|0| D 17 CAL0000: updatePMSconfig Started
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.258934 |0|0|0| D 17 CAL0000: updatePMSconfig completed
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.260201 |0|0|0| D 17 CAL0000: setPMProcIPs called for pm1
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.263919 |0|0|0| D 17 CAL0000: setPMProcIPs for DDLProc: no update needed
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.264935 |0|0|0| D 17 CAL0000: setPMProcIPs for DMLProc: no update needed, exiting function
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.265022 |0|0|0| D 17 CAL0000: distributeConfigFile called for system file = Columnstore.xml
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.269957 |0|0|0| D 17 CAL0000: Start Module pm1
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.272772 |0|0|0| D 17 CAL0000: Set Module pm1 State = 2
      Mar  5 20:53:01 cps1 ProcessMonitor[5616]: 01.273284 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm1 State = MAN_INIT
      Mar  5 20:53:01 cps1 ProcessManager[5771]: 01.273681 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      Mar  5 20:53:01 cps1 ProcessMonitor[5616]: 01.274635 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start All process request...
      Mar  5 20:53:01 cps1 ProcessMonitor[5616]: 01.675140 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/mysqld State = MAN_OFFLINE
      Mar  5 20:53:01 cps1 ProcessMonitor[5616]: 01.675188 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = MAN_OFFLINE PID = 0
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.588103 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/mysqld State = ACTIVE
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.588165 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/mysqld State = ACTIVE PID = 0
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.588186 |0|0|0| D 18 CAL0000: createDataDirs called
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.659938 |0|0|0| D 18 CAL0000: checkDataMount called
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.668396 |0|0|0| D 18 CAL0000: checkSpecialProcessState on : DBRMControllerNode
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.668438 |0|0|0| D 18 CAL0000: checkSpecialProcessState status return : 2
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.668495 |0|0|0| D 18 CAL0000: STARTING Process: DBRMControllerNode
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.668510 |0|0|0| D 18 CAL0000: Process location: /usr/local/mariadb/columnstore/bin/controllernode
      Mar  5 20:53:03 cps1 ProcessMonitor[5616]: 03.669090 |0|0|0| D 18 CAL0000: Dependent process of ProcessManager/pm1 is 4
      Mar  5 20:53:04 cps1 cpimport.bin[18659]: 04.218878 |0|0|0| E 34 CAL0087: BulkLoad Error: updateDctnryStore: error adding rows to store file for OID-35388; DBRoot-1; part-138; seg-1;  a BRM Allocate extent error. [BRM error status: network error]
      Mar  5 20:53:04 cps1 cpimport.bin[18659]: 04.218990 |0|0|0| E 34 CAL0087: BulkLoad Error: parseDict: error parsing section1:  OID-35385;  a BRM Allocate extent error.
      Mar  5 20:53:04 cps1 cpimport.bin[18659]: 04.219036 |0|0|0| E 34 CAL0087: BulkLoad Error: Bulkload Parse (thread 1) Failed for Table mytest.lineitem during parsing.  Terminating this job.
      Mar  5 20:53:04 cps1 controllernode[18659]: 04.219263 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)
      Mar  5 20:53:05 cps1 controllernode[18659]: 05.219767 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)
      Mar  5 20:53:05 cps1 writeengine[18659]: 05.232724 |0|0|0| I 19 CAL0084: ClearTableLock: Starting bulk rollback for table mytest.lineitem (OID-35369); lock-8211; initiated by cpimport.bin.
      Mar  5 20:53:05 cps1 writeengine[18659]: 05.233177 |0|0|0| E 19 CAL0085: ClearTableLock: Ending bulk rollback for table mytest.lineitem (OID-35369); lock-8211; initiated by cpimport.bin. (rollback failed; Bulk rollback for table mytest.lineitem (OID-35369) not performed; BRM error getting read-write state.).
      Mar  5 20:53:05 cps1 writeengine[18659]: 05.233202 |0|0|0| I 19 CAL0085: ClearTableLock: Ending bulk rollback for table mytest.lineitem (OID-35369); lock-8211; initiated by cpimport.bin. (rollback failed; Bulk rollback for table mytest.lineitem (OID-35369) not performed; BRM error getting read-write state.).
      Mar  5 20:53:05 cps1 cpimport.bin[18659]: 05.233259 |0|0|0| E 34 CAL0087: BulkLoad Error: Error rolling back table mytest.lineitem; Bulk rollback for table mytest.lineitem (OID-35369) not performed; BRM error getting read-write state.
      Mar  5 20:53:05 cps1 cpimport.bin[18659]: 05.233567 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-35369; status-FAILED
      Mar  5 20:53:05 cps1 cpimport.bin[19467]: 05.324721 |0|0|0| I 34 CAL0086: Initiating BulkLoad: -P pm1-19467 -ufc644995-f313-45dc-b1db-abd6284386f6 mytest lineitem /root/autopilot__1_2_3/data/source/dbt3/10g/lineitem.tbl
      Mar  5 20:53:05 cps1 controllernode[19467]: 05.331589 |0|0|0| E 29 CAL0000: DBRM: error: SessionManager::getSystemState() failed (network)
      Mar  5 20:53:05 cps1 cpimport.bin[19467]: 05.331630 |0|0|0| E 34 CAL0087: BulkLoad Error: System is not ready.  Verify that InfiniDB is up and ready before running cpimport.
      Mar  5 20:53:05 cps1 cpimport.bin[19467]: 05.331647 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-0; status-FAILED
      
      

      Attachments

        1. cpimport.log
          497 kB
          Zdravelina Sokolovska
        2. mcs_debug.log
          665 kB
          Zdravelina Sokolovska

        Activity

          People

            Unassigned Unassigned
            winstone Zdravelina Sokolovska (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.