Uploaded image for project: 'MariaDB ColumnStore'
  1. MariaDB ColumnStore
  2. MCOL-3995

FAILOVER: DMLProc gets restarted after both system and query are set to be ready

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Fixed
    • 1.4.4
    • Icebox
    • DMLProc
    • None

    Description

      Build tested: 1.4.4-1 (Jenkins 20200508)

      Test info:

      OS: centos7
      configuration: 3pm glusterfs

      When the stack is in a ready/normal state, I took of PM3 offline (vagrant halt -f PM3), failover completed. According to the debug.log on PM1, DMLProc got restarted after both "setSystemQueryReady" and "setSystemReady" have been set to 1. That took 4 to 5 seconds in my test system for system to be in ACTIVE state again.

      This could be an issue if the user or a script checks for "setSystemQueryReady" or "setSystemReady" to determine when to resume database operations.

      I waited for 15 minutes and there is no new entries to the debug.log file.

      entries from debug.log on PM1. May 11 17:06:03 localhost ProcessMonitor[4241]: 03.236053 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = ACTIVE
      May 11 17:06:03 localhost ProcessMonitor[4241]: 03.236116 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = ACTIVE PID = 26210
      May 11 17:06:03 localhost ProcessMonitor[4241]: 03.236146 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = ACTIVE
      May 11 17:06:04 localhost ProcessManager[4460]: 04.195846 |0|0|0| D 17 CAL0000: restartProcessType: Restart all DMLProc
      May 11 17:06:04 localhost ProcessManager[4460]: 04.203838 |0|0|0| D 17 CAL0000: setPMProcIPs for DDLProc: no update needed
      May 11 17:06:04 localhost ProcessManager[4460]: 04.206111 |0|0|0| D 17 CAL0000: setPMProcIPs for DMLProc: no update needed, exiting function
      May 11 17:06:04 localhost ProcessManager[4460]: 04.229316 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
      May 11 17:06:04 localhost ProcessManager[4460]: 04.230269 |0|0|0| D 17 CAL0000: pm2 distributeConfigFile success.
      May 11 17:06:05 localhost ProcessManager[4460]: 05.258996 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.264048 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.264654 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.266150 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 0 PID = 0
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.274368 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.290405 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = MAN_OFFLINE
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.290542 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = MAN_OFFLINE PID = 0
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.292294 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.300044 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc*
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.301055 |0|0|0| D 18 CAL0000: checkSpecialProcessState on : DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.301147 |0|0|0| D 18 CAL0000: checkSpecialProcessState status return : 2
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.302110 |0|0|0| D 18 CAL0000: STARTING Process: DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.302198 |0|0|0| D 18 CAL0000: Process location: /usr/bin/DMLProc
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.304145 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm1 is 4
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.310126 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm2 is 4
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.315259 |0|0|0| D 18 CAL0000: Dependent process of WriteEngineServer/pm3 is 1
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.320612 |0|0|0| D 18 CAL0000: Dependent process of DBRMWorkerNode/pm1 is 4
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.321640 |0|0|0| D 18 CAL0000: Dependent process of DDLProc/pm1 is 4
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.323067 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 27 on device DBRM
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.332454 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc*
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.332616 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 2 PID = 0
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.334010 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = MAN_INIT
      May 11 17:06:05 localhost ProcessMonitor[4241]: 05.334067 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = MAN_INIT PID = 0
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.395273 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 21 PID = 26474
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.404018 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = PID_UPDATE
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.404204 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = PID_UPDATE PID = 26474
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.404861 |0|0|0| D 18 CAL0000: DMLProc PID is 26474
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.408141 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 13 on device DMLProc
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.409844 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr, return status = 0
      May 11 17:06:06 localhost ProcessManager[4460]: 06.410425 |0|0|0| D 17 CAL0000: DMLProc process is restarted by request.
      May 11 17:06:06 localhost ProcessManager[4460]: 06.410537 |0|0|0| D 17 CAL0000: restartProcessType: Start ACK received from Process-Monitor, return status = 0
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.416732 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 21 on device DMLProc
      May 11 17:06:06 localhost ProcessManager[4460]: 06.422331 |0|0|0| D 17 CAL0000: setPMProcIPs for DMLProc: no update needed, exiting function
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.424366 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
      May 11 17:06:06 localhost ProcessManager[4460]: 06.435322 |0|0|0| D 17 CAL0000: setSystemQueryReady = 1
      May 11 17:06:06 localhost ProcessManager[4460]: 06.441486 |0|0|0| D 17 CAL0000: setSystemReady = 1
      May 11 17:06:06 localhost ProcessManager[4460]: 06.453221 |0|0|0| D 17 CAL0000: Starting up STANDBY process on module pm2
      May 11 17:06:06 localhost ProcessManager[4460]: 06.453452 |0|0|0| D 17 CAL0000: Set Module pm2 State = 3
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.454456 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm2 State = AUTO_INIT
      May 11 17:06:06 localhost ProcessManager[4460]: 06.461328 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.489270 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/mysqld State = ACTIVE
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.489385 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/mysqld State = ACTIVE PID = 5190
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.801228 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set DBroot 2 State = ACTIVE
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.822063 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/DBRMControllerNode State = COLD_STANDBY
      May 11 17:06:06 localhost ProcessMonitor[4241]: 06.822140 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/DBRMControllerNode State = COLD_STANDBY PID = 0
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.057937 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/DDLProc State = COLD_STANDBY
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.058119 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/DDLProc State = COLD_STANDBY PID = 0
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.077326 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm2/DMLProc State = COLD_STANDBY
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.077440 |0|0|0| D 18 CAL0000: statusControl: Set Process pm2/DMLProc State = COLD_STANDBY PID = 0
      May 11 17:06:07 localhost ProcessManager[4460]: 07.080933 |0|0|0| D 17 CAL0000: Set Module pm2 State = 4
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.082815 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Module pm2 State = ACTIVE
      May 11 17:06:07 localhost ProcessManager[4460]: 07.082977 |0|0|0| D 17 CAL0000: pm2 module is started by request.
      May 11 17:06:07 localhost ProcessManager[4460]: 07.101254 |0|0|0| D 17 CAL0000: pingDeviceThread: ACK received from 'pm2' Process-Monitor, return status = 0
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.499303 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = BUSY_INIT
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.499385 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = BUSY_INIT PID = 26474
      May 11 17:06:07 localhost ProcessMonitor[4241]: 07.499416 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = BUSY_INIT
      May 11 17:06:07 localhost DMLProc[26474]: 07.502570 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
      May 11 17:06:09 localhost joblist[26474]: 09.242350 |0|0|0| E 05 CAL0000: /home/jenkins/workspace/10.4e-RPM-DEV/label/rhel-7/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/columnstore/columnstore/writeengine/client/we_clients.cpp @ 307 Could not connect to pm3_WriteEngineServer: InetStreamSocket::connect: connect() error: No route to host to: InetStreamSocket: sd: 14 inet: 10.0.0.17 port: 8630
      May 11 17:06:09 localhost joblist[26474]: 09.242601 |0|0|0| E 05 CAL0000: /home/jenkins/workspace/10.4e-RPM-DEV/label/rhel-7/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/columnstore/columnstore/writeengine/client/we_clients.cpp @ 307 Could not connect to pm3_WriteEngineServer: InetStreamSocket::connect: connect() error: No route to host to: InetStreamSocket: sd: 15 inet: 10.0.0.17 port: 8630
      May 11 17:06:10 localhost oamcpp[26474]: 10.317852 |0|0|0| E 08 CAL0000: OamCache::checkReload shows state for pm3 as AUTO_DISABLED
      May 11 17:06:10 localhost DMLProc[26474]: 10.333957 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
      May 11 17:06:10 localhost DMLProc[26474]: 10.351686 |0|0|0| I 20 CAL0002: DMLProc finished rollbackAll.
      May 11 17:06:10 localhost ProcessMonitor[4241]: 10.376650 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set Process pm1/DMLProc State = ACTIVE
      May 11 17:06:10 localhost ProcessMonitor[4241]: 10.376711 |0|0|0| D 18 CAL0000: statusControl: Set Process pm1/DMLProc State = ACTIVE PID = 26474
      May 11 17:06:10 localhost ProcessMonitor[4241]: 10.376741 |0|0|0| D 18 CAL0000: statusControl: REQUEST RECEIVED: Set System State = ACTIVE

      Attachments

        Activity

          People

            Unassigned Unassigned
            dleeyh Daniel Lee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.