MariaDB ColumnStore / MCOL-3308

Cannot move DBRoot to resurrected PM after automatic fail-over


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2.3
    • Fix Version/s: Icebox
    • Component/s: oam
    • Labels: None

      Description

      This is a newly installed system to test a setup for a prospect: 1 UM + 3 PMs, with 3 DBRoots, initially one per PM. PM1 is the parent OAM module. Turning off PM3 triggered an automatic fail-over: DBRoot3 was attached to PM1, and queries were processed correctly. (We only have one database with one small test table.)

      After PM3 was booted again, it came up in the MAN_DISABLED state (I assume this is expected?).

      To move DBRoot3 back to PM3, we first had to enable the module with "alterSystem-EnableModule pm3", after which PM3 changed to the MAN_OFFLINE state. Next, to be able to initiate a DBRoot move, we had to stop system processing with "stopSystem", after which the whole system went to MAN_OFFLINE:

      Component     Status                       Last Status Change
      ------------  --------------------------   ------------------------
      System        MAN_OFFLINE                  Tue May 14 17:43:37 2019

      Module um1    MAN_OFFLINE                  Tue May 14 17:43:31 2019
      Module pm1    MAN_OFFLINE                  Tue May 14 17:43:34 2019
      Module pm2    MAN_OFFLINE                  Tue May 14 17:43:31 2019
      Module pm3    MAN_OFFLINE                  Tue May 14 17:44:45 2019
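      The move was only attempted once every component showed MAN_OFFLINE, so a script driving the same procedure could gate movePmDbrootConfig on that status table. A minimal sketch, assuming the status output has been captured as text in the column layout shown above; `all_offline` is a hypothetical helper, not part of mcsadmin:

```python
import re

# Matches a component row such as "System  MAN_OFFLINE ..." or
# "Module pm3  MAN_DISABLED ..." (row layout assumed from the table above).
ROW_RE = re.compile(r"\s*(System|Module\s+\S+)\s+(\S+)")

def all_offline(status_text: str) -> bool:
    """Return True if every System/Module row reports MAN_OFFLINE."""
    states = []
    for line in status_text.splitlines():
        m = ROW_RE.match(line)
        if m:  # header, dash and blank lines do not match
            states.append(m.group(2))
    # An empty table is treated as "not ready" rather than vacuously offline.
    return bool(states) and all(s == "MAN_OFFLINE" for s in states)
```

      With PM3 still in MAN_DISABLED, the check fails, which matches the report: the module had to be enabled and the system stopped before the move could be started.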

      We then triggered the move (DBRoot3 from PM1 to PM3):

      mcsadmin> movePmDbrootConfig pm1 3 pm3
      movepmdbrootconfig Tue May 14 17:45:07 2019

      DBRoot IDs currently assigned to 'pm1' = 1, 3
      DBRoot IDs currently assigned to 'pm3' =

      DBroot IDs being moved, please wait...

      DBRoot IDs newly assigned to 'pm1' = 1, 3
      DBRoot IDs newly assigned to 'pm3' =

      As can be seen, the DBRoot was not moved. Starting the system was not possible because PM3 had no DBRoot attached:

      May 14 17:19:54 p2w1 ProcessManager[12373]: 54.438248 |0|0|0| C 17 CAL0000: startSystemThread failed: Module 'pm3' has no DBRoots assigned to it

      We had to manually disable PM3 in order to start the system, which then came up and began processing queries.
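      The before/after assignment lines printed by movePmDbrootConfig make the failure visible, so a wrapper script could verify them instead of trusting that the command completed. A minimal sketch, assuming the output format shown above; `parse_assignments` and `move_succeeded` are hypothetical helpers, not part of mcsadmin:

```python
import re

def parse_assignments(output: str, prefix: str) -> dict[str, set[int]]:
    """Map each PM name to its DBRoot IDs, from lines such as
    "DBRoot IDs newly assigned to 'pm1' = 1, 3" (format assumed from the report)."""
    result: dict[str, set[int]] = {}
    pattern = re.compile(prefix + r" to '(\w+)' =\s*(.*)$")
    for line in output.splitlines():
        m = pattern.search(line)
        if m:
            # An empty right-hand side means no DBRoots assigned.
            result[m.group(1)] = {int(x) for x in re.findall(r"\d+", m.group(2))}
    return result

def move_succeeded(output: str, dbroot: int, src: str, dst: str) -> bool:
    """True only if the DBRoot left the source PM and arrived at the target PM."""
    after = parse_assignments(output, r"DBRoot IDs newly assigned")
    return dbroot not in after.get(src, set()) and dbroot in after.get(dst, set())
```

      Applied to the output above, move_succeeded(output, 3, "pm1", "pm3") returns False, since DBRoot 3 is still listed under pm1 and pm3 has none.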

      The error log has no entries related to the movePmDbrootConfig command. The debug log has some, and they seem to suggest that the move was successful. One line stands out ("send mountDBRoot to pm: 3/pm1"): after unmountDBRoot for DBRoot3 is sent to PM1 (correct), mountDBRoot is sent to PM1 again (?!). Am I missing something here?

      May 14 17:46:09 p2w1 oamcpp[6898]: 09.591518 |0|0|0| D 08 CAL0000: manualMovePmDbroot: 3 from pm1 to pm3
      May 14 17:46:09 p2w1 oamcpp[6898]: 09.604461 |0|0|0| D 08 CAL0000: mountDBRoot api, umount dbroot3
      May 14 17:46:09 p2w1 ProcessManager[12373]: 09.611753 |0|0|0| I 17 CAL0000: MSG RECEIVED: Unmount dbroot : 3
      May 14 17:46:09 p2w1 ProcessManager[12373]: 09.616267 |0|0|0| D 17 CAL0000: send unmountDBRoot to pm: 3/pm1
      May 14 17:46:09 p2w1 ProcessManager[12373]: 09.616330 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm1
      May 14 17:46:09 p2w1 ProcessMonitor[12243]: 09.616562 |0|0|0| I 18 CAL0000: MSG RECEIVED: Unmount DBRoot: 3
      May 14 17:46:09 p2w1 ProcessMonitor[12243]: 09.621860 |0|0|0| D 18 CAL0000: flushInodeCache successful
      May 14 17:46:09 p2w1 ProcessMonitor[12243]: 09.798351 |0|0|0| I 18 CAL0000: PROCUNMOUNT: ACK back to ProcMgr, status: 0
      May 14 17:46:09 p2w1 ProcessManager[12373]: 09.798408 |0|0|0| I 17 CAL0000: UnMount Completed status: 0
      May 14 17:46:09 p2w1 oamcpp[6898]: 09.804563 |0|0|0| D 08 CAL0000: mountDBRoot api, mount dbroot3
      May 14 17:46:09 p2w1 ProcessManager[12373]: 09.830309 |0|0|0| I 17 CAL0000: MSG RECEIVED: mount dbroot : 3
      May 14 17:46:09 p2w1 ProcessManager[12373]: 09.834871 |0|0|0| D 17 CAL0000: send mountDBRoot to pm: 3/pm1
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.216917 |0|0|0| I 17 CAL0000: Mount Completed status: 0
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.224347 |0|0|0| I 17 CAL0000: MSG RECEIVED: Distribute Config File system/Columnstore.xml
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.224415 |0|0|0| D 17 CAL0000: distributeConfigFile called for system file = Columnstore.xml
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.301994 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module um1
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.302527 |0|0|0| D 17 CAL0000: um1 distributeConfigFile success.
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.307771 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm2
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.308426 |0|0|0| D 17 CAL0000: pm2 distributeConfigFile success.
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.313716 |0|0|0| D 17 CAL0000: sendMsgProcMon: Process module pm3
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.314540 |0|0|0| D 17 CAL0000: pm3 distributeConfigFile success.
      May 14 17:46:10 p2w1 ProcessManager[12373]: 10.314594 |0|0|0| I 17 CAL0000: Distribute Config File Completed system/Columnstore.xml
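      The suspicious sequence can be detected mechanically by pairing each "send unmountDBRoot to pm: N/pmX" line with the next "send mountDBRoot to pm: N/pmY" line for the same DBRoot and flagging pairs where the mount target equals the unmount source. A minimal sketch, assuming the ProcessManager message format shown in the log above; `find_remounts_to_source` is a hypothetical helper, and a same-PM remount is only suspicious in the context of a move:

```python
import re

# Matches e.g. "send unmountDBRoot to pm: 3/pm1" or "send mountDBRoot to pm: 3/pm1".
SEND_RE = re.compile(r"send (unmount|mount)DBRoot to pm: (\d+)/(\w+)")

def find_remounts_to_source(log_text: str) -> list[tuple[int, str]]:
    """Return (dbroot, pm) pairs where a DBRoot was mounted back on the
    same PM it had just been unmounted from."""
    last_unmount: dict[int, str] = {}  # dbroot -> PM it was last unmounted from
    suspicious = []
    for line in log_text.splitlines():
        m = SEND_RE.search(line)
        if not m:
            continue
        action, dbroot, pm = m.group(1), int(m.group(2)), m.group(3)
        if action == "unmount":
            last_unmount[dbroot] = pm
        elif last_unmount.pop(dbroot, None) == pm:
            # Mount target equals the unmount source: the DBRoot never moved.
            suspicious.append((dbroot, pm))
    return suspicious
```

      On the excerpt above this flags DBRoot 3 on pm1, which is the unmount/remount pair questioned in the report.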


              People

              Assignee: Unassigned
              Reporter: Assen Totin (assen.totin)
              Votes: 0
              Watchers: 5
