MariaDB ColumnStore / MCOL-3842

1.4.2 CentOS 7 with gluster setup - PM failover and movePmDbrootConfig fail

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.2
    • Fix Version/s: 1.4.4
    • Component/s: None
    • Labels: None
    • Environment: centos 7 3pm with gluster
    • Sprint: 2020-4, 2020-5, 2020-6, 2020-7

    Description

      Reported by a customer and reproduced by support (dh).

      Successfully installed a 3-PM combo system on CentOS 7 with gluster, configured with 3 copies per DBRoot.

      I reproduced the 2 problems that the customer reported:

      1. PM3 failover testing - when PM3 recovered, the dbroot assignment failed to move dbroot 3 back to pm3

      2. Manual move - running movePmDbrootConfig to move the dbroot by hand also failed

      First, I took pm3 down to see whether failover works.
      After about 3-4 minutes, the system reached a good state:

      Component     Status                      Last Status Change
      ------------  --------------------------  ------------------------
      System        ACTIVE                      Thu Feb 27 22:47:37 2020

      Module pm1    ACTIVE                      Thu Feb 27 22:47:34 2020
      Module pm2    ACTIVE                      Thu Feb 27 22:47:35 2020
      Module pm3    AUTO_DISABLED/DEGRADED      Thu Feb 27 22:44:46 2020

      Active Parent OAM Performance Module is 'pm1'
      Primary Front-End MariaDB ColumnStore Module is 'pm2'
      MariaDB ColumnStore Replication Feature is enabled

      MariaDB ColumnStore Process statuses

      Process             Module  Status           Last Status Change        Process ID
      ------------------  ------  ---------------  ------------------------  ----------
      ProcessMonitor      pm1     ACTIVE           Thu Feb 27 22:39:22 2020  16519
      ProcessManager      pm1     ACTIVE           Thu Feb 27 22:39:28 2020  16860
      DBRMControllerNode  pm1     ACTIVE           Thu Feb 27 22:46:45 2020  29783
      ServerMonitor       pm1     ACTIVE           Thu Feb 27 22:40:51 2020  19012
      DBRMWorkerNode      pm1     ACTIVE           Thu Feb 27 22:46:47 2020  29858
      PrimProc            pm1     ACTIVE           Thu Feb 27 22:46:55 2020  29986
      ExeMgr              pm1     ACTIVE           Thu Feb 27 22:47:20 2020  30313
      WriteEngineServer   pm1     ACTIVE           Thu Feb 27 22:47:08 2020  30148
      DDLProc             pm1     ACTIVE           Thu Feb 27 22:47:28 2020  30476
      DMLProc             pm1     ACTIVE           Thu Feb 27 22:47:37 2020  30561
      mysqld              pm1     ACTIVE           Thu Feb 27 22:47:34 2020  30758

      ProcessMonitor      pm2     ACTIVE           Thu Feb 27 22:40:28 2020  16217
      ProcessManager      pm2     HOT_STANDBY      Thu Feb 27 22:42:00 2020  17466
      DBRMControllerNode  pm2     COLD_STANDBY     Thu Feb 27 22:46:44 2020
      ServerMonitor       pm2     ACTIVE           Thu Feb 27 22:40:55 2020  16854
      DBRMWorkerNode      pm2     ACTIVE           Thu Feb 27 22:46:51 2020  19436
      PrimProc            pm2     ACTIVE           Thu Feb 27 22:46:59 2020  19471
      ExeMgr              pm2     ACTIVE           Thu Feb 27 22:47:24 2020  19634
      WriteEngineServer   pm2     ACTIVE           Thu Feb 27 22:47:12 2020  19574
      DDLProc             pm2     COLD_STANDBY     Thu Feb 27 22:47:30 2020
      DMLProc             pm2     COLD_STANDBY     Thu Feb 27 22:47:32 2020
      mysqld              pm2     ACTIVE           Thu Feb 27 22:47:35 2020  19849

      ProcessMonitor      pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      ProcessManager      pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DBRMControllerNode  pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      ServerMonitor       pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DBRMWorkerNode      pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      PrimProc            pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      ExeMgr              pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      WriteEngineServer   pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DDLProc             pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DMLProc             pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      mysqld              pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020

      Active Alarm Counts: Critical = 7, Major = 1, Minor = 0, Warning = 0, Info = 0
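
      The module rows in the status output above can be checked mechanically rather than by eye. The following is a minimal Python sketch, assuming only the row layout shown in this report ("Module", module name, status, timestamp). The function name parse_module_status is hypothetical and is not part of ColumnStore or mcsadmin.

```python
import re

# Matches rows such as: Module pm3 AUTO_DISABLED/DEGRADED Thu Feb 27 22:44:46 2020
# (layout assumed from the output pasted in this report)
MODULE_RE = re.compile(r"^\s*Module\s+(\w+)\s+(\S+)")

def parse_module_status(text):
    """Return {module_name: status} for every 'Module ...' row in the text."""
    statuses = {}
    for line in text.splitlines():
        match = MODULE_RE.match(line)
        if match:
            statuses[match.group(1)] = match.group(2)
    return statuses

sample = """\
System ACTIVE Thu Feb 27 22:47:37 2020

Module pm1 ACTIVE Thu Feb 27 22:47:34 2020
Module pm2 ACTIVE Thu Feb 27 22:47:35 2020
Module pm3 AUTO_DISABLED/DEGRADED Thu Feb 27 22:44:46 2020
"""

statuses = parse_module_status(sample)
# Modules that are in any state other than ACTIVE
not_active = {name: state for name, state in statuses.items() if state != "ACTIVE"}
print(not_active)
```

      With the sample rows, only pm3 is reported, which matches the expected state after taking pm3 down.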

      mcsadmin> getst
      getstorageconfig Thu Feb 27 22:48:40 2020

      System Storage Configuration

      Performance Module (DBRoot) Storage Type = DataRedundancy
      System Assigned DBRoot Count = 3
      DBRoot IDs assigned to 'pm1' = 1, 3
      DBRoot IDs assigned to 'pm2' = 2
      DBRoot IDs assigned to 'pm3' =

      Data Redundant Configuration

      Copies Per DBroot = 3
      DBRoot #1 has copies on PMs = 1 2 3
      DBRoot #2 has copies on PMs = 1 2 3
      DBRoot #3 has copies on PMs = 1 2 3
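
      The storage configuration above shows dbroot 3 reassigned to pm1 and pm3 left with no dbroots. A minimal Python sketch of how that condition could be detected from the getstorageconfig text follows. It assumes the "DBRoot IDs assigned to" line format shown in this report; the function names are hypothetical helpers, not a ColumnStore API.

```python
import re

# Matches lines such as: DBRoot IDs assigned to 'pm1' = 1, 3
# (format assumed from the output pasted in this report)
ASSIGN_RE = re.compile(r"DBRoot IDs assigned to '(\w+)'\s*=\s*(.*)")

def parse_dbroot_assignments(text):
    """Return {module_name: [dbroot ids]} from getstorageconfig output."""
    assignments = {}
    for line in text.splitlines():
        match = ASSIGN_RE.search(line)
        if match:
            module, ids = match.group(1), match.group(2)
            # An empty right-hand side means the module owns no dbroots
            assignments[module] = [int(x) for x in ids.split(",") if x.strip()]
    return assignments

def modules_without_dbroots(assignments):
    """Return the sorted names of modules that have no dbroot assigned."""
    return sorted(name for name, ids in assignments.items() if not ids)

sample = """\
System Assigned DBRoot Count = 3
DBRoot IDs assigned to 'pm1' = 1, 3
DBRoot IDs assigned to 'pm2' = 2
DBRoot IDs assigned to 'pm3' =
"""

assignments = parse_dbroot_assignments(sample)
print(assignments)
print(modules_without_dbroots(assignments))
```

      Running the same check after pm3 comes back would show whether dbroot 3 was returned to it.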

      Brought pm3 back up, but the system failed to bring pm3 back in:

      mcsadmin> getsystemi
      getsysteminfo Thu Feb 27 22:54:59 2020

      System columnstore-1

      System and Module statuses

      Component     Status                      Last Status Change
      ------------  --------------------------  ------------------------
      System        DEGRADED                    Thu Feb 27 22:54:40 2020

      Module pm1    ACTIVE                      Thu Feb 27 22:54:32 2020
      Module pm2    ACTIVE                      Thu Feb 27 22:54:33 2020
      Module pm3    MAN_DISABLED                Thu Feb 27 22:53:26 2020

      Active Parent OAM Performance Module is 'pm1'
      Primary Front-End MariaDB ColumnStore Module is 'pm2'
      MariaDB ColumnStore Replication Feature is enabled

      MariaDB ColumnStore Process statuses

      Process             Module  Status           Last Status Change        Process ID
      ------------------  ------  ---------------  ------------------------  ----------
      ProcessMonitor      pm1     ACTIVE           Thu Feb 27 22:39:22 2020  16519
      ProcessManager      pm1     ACTIVE           Thu Feb 27 22:39:28 2020  16860
      DBRMControllerNode  pm1     ACTIVE           Thu Feb 27 22:53:40 2020  3918
      ServerMonitor       pm1     ACTIVE           Thu Feb 27 22:40:51 2020  19012
      DBRMWorkerNode      pm1     ACTIVE           Thu Feb 27 22:53:42 2020  3971
      PrimProc            pm1     ACTIVE           Thu Feb 27 22:53:50 2020  4117
      ExeMgr              pm1     ACTIVE           Thu Feb 27 22:54:15 2020  4464
      WriteEngineServer   pm1     ACTIVE           Thu Feb 27 22:54:04 2020  4302
      DDLProc             pm1     ACTIVE           Thu Feb 27 22:54:23 2020  4623
      DMLProc             pm1     ACTIVE           Thu Feb 27 22:54:29 2020  4721
      mysqld              pm1     ACTIVE           Thu Feb 27 22:54:29 2020  4942

      ProcessMonitor      pm2     ACTIVE           Thu Feb 27 22:40:28 2020  16217
      ProcessManager      pm2     HOT_STANDBY      Thu Feb 27 22:42:00 2020  17466
      DBRMControllerNode  pm2     COLD_STANDBY     Thu Feb 27 22:53:39 2020
      ServerMonitor       pm2     ACTIVE           Thu Feb 27 22:40:55 2020  16854
      DBRMWorkerNode      pm2     ACTIVE           Thu Feb 27 22:53:46 2020  21398
      PrimProc            pm2     ACTIVE           Thu Feb 27 22:53:54 2020  21451
      ExeMgr              pm2     ACTIVE           Thu Feb 27 22:54:19 2020  21588
      WriteEngineServer   pm2     ACTIVE           Thu Feb 27 22:54:08 2020  21529
      DDLProc             pm2     COLD_STANDBY     Thu Feb 27 22:54:25 2020
      DMLProc             pm2     COLD_STANDBY     Thu Feb 27 22:54:27 2020
      mysqld              pm2     ACTIVE           Thu Feb 27 22:54:30 2020  21817

      ProcessMonitor      pm3     ACTIVE           Thu Feb 27 22:51:38 2020  4492
      ProcessManager      pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DBRMControllerNode  pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      ServerMonitor       pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DBRMWorkerNode      pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      PrimProc            pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      ExeMgr              pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      WriteEngineServer   pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DDLProc             pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      DMLProc             pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020
      mysqld              pm3     AUTO_OFFLINE     Thu Feb 27 22:44:46 2020

      Active Alarm Counts: Critical = 7, Major = 0, Minor = 0, Warning = 0, Info = 0

      mcsadmin> getst
      getstorageconfig Thu Feb 27 22:55:07 2020

      System Storage Configuration

      Performance Module (DBRoot) Storage Type = DataRedundancy
      System Assigned DBRoot Count = 3
      DBRoot IDs assigned to 'pm1' = 1, 3
      DBRoot IDs assigned to 'pm2' = 2
      DBRoot IDs assigned to 'pm3' =

      Data Redundant Configuration

      Copies Per DBroot = 3
      DBRoot #1 has copies on PMs = 1 2 3
      DBRoot #2 has copies on PMs = 1 2 3
      DBRoot #3 has copies on PMs = 1 2 3

      Then I tried enabling the module and moving the dbroot manually, and got the same error the customer did:

      mcsadmin> stopsystem
      stopsystem Thu Feb 27 22:56:09 2020

      This command stops the processing of applications on all Modules within the MariaDB ColumnStore System

      Checking for active transactions
      Do you want to proceed: (y or n) [n]: y

      System being stopped now...
      Successful stop of System

      NOTE: These module(s) are DISABLED: pm3

      mcsadmin> altersystem-enablemodule pm3
      altersystem-enablemodule Thu Feb 27 22:56:40 2020

      This command starts the processing of applications on a Module within the MariaDB ColumnStore System
      Do you want to proceed: (y or n) [n]: y

      Enabling Modules
      Successful enable of Modules

      Performance Module(s) Enabled, run movePmDbrootConfig or assignDbrootPmConfig to assign dbroots, if needed

      mcsadmin> movePmDbrootConfig pm1 1 pm3
      movepmdbrootconfig Thu Feb 27 22:57:02 2020

            • movePmDbrootConfig Failed : Can't move dbroot #1

      mcsadmin> movePmDbrootConfig pm1 3 pm3
      movepmdbrootconfig Thu Feb 27 22:57:06 2020

      DBRoot IDs currently assigned to 'pm1' = 1, 3
      DBRoot IDs currently assigned to 'pm3' =

      DBroot IDs being moved, please wait...

            • glusterctl API exception: API Failure return in GLUSTER_ASSIGN API
              FAILURE: Error assigning gluster dbroot# 3 to pm3
            • glusterctl API exception: API Failure return in GLUSTER_ASSIGN API
              FAILURE: Error reassigning gluster dbroot# 3 to pm1
            • manualMovePmDbroot Failed : API Failure

      mcsadmin>
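
      The failure output above contains two separate gluster errors: one for assigning dbroot 3 to pm3, and one for reassigning it to pm1, which suggests that the attempt to put the dbroot back on its original module also failed. A minimal Python sketch for pulling these out of captured output follows. It assumes the "FAILURE: Error ..." wording shown in this report; parse_gluster_failures is a hypothetical helper, not part of ColumnStore.

```python
import re

# Matches: FAILURE: Error assigning gluster dbroot# 3 to pm3
#      and FAILURE: Error reassigning gluster dbroot# 3 to pm1
# (wording assumed from the output pasted in this report)
FAILURE_RE = re.compile(r"Error (assigning|reassigning) gluster dbroot# (\d+) to (\w+)")

def parse_gluster_failures(text):
    """Return a list of (action, dbroot_id, module) for each gluster failure."""
    return [(action, int(dbroot), module)
            for action, dbroot, module in FAILURE_RE.findall(text)]

sample = """\
glusterctl API exception: API Failure return in GLUSTER_ASSIGN API
FAILURE: Error assigning gluster dbroot# 3 to pm3
glusterctl API exception: API Failure return in GLUSTER_ASSIGN API
FAILURE: Error reassigning gluster dbroot# 3 to pm1
manualMovePmDbroot Failed : API Failure
"""

failures = parse_gluster_failures(sample)
for action, dbroot, module in failures:
    print(f"{action} dbroot {dbroot} -> {module} failed")
```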

            People

              dleeyh Daniel Lee (Inactive)
              hill David Hill (Inactive)