MariaDB ColumnStore / MCOL-891

Get "Could not get a ExeMgr connection." - restart did not clear the error


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.0.10
    • Fix Version/s: Icebox
    • Component/s: ExeMgr
    • Labels: None
    • Environment: VirtualBox Version 5.1.26 r117224 (Qt5.6.2), running on OS X
      Linux ubuntu-03 4.9.13

    Description

      Problem

      When connecting via a client, we get the following error:

      ERROR 1815 (HY000): Internal error: IDB-2004: Cannot connect to ExeMgr.
      

      Restarting ColumnStore via mcsadmin did not clear this error.

      mcsadmin> getSystemStatus
      getsystemstatus   Fri Aug 18 23:28:07 2017
       
      System columnstore-1
       
      System and Module statuses
       
      Component     Status                       Last Status Change
      ------------  --------------------------   ------------------------
      System        ACTIVE                       Fri Aug 18 23:26:37 2017
       
      Module um1    MAN_OFFLINE                  Fri Aug 18 22:03:34 2017
      Module um2    MAN_OFFLINE                  Fri Aug 18 22:03:37 2017
      Module pm1    MAN_INIT                     Fri Aug 18 22:03:40 2017
      Module pm2    MAN_OFFLINE                  Fri Aug 18 22:03:40 2017
       
      Active Parent OAM Performance Module is 'pm1'
      Primary Front-End MariaDB ColumnStore Module is 'um1'
      Local Query Feature is enabled
      MariaDB ColumnStore Replication Feature is enabled
      

      mcsadmin> getProcessStatus
      getprocessstatus   Fri Aug 18 23:28:28 2017
       
      MariaDB ColumnStore Process statuses
       
      Process             Module    Status            Last Status Change        Process ID
      ------------------  ------    ---------------   ------------------------  ----------
      ProcessMonitor      um1       ACTIVE            Wed Aug 16 16:34:05 2017        1099
      ServerMonitor       um1       FAILED            Fri Aug 18 23:19:42 2017        6793
      DBRMWorkerNode      um1       ACTIVE            Fri Aug 18 23:19:22 2017        6825
      ExeMgr              um1       MAN_OFFLINE       Fri Aug 18 23:18:58 2017
      DDLProc             um1       MAN_OFFLINE       Fri Aug 18 23:18:58 2017
      DMLProc             um1       MAN_OFFLINE       Fri Aug 18 23:18:58 2017
      mysqld              um1       ACTIVE            Fri Aug 18 23:19:25 2017        6752
       
      ProcessMonitor      um2       ACTIVE            Wed Aug 16 16:34:05 2017        1084
      ServerMonitor       um2       FAILED            Fri Aug 18 23:19:41 2017       24821
      DBRMWorkerNode      um2       ACTIVE            Fri Aug 18 23:19:29 2017       24853
      ExeMgr              um2       MAN_OFFLINE       Fri Aug 18 23:19:01 2017
      DDLProc             um2       MAN_OFFLINE       Fri Aug 18 23:19:01 2017
      DMLProc             um2       MAN_OFFLINE       Fri Aug 18 23:19:01 2017
      mysqld              um2       ACTIVE            Fri Aug 18 23:19:31 2017       24775
       
      ProcessMonitor      pm1       ACTIVE            Wed Aug 16 16:33:55 2017        1267
      ProcessManager      pm1       ACTIVE            Wed Aug 16 16:34:01 2017        1554
      DBRMControllerNode  pm1       ACTIVE            Fri Aug 18 23:19:19 2017       23270
      ServerMonitor       pm1       FAILED            Fri Aug 18 23:19:49 2017       23750
      DBRMWorkerNode      pm1       ACTIVE            Fri Aug 18 23:19:21 2017       23353
      DecomSvr            pm1       ACTIVE            Fri Aug 18 23:19:25 2017       23525
      PrimProc            pm1       ACTIVE            Fri Aug 18 23:19:27 2017       23601
      ExeMgr              pm1       ACTIVE            Fri Aug 18 23:19:48 2017       26317
      WriteEngineServer   pm1       ACTIVE            Fri Aug 18 23:19:51 2017       26501
      mysqld              pm1       ACTIVE            Fri Aug 18 23:19:48 2017       23024
       
      ProcessMonitor      pm2       ACTIVE            Wed Aug 16 16:34:11 2017        1069
      ProcessManager      pm2       HOT_STANDBY       Fri Aug 18 23:19:12 2017       19507
      DBRMControllerNode  pm2       COLD_STANDBY      Fri Aug 18 23:19:30 2017
      ServerMonitor       pm2       FAILED            Fri Aug 18 23:19:53 2017       19919
      DBRMWorkerNode      pm2       ACTIVE            Fri Aug 18 23:19:38 2017       19974
      DecomSvr            pm2       ACTIVE            Fri Aug 18 23:19:42 2017       19990
      PrimProc            pm2       ACTIVE            Fri Aug 18 23:19:44 2017       20024
      ExeMgr              pm2       ACTIVE            Fri Aug 18 23:19:48 2017       20053
      WriteEngineServer   pm2       ACTIVE            Fri Aug 18 23:19:53 2017       20075
      mysqld              pm2       ACTIVE            Fri Aug 18 23:19:33 2017       19770
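
A listing this long is easy to misread, so the rows that are not healthy can be pulled out mechanically. The following is a minimal sketch, not part of ColumnStore: the function name, the regular expression, and the set of statuses treated as healthy are all assumptions made for illustration. It expects the text of a getProcessStatus listing in the column layout shown above.

```python
import re

# Statuses treated as healthy in this sketch (an assumption, not an official list).
HEALTHY = {"ACTIVE", "HOT_STANDBY", "COLD_STANDBY"}

# Captures process name, module (um1, pm2, ...) and status; the rest is ignored.
ROW = re.compile(r"^\s*(\w+)\s+((?:um|pm)\d+)\s+([A-Z_]+)\b")


def unhealthy_processes(status_text):
    """Return (process, module, status) tuples whose status is not in HEALTHY."""
    problems = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match and match.group(3) not in HEALTHY:
            problems.append(match.groups())
    return problems


# A few rows copied from the listing above.
SAMPLE = """\
ProcessMonitor      um1       ACTIVE            Wed Aug 16 16:34:05 2017        1099
ServerMonitor       um1       FAILED            Fri Aug 18 23:19:42 2017        6793
ExeMgr              um1       MAN_OFFLINE       Fri Aug 18 23:18:58 2017
DBRMControllerNode  pm2       COLD_STANDBY      Fri Aug 18 23:19:30 2017
ExeMgr              pm1       ACTIVE            Fri Aug 18 23:19:48 2017       26317
"""

for process, module, status in unhealthy_processes(SAMPLE):
    print(f"{module}: {process} is {status}")
```

Run against the full listing, this should report ServerMonitor as FAILED on all four modules and ExeMgr, DDLProc and DMLProc as MAN_OFFLINE on um1 and um2, which matches the client-side error.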
      

      Reproduce

      The environment was created as follows:

      1. VirtualBox (2 GB RAM assigned)
      2. Docker containers for 2 x UM and 2 x PM running Ubuntu

      The customer was up and running, with data loaded via cpimport.

      We wanted to visualize the data, so we ran Metabase in another container via:

      docker run -d -p 3000:3000 --name metabase metabase/metabase
      

      3. After the Metabase container has started, connect to it on port 3000 and, from there, connect to the ColumnStore cluster.

      4. While Metabase was connecting to the ColumnStore cluster, the cluster started to report the above error. Something in the startup process or metadata gathering apparently caused ColumnStore to fail.

      We built the cluster again and reproduced this a second time.
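
When reproducing, it may help to check whether the ExeMgr port is reachable at the TCP level at all, before and after Metabase connects. The sketch below is only an illustration under stated assumptions: the host names are hypothetical (taken from the module names above), and 8601 is assumed to be the ExeMgr port configured in Columnstore.xml, so both should be checked against the actual configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def report(hosts, port):
    """Build one 'host:port reachable/NOT reachable' line per host."""
    lines = []
    for host in hosts:
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        lines.append(f"{host}:{port} {state}")
    return lines


# Assumed ExeMgr port; verify against Columnstore.xml before relying on it.
EXEMGR_PORT = 8601
```

For example, `print("\n".join(report(["um1", "um2", "pm1", "pm2"], EXEMGR_PORT)))` would print one line per module. A port that is reachable while the client still gets IDB-2004 would point at ExeMgr itself rather than at the network.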

      Solution

      • Fail gracefully and do not leave the cluster compromised
      • Provide a workaround to clear the error

      Workaround

      None. We had to rebuild the ColumnStore cluster.

      Attachments

        1. pm1_configReport.txt (4 kB, Daniel Jackman)
        2. pm1_hardwareReport.txt (7 kB, Daniel Jackman)
        3. pm1_logReport.tar.gz (2 kB, Daniel Jackman)
        4. pm1_resourceReport.txt (3 kB, Daniel Jackman)
        5. pm1_softwareReport.txt (0.8 kB, Daniel Jackman)
        6. pm2_configReport.txt (0.4 kB, Daniel Jackman)
        7. pm2_logReport.tar.gz (1 kB, Daniel Jackman)
        8. pm2_resourceReport.txt (2 kB, Daniel Jackman)
        9. pm2_softwareReport.txt (1 kB, Daniel Jackman)
        10. um1_configReport.txt (35 kB, Daniel Jackman)
        11. um1_dbmsReport.txt (28 kB, Daniel Jackman)
        12. um1_hardwareReport.txt (7 kB, Daniel Jackman)
        13. um1_logReport.tar.gz (1 kB, Daniel Jackman)
        14. um1_logReport.txt (0.7 kB, Daniel Jackman)
        15. um1_mysqllogReport.tar.gz (5 kB, Daniel Jackman)
        16. um1_resourceReport.txt (3 kB, Daniel Jackman)
        17. um1_softwareReport.txt (1 kB, Daniel Jackman)
        18. um2_configReport.txt (3 kB, Daniel Jackman)
        19. um2_hardwareReport.txt (7 kB, Daniel Jackman)
        20. um2_logReport.tar.gz (1 kB, Daniel Jackman)
        21. um2_resourceReport.txt (2 kB, Daniel Jackman)
        22. um2_softwareReport.txt (1 kB, Daniel Jackman)

        People

            Assignee: Unassigned
            Reporter: Daniel Jackman (danoj, inactive)