[MCOL-440] startsystem doesn't complete if module server down Created: 2016-12-05  Updated: 2023-10-26  Resolved: 2020-04-15

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.0.5
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: David Thompson (Inactive) Assignee: Andrew Hutchings (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None


 Description   

To reproduce on a system with 1 UM and 2 PMs, with local disk on the PMs:

  • System is up and running across all 3 servers.
  • Shut down the um1 server.
  • Verify that the um1 module state is AUTO_DISABLED/DEGRADED in getSystemStatus.
  • ma shutdownSystem <pwd>
  • ma startSystem will log:

    startsystem   Mon Dec  5 10:45:54 2016
    startSystem command, 'columnstore' service is down, sending command to
    start the 'columnstore' service on all modules
     
       Module 'um1' is disabled and will not be started
     
       System being started, please wait.........
    

    and then spins for a while. I assume it is still trying to start um1 despite having detected that the module is disabled.

Restarting the um1 server allowed startSystem to complete and did not require an alterSystem-enableModule um1, which is nice.

In this case the system would not be all that useful, but I'd assume it would behave the same way in a 2 UM setup with only one UM module down, which would still allow for a functioning system.

If the um1 server were really down for a long time, then removing the um1 module (assuming a functional um2) would also help recovery.

However, the main point of this bug is that the system states that it won't start um1, but then does so anyway and hangs.
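
The expected behavior can be illustrated with a minimal sketch. This is not the OAM implementation; the Module class, the DOWN and ACTIVE state names, the reachable flag, and the max_polls bound are all hypothetical and exist only to show startup skipping a disabled module and waiting, with a bound, on the remaining ones.

```python
from dataclasses import dataclass

# AUTO_DISABLED is the state named in this report; DOWN and ACTIVE are
# hypothetical placeholder states used only for this illustration.
AUTO_DISABLED = "AUTO_DISABLED"
DOWN = "DOWN"
ACTIVE = "ACTIVE"


@dataclass
class Module:
    name: str
    state: str
    reachable: bool  # hypothetical: whether the module's server responds


def start_system(modules, max_polls=5):
    """Start every module that is not disabled and wait only on those.

    The wait is bounded by max_polls so that an unreachable module
    cannot make startup spin forever.
    """
    skipped = [m.name for m in modules if m.state == AUTO_DISABLED]
    pending = [m for m in modules if m.state != AUTO_DISABLED]

    for _ in range(max_polls):
        # In this sketch a reachable module becomes ACTIVE on the next poll.
        for m in pending:
            if m.reachable:
                m.state = ACTIVE
        pending = [m for m in pending if m.state != ACTIVE]
        if not pending:
            break

    return {
        "started": not pending,
        "skipped": skipped,
        "timed_out": [m.name for m in pending],
    }
```

With um1 marked AUTO_DISABLED and unreachable, and pm1 and pm2 reachable, start_system reports the system as started with um1 listed under skipped, rather than waiting on it.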



 Comments   
Comment by Daniel Lee (Inactive) [ 2019-10-28 ]

Build tested: 1.2.5-1

The issue still exists as described.

Comment by Todd Stoffel (Inactive) [ 2020-04-15 ]

OAM is being deprecated and replaced by an enhanced API and the MaxScale orchestration project.
