[MCOL-3589] Taken down module status doesn't propagate to OAM / show in mcsadmin getsystemstatus Created: 2019-11-05  Updated: 2023-10-26  Resolved: 2019-11-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.4.0
Fix Version/s: 1.4.1

Type: Bug Priority: Critical
Reporter: Jens Röwekamp (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: SkySQLMVP
Environment:

Multi node ColumnStore 1.4.0 - with and without storage manager

Git version engine: 1f47534


Issue Links:
Duplicate
is duplicated by MCOL-3564 um1 can't recover from `columnstore s... Closed

 Description   

In a 1UM xPM setup, the ColumnStore service is stopped on PM2 with /usr/local/mariadb/columnstore/bin/columnstore stop.
Afterwards, UM1 and PM1 (the parent OAM module) still report PM2 as ACTIVE in mcsadmin getsystemstatus instead of DOWN.
The system state also remains ACTIVE instead of DEGRADED.
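
The expectation above can be written down as a small rule. The following Python sketch is an illustration only: the function name is hypothetical, and the rule (any non-ACTIVE module makes the system DEGRADED) is an assumption drawn from this description, not from ColumnStore's OAM code. It does not model the case where every module is down.

```python
def expected_system_state(module_states):
    """Return the system state this report expects for the given module states.

    Assumption (not taken from ColumnStore source): the system is ACTIVE only
    when every module is ACTIVE, and DEGRADED as soon as one module is not.
    """
    if all(state == "ACTIVE" for state in module_states.values()):
        return "ACTIVE"
    return "DEGRADED"


# Module states this report expects after `columnstore stop` on pm2.
after_stop = {"um1": "ACTIVE", "pm1": "ACTIVE", "pm2": "DOWN"}
print(expected_system_state(after_stop))  # prints "DEGRADED"
```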

How to reproduce:

[root@cs-test-mdb-cs-pm-module-0 /]# mcsadmin getsystemstatus
getsystemstatus   Tue Nov  5 17:44:41 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Nov  5 17:42:48 2019
 
Module um1    ACTIVE                       Tue Nov  5 17:42:44 2019
Module pm1    ACTIVE                       Tue Nov  5 17:42:25 2019
Module pm2    ACTIVE                       Tue Nov  5 17:42:33 2019
 
Active Parent OAM Performance Module is 'pm1'
[root@cs-test-mdb-cs-pm-module-0 /]#

[root@cs-test-mdb-cs-um-module-0 /]# mcsadmin getsystemstatus
 
WARNING: running on non Parent OAM Module, can't make configuration changes in this session.
         Access Console from 'pm1' if you need to make changes.
 
getsystemstatus   Tue Nov  5 17:45:23 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Nov  5 17:42:48 2019
 
Module um1    ACTIVE                       Tue Nov  5 17:42:44 2019
Module pm1    ACTIVE                       Tue Nov  5 17:42:25 2019
Module pm2    ACTIVE                       Tue Nov  5 17:42:33 2019
 
Active Parent OAM Performance Module is 'pm1'
[root@cs-test-mdb-cs-um-module-0 /]# 

[root@cs-test-mdb-cs-pm-module-1 /]# mcsadmin getsystemstatus
 
WARNING: running on non Parent OAM Module, can't make configuration changes in this session.
         Access Console from 'pm1' if you need to make changes.
 
getsystemstatus   Tue Nov  5 17:46:01 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Nov  5 17:42:48 2019
 
Module um1    ACTIVE                       Tue Nov  5 17:42:44 2019
Module pm1    ACTIVE                       Tue Nov  5 17:42:25 2019
Module pm2    ACTIVE                       Tue Nov  5 17:42:33 2019
 
Active Parent OAM Performance Module is 'pm1'
[root@cs-test-mdb-cs-pm-module-1 /]#

[root@cs-test-mdb-cs-pm-module-1 /]# /usr/local/mariadb/columnstore/bin/columnstore stop
Shutting down MariaDB Columnstore Database Platform
[root@cs-test-mdb-cs-pm-module-1 /]#

[root@cs-test-mdb-cs-pm-module-1 /]# mcsadmin getsystemstatus
 
WARNING: running on non Parent OAM Module, can't make configuration changes in this session.
         Access Console from 'pm1' if you need to make changes.
 
getsystemstatus   Tue Nov  5 17:47:08 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        INITIAL
 
 
[root@cs-test-mdb-cs-pm-module-1 /]#

[root@cs-test-mdb-cs-pm-module-0 /]# mcsadmin getsystemstatus
getsystemstatus   Tue Nov  5 17:47:27 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Nov  5 17:42:48 2019
 
Module um1    ACTIVE                       Tue Nov  5 17:42:44 2019
Module pm1    ACTIVE                       Tue Nov  5 17:42:25 2019
Module pm2    ACTIVE                       Tue Nov  5 17:42:33 2019
 
Active Parent OAM Performance Module is 'pm1'
[root@cs-test-mdb-cs-pm-module-0 /]#

[root@cs-test-mdb-cs-um-module-0 /]# mcsadmin getsystemstatus
 
WARNING: running on non Parent OAM Module, can't make configuration changes in this session.
         Access Console from 'pm1' if you need to make changes.
 
getsystemstatus   Tue Nov  5 17:47:41 2019
 
System columnstore-1
 
System and Module statuses
 
Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Tue Nov  5 17:42:48 2019
 
Module um1    ACTIVE                       Tue Nov  5 17:42:44 2019
Module pm1    ACTIVE                       Tue Nov  5 17:42:25 2019
Module pm2    ACTIVE                       Tue Nov  5 17:42:33 2019
 
Active Parent OAM Performance Module is 'pm1'
[root@cs-test-mdb-cs-um-module-0 /]#
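
A check like the one done by eye above could be scripted. The Python sketch below is a minimal illustration, assuming the getsystemstatus output keeps the "Module <name> <status> <timestamp>" layout shown in the transcripts; the function names are hypothetical and the sample text is copied from the output above. It flags modules that were stopped but are still reported as ACTIVE.

```python
import re

# Matches lines such as "Module pm2    ACTIVE    Tue Nov  5 17:42:33 2019".
MODULE_LINE = re.compile(r"^Module\s+(\S+)\s+(\S+)")


def parse_module_statuses(output):
    """Map module name -> reported status from getsystemstatus text."""
    statuses = {}
    for line in output.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match:
            statuses[match.group(1)] = match.group(2)
    return statuses


def stale_modules(output, stopped):
    """Return the stopped modules that are still reported as ACTIVE."""
    statuses = parse_module_statuses(output)
    return [name for name in stopped if statuses.get(name) == "ACTIVE"]


# Excerpt of the pm1 output captured after pm2 was stopped.
SAMPLE = """\
System        ACTIVE                       Tue Nov  5 17:42:48 2019

Module um1    ACTIVE                       Tue Nov  5 17:42:44 2019
Module pm1    ACTIVE                       Tue Nov  5 17:42:25 2019
Module pm2    ACTIVE                       Tue Nov  5 17:42:33 2019
"""

print(stale_modules(SAMPLE, ["pm2"]))  # prints ['pm2']
```

An empty list would mean that no stopped module is still shown as ACTIVE.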



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-11-06 ]

Set to 1.4.1 to investigate

Comment by Andrew Hutchings (Inactive) [ 2019-11-12 ]

Confirmed in current 1.4.1 with an AWS 2-node combined UM/PM setup. "columnstore stop" doesn't appear to do anything; the node's processes stay up.

Comment by Andrew Hutchings (Inactive) [ 2019-11-12 ]

The columnstore script is missing quite a few parts; judging by its history, it has been that way since the InfiniDB days. It currently just kills ProcMon and ProcMgr on the node and shuts down the local MariaDB, leaving everything else running and nothing left to report status. To fix this, mcsadmin needs to expose ProcMgr's STOPMODULE as a call, which this script can then hook into.

Comment by Andrew Hutchings (Inactive) [ 2019-11-15 ]

For QA:

Ben has added a "stopmodule" command to ColumnStore to stop a specific module. The "columnstore stop" command now uses this.

Comment by Daniel Lee (Inactive) [ 2019-11-20 ]

Build verified: 1.4.1-1

Server
/root/ColumnStore/buildColumnstoreFromGithubSource/server
commit 6cedb671e99038f1a10e0d8504f835aaabed9780
Author: Marko Mäkelä <marko.makela@mariadb.com>

Engine
commit 0f86a3ab14a530183c4fc30b752f8c54c89f13d2
Merge: 7c6a086 2275f4f
Author: benthompson15 <ben.thompson.015@gmail.com>
Date: Tue Nov 19 23:07:31 2019 +0100

mcsadmin> getsystemstatus
getsystemstatus   Wed Nov 20 17:34:24 2019

System vagrantTestStack

System and Module statuses

Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Wed Nov 20 17:15:32 2019

Module um1    ACTIVE                       Wed Nov 20 17:15:29 2019
Module pm1    ACTIVE                       Wed Nov 20 17:15:09 2019
Module pm2    MAN_OFFLINE                  Wed Nov 20 17:32:22 2019
